Skip to main content
PLOS ONE logoLink to PLOS ONE
. 2020 May 18;15(5):e0232849. doi: 10.1371/journal.pone.0232849

Sequence-structure-function relationships in class I MHC: A local frustration perspective

Onur Serçinoğlu 1, Pemra Ozbek 2,*
Editor: Antony Nicodemus Antoniou3
PMCID: PMC7233585  PMID: 32421728

Abstract

Class I Major Histocompatibility Complex (MHC) binds short antigenic peptides with the help of Peptide Loading Complex (PLC), and presents them to T-cell Receptors (TCRs) of cytotoxic T-cells and Killer-cell Immunglobulin-like Receptors (KIRs) of Natural Killer (NK) cells. With more than 10000 alleles, human MHC (Human Leukocyte Antigen, HLA) is the most polymorphic protein in humans. This allelic diversity provides a wide coverage of peptide sequence space, yet does not affect the three-dimensional structure of the complex. Moreover, TCRs mostly interact with HLA in a common diagonal binding mode, and KIR-HLA interaction is allele-dependent. With the aim of establishing a framework for understanding the relationships between polymorphism (sequence), structure (conserved fold) and function (protein interactions) of the human MHC, we performed here a local frustration analysis on pMHC homology models covering 1436 HLA I alleles. An analysis of local frustration profiles indicated that (1) variations in MHC fold are unlikely due to minimally-frustrated and relatively conserved residues within the HLA peptide-binding groove, (2) high frustration patches on HLA helices are either involved in or near interaction sites of MHC with the TCR, KIR, or tapasin of the PLC, and (3) peptide ligands mainly stabilize the F-pocket of HLA binding groove.

Introduction

The sequence-structure-function paradigm plays a central role in structural biology: the primary structure (i.e. amino acid sequence) of a protein dictates the three-dimensional structure (fold), which in turn influences the function [13]. The close relationship between the structural architecture and function of a protein implies that disruptions to the native folded state can destabilize the protein structure, and eventually cause loss of function. In line with this perspective, several studies integrating sequence evolution with protein biophysics reported strong correlations between positional conservation levels or rates of synonymous/non-synonymous mutation and stability-related residue metrics such as solvent accessibility or hydrogen bonding patterns [411]. On the other hand, stability is not the main determinant of protein function. Residues that are not involved in forming the protein scaffold, such as the catalytic sites of an enzyme or binding hot spots located on protein surfaces, are important for protein function as well.

Protein evolution is thus driven by an interplay between functional and structural constraints [2,10,12,13]. With these constraints at play, mutations may occur via two alternative mechanisms: positive and negative selection [14]. In the negative (or purifying) selection, mutations leading to detrimental effects on protein stability or function are eliminated during the selection process, leading to increased levels of conservation at sites crucial for function or stability. In the positive selection model, the protein is under constant pressure to acquire “new-functions”, thus change/variation in the amino acid sequence is favored, especially at sites with direct relevance to protein function [15].

From this perspective of molecular evolution and protein function, the human class I Major Histocompatibility Complex (MHC I, also named the Human Leukocyte Antigen, HLA) is an interesting system [1619]. MHC I consists of heavy (H) and light (L) chains, with the L chain being an invariant protein partner called β-2 microglobulin (β2m), respectively (Fig 1). Following assembly within the Endoplasmic Reticulum (ER), the HLA is unstable in the absence of a peptide ligand [20,21]. Intracellularly derived antigenic peptides originating from self or foreign proteins are loaded into the binding groove of HLA with the help of a multiprotein complex called the Peptide Loading Complex (PLC). Here, the peptide contacts six different binding pockets (A,B,C,D,E and F) within the peptide binding groove of HLA [22]. Upon the formation of a stable peptide-loaded MHC (pMHC), the complex is transported to the cell surface, and presents peptide ligands to immune cell receptors such as the T-Cell Receptor (TCR) of cytotoxic CD8+ T-cells, and Killer-cell Immunoglobulin-like Receptors (KIRs) of Natural Killer (NK) cells [2326].

Fig 1. The three-dimensional structure of the pMHC.

Fig 1

The structure of HLA-B*53:01 complexed with peptide TPYDINQML (PDB code: 1A1M) is shown for demonstration purposes

With more than 10000 identified alleles, HLA I is the most polymorphic protein in humans [27]. The HLA I protein is clearly undergoing a process of positive selection, where the acquisition of new variants modulate the function of antigen presentation via modifying the peptide ligand recognition specificities [16,2831]. In other words, functional constraints require HLA to maintain a very high level of allelic diversity. This allelic diversity helps the immune system cover a large space of potential peptide binders, and thereby fight against pathogens effectively [16,19,24,32]. On the other hand, HLA genes are also strongly linked to infectious and autoimmune diseases due to high polymorphism levels [33].

Despite this high sequence variation in HLA and the enormous diversity of peptide ligands, experimentally elucidated pMHC structures of different alleles display an “ultra-conserved” fold with minimal variation [34]. Moreover, most of the TCR-HLAcrystal structures display a conventional docking mode of TCR over the HLA [35,36]. On the other hand, HLA allelic diversity may affect interactions with other proteins as well. Only certain allele groups containing specific epitopes interact with KIR [25,37]. Structural details of the PLC-pMHC interaction were also only recently revealed [34,3841].

HLA polymorphism should be taken into account in order to truly comprehend the mechanisms involved in the formation of stable human pMHC and interactions with TCR and KIR molecules. However, incorporating such extreme levels of polymorphism into wet-lab experiments is unfeasible. Hence, computational biophysics methods are preferred in large-scale modeling studies. In particular, homology modeling was used to classify a high number of HLA alleles into distinct groups based on peptide interaction patterns [42], binding pocket similarities [43] and surface electrostatics [44]. These studies mainly focused on classifying HLA alleles based on their peptide-binding or protein interaction behaviors, and therefore only the peptide-binding groove was modeled. However, characterization of the relationship between HLA polymorphism and complex stability as well as TCR/KIR/PLC interactions requires the modeling of the whole complex and the use of proper methods for identifying residue-level effects on stability and protein interactions. To this end, quantification of local frustration in the structure can be utilized [4549].

The concept of frustration within protein structures is useful to establish links between their biological behavior and structures. The energy landscape theory indicates that a given chain of amino acids fold into a native structure through an ensemble of structures, and the resulting native state is a minimally frustrated heteropolymer [50]. In other words, interacting energies between the building blocks of the protein structure are minimized within the native state, leading to a well-defined three-dimensional shape. Yet, even in the native state, frustration may exist locally at multiple sites within the structure. Computationally, the local frustration analysis is based on calculation of pairwise contacts between amino acids using an appropriate force-field, and comparison of these contacts to possible alternative contacts made by other amino acid pairs at each site in the structure. Briefly, the analysis quantifies the degree of energetic optimization of the contacts made by each residue, and thus how much local frustration is present at a given site. Given the extreme level of polymorphism in HLA, such information may be particularly useful for establishing a link between functional roles of each residue (e.g. whether they are important for folding or binding), and the respective level of polymorphism observed within a sequence-structure-function context. Local frustration analysis has already been applied successfully to perform such sequence-structure-function studies on calmodulin [51,52], repeat proteins [5,53], and TEM beta-lactamases [6].

Here, we provide an analysis of local frustration patterns of 1436 HLA I alleles. We explain how class I MHC retains a conserved fold while maintaining highly versatile ligand specificities using homology-modeled human pMHC structures. Using the frustration data, we also show the existence of local frustration-based energetic footprints of polymorphism, providing a biophysical basis for the previously observed differences between molecular stabilities/cell surface expression levels of different allele groups and TCR/KIR/PLC interactions. Finally, we also provide a local frustration based explanation of how peptides stabilize peptide binding pockets.

Results and discussion

Sequence variation in HLA binding groove and α-3 domain

We began with an analysis of sequence variation in the HLA I peptide binding groove. A total of 8696 HLA I binding groove (α-1 and α-2 domains) sequences (including 2799, 3433 and 2464 alleles from HLA-A, -B, and -C gene loci, respectively) were included in the analysis. We identified amino acid variation at each respective position in the HLA peptide binding groove by constructing sequence logos (Fig 2A). In line with findings of a recent analysis [54], we detected high levels of variation at most binding groove positions. However, some positions were relatively conserved, and dominated by a single amino acid. This dominance is due to the imbalance between the number of occurrences of the respective amino acid and those of the others. Glycine, phenylalanine, proline, tryptophan, leucine, aspartic acid and arginine were found to be the most conserved amino acid types.

Fig 2. Sequence conservation/variation and evolutionary importance of HLA I peptide-binding groove positions.

Fig 2

(A) Sequence logo of the HLA I peptide-binding groove (residues 1–180). Polar, neutral, basic, acidic and hydrophobic amino acids are colored green, purple, blue, red, and black, respectively. (B) real-value Evolutionary Trace (rvET) scores of binding groove positions. Low rvET scores indicate high evolutionary importance, and vice versa.

The sequence logos also display several “hyper-variable” positions as well (positions 9, 24, 45, 67, 97, 116, 138, 152, 156, and 163). This variation is expected, since all of these positions are located within the peptide binding pockets, and were previously shown to define allele-specific peptide ligand repertoires [42,43,55].

We next quantified the evolutionary importance of each position in the binding groove by computing the real-value Evolutionary Trace (rvET) ranks per position [56]. rvET is an absolute rank of a given position in terms of its evolutionary importance. Here, lower rvET ranks/scores indicate higher evolutionary importance and vice versa. The ET analysis here is based on both sequence variation/conservation and the closeness of sequence divergence to the root of the constructed phylogenetic tree at a given sequence position. Thus, while conserved positions in a multiple sequence alignment tend to have low rvET scores, relatively less conserved positions may also obtain low rvET scores as well, provided that the sequence divergence occurs near the root of the tree and variation occurs within small rather than large branches of the tree [5659]. As such, this method is particularly suitable for analysis of variation in HLA sequences from different gene loci which may feature. In general, we observed higher rvET values at positions with the highest sequence variation and vice versa (Fig 2B).

We repeated the above sequence variation and evolutionary trace analysis for the α-3 domain of HLA as well, and generated sequence logos and computed rvET scores. Here, the number of allele sequences used as input was lower (1436) since the α-3 domain sequences for many alleles are unavailable (see section below as well). As expected, the level of polymorphism of the α-3 domain is lower than that of the binding groove, as demonstrated by the dominance of hydrophobic amino acids in many positions and lower rvET scores (Fig 3).

Fig 3. Sequence conservation/variation and evolutionary importance of HLA I α-3 domain positions.

Fig 3

(A) Sequence logo of the HLA I α-3 domain (residues 181–261). Polar, neutral, basic, acidic and hydrophobic amino acids are colored green, purple, blue, red, and black, respectively. (B) real-value Evolutionary Trace (rvET) scores of α-3 domain positions. Low rvET scores indicate high evolutionary importance, and vice versa.

Homology modeling of HLA alleles in the context of pMHC

Next, we investigated how HLA sequence variation is reflected on the complex structure. As reported previously [34,43,44,60], the number of experimentally determined HLA structures with different alleles is significantly lower than the total allele number. Hence, a homology modeling approach is necessary. Moreover, homology models of the full complex, including the binding groove as well as distant β2m and α-3 domains, are needed, since these domains make extensive contacts with each other as well as the rest of the structure, and were previously shown to be essential for peptide binding [61,62]. However, the sequences of the α-3 domain of a majority of HLA alleles have not been identified yet. It is also necessary to model peptides within the binding groove as peptide ligand is an integral part of the HLA structure. Similar to limitations in HLA structures, the number of identified peptide ligands with binding affinity measurements for individual HLA alleles is limited as well, with some alleles having no peptide ligands identified so far [63]. Therefore, it is also necessary to predict peptide ligands using computational approaches. We thus selected 1436 HLA alleles (464, 689 and 283 from HLA-A, -B and -C loci, respectively) with complete HLA sequence (including all three domains α-1, α-2, and α-3) for homology modeling (the complete list is given in S1 Table). The list of homology modelled alleles included 41 out of 42 core alleles representing the functionally significant sequence variation [54]. We predicted up to 10 strong binder peptides for each of the 1436 alleles using netMHCpan 3.0 [64], and obtained homology models in the context of these peptides. Note that it is also possible to perform more complicated HLA-specific peptide docking or modeling [65,66] to generate more reliable binding modes for peptide ligands. Since our main focus was a characterization of the effect of polymorphism, we nevertheless used homology modeling to model the peptide ligands within the complex for practical reasons.

Integrating biophysics into HLA I evolution

After generating the homology models, we analyzed the local frustration within pMHC structures. A Single Residue Frustration Index (SRFI) was obtained as a position-specific local frustration score: amino acids with optimized energetic contacts are minimally frustrated, where those that are the least preferred at their respective positions are highly frustrated. If neither, then the frustration is termed “neutral”. SRFI values thus indicate how ideal (minimally frustrated) the contacts of each position are or how much frustration is present (highly frustrated).

We used the frustratometer2 tool [67] to quantify SRFI at each position in homology models. Since multiple peptides (and hence structures) were modeled for each allele, we calculated median SRFI per allele per position, and used these median SRFI values in further analyses.

In order to get an overview of local frustration against evolutionary importance of each position, we first mapped position-specific median SRFI and rvET values onto the HLA I binding groove (Fig 4). SRFI distributions of several selected positions are also given in Fig 5A. As expected, the top two minimally frustrated positions were cysteines responsible for forming the conserved disulfide bridge between positions 101 and 164. Mutations at these positions abolish HLA expression [27,68]. Minimal frustration was also observed at conserved positions located in the β-sheet floor of the binding groove or in α-1 and α-2 domains contacting the β-sheet floor (5, 7, 8, 25, 27, 34, 36, 101, and 164) (Fig 4A). Our observation of minimal frustration at conserved positions are in line with recent findings of Dib et al. [69], where co-evolving residues were shown to avoid residues important for protein stability. Haliloglu et al. previously analyzed several HLA structures using the Gaussian Network Model (GNM), and identified positions 6, 27, 101, 103, 113, 124 and 164 as possibly important for stability [70]. Here, we observed minimal frustration at either these or their sequence neighbors.

Fig 4. Position-specific Single Residue Frustration Index (SRFI) and rvET scores mapped onto the HLA I binding groove.

Fig 4

(A, B, C) SRFI mapped onto binding groove positions. (D, E, F) log(rvET) mapped onto binding groove positions. Selected positions are also shown on the structure.

Fig 5. Box-plots of position-specific SRFI values for the HLA I peptide-binding groove.

Fig 5

(A) SRFI box-plots of selected minimally and highly frustrated residues within the HLA peptide binding groove (B) SRFI box-plots of positions with neutral frustration. (C) SRFI box-plots of peptide-binding pocket positions. Coloring according to log(rvET) values.

Next, we identified minimally- and highly-frustrated positions within the α-3 domain by mapping position-specific median SRFI as well as rvET values onto the α-3 domain structure and plotting SRFI distributions (Fig 6 and Fig 7). Median SRFI and rvET values can be found in S2 Table as well. Here, we observed the same relationship between local frustration and sequence conservation: several positions on α-3 beta-sheets were found to be minimally-frustrated and conserved, including positions 204, 203, 247, 259, 261, 215, and 257.

Fig 6. Position-specific Single Residue Frustration Index (SRFI) and rvET scores mapped onto the HLA I α-3 domain.

Fig 6

(A) SRFI mapped onto α-3 domain positions. (B) log(rvET) mapped onto α-3 domain positions. Selected positions are also shown on the structure.

Fig 7. Box-plots of position-specific SRFI values for the HLA I α-3 domain.

Fig 7

SRFI box-plots of minimally- and highly-frustrated residues within the HLA α-3 domain. Coloring according to log(rvET) values.

Our observation of minimal frustration at conserved positions within the peptide binding groove and α-3 domain indicates that the interactions made by the residues in these positions of homology models are energetically favorable (thus minimal frustration), and therefore responsible for the overall structural stability. In other words, the importance of these residues for structural stability may explain their high conservation. Fig 8 shows how the residues at these positions are structurally connected to each other within the HLA-B*53:01 structure (pdb code 1A1M, used for demonstration purposes) (Fig 8). Here, most of the residues can be observed to form physical clusters within individual domains. Minimal frustration at binding groove residues contacting the β2m (8, 25, 27, and 98) as well as β2m residues 56, 60 and 62 clearly points to conserved interactions at the HLA-β2m interface that are likely important for structural stability. On the other hand, no interacting pair of minimally frustrated residues were detected at the β2m and α-3 domain interaction interface. Here, the lowest frustration level was that between residues 232 and β2m-26.

Fig 8. Minimally-frustrated and conserved positions within the HLA structure.

Fig 8

Residues are drawn in van der Waals spheres representation. Coloring according to SRFI value. The structure of HLA-B*53:01 (1A1M) was used for demonstration purposes. Selected β2m residues are also shown in spheres to highlight interaction with HLA.

We also observed several positions on α-1 and α-2 helices and on the α-3 domain loops with relatively high frustration levels (58, 61, 68, 80, 84, 141, 144, 145, 146, 155, 222, 223, and 227) (Fig 4, Fig 7). High frustration at a position may indicate that the interactions made by the respective amino acid is energetically destabilizing compared to all other amino acids. These highly frustrated residues may thus be involved in protein or ligand binding. Indeed, most of these positions are located within interfaces of interaction with either TCR or tapasin of the PLC.

The interaction between the TCR and pMHC I is a central event in adaptive immunity [26,71,72]. The TCR recognizes a peptide antigen only when presented by an MHC molecule (MHC restriction) [72,73]. TCR-pMHC structures determined to date indicate a conserved diagonal binding geometry, where the hypervariable CDR3 loop of TCR contacts the peptide, and germline-encoded variable α and β domains (Vα and Vβ) contact HLA I α-1 and α-2 helices, respectively [74]. Although deviations from this conventional docking mode exist, including a reversed polarity docking mode [75,76], TCR signaling in such unconventional modes is limited [77]. The importance of the conventional docking mode for TCR signaling adds support to the germline-encoded model of TCR-pMHC interaction, in which evolutionarily conserved TCR-pMHC contacts were proposed to govern MHC restriction [78]. In this regard, previous analyses of TCR and human pMHC structures highlighted the importance of contacts made by HLA α-1 and α-2 helix residues 69 and 158 [79] or 65, 69 and 155 as a “restriction triad” [80]. We observed high SRFI levels at or near these positions.

Local frustration data can also provide a basis for PLC-MHC interactions. Tapasin is the main component of the PLC which contacts the HLA binding groove. Predicted tapasin-MHC interactions [39,81] and a recently elucidated crystal structure of pMHC with TAPBPR (a tapasin homolog) [8284] indicated that tapasin cradles the MHC molecule via contacts with α-3 domain and α2–1 helix segment. Our observation of high SRFI levels at positions 141, 144, 145, and 146 located on the α2–1 helix segment as well as at 222, 223, and 227 on the α-3 domain is in line with these findings.

Median SRFI values of most binding groove positions were found to indicate neutral frustration (Fig 5B). On the other hand, HLA polymorphism at peptide binding pocket positions apparently caused significant drifts towards minimal or high frustration as well, even though the median SRFI remained within the neutral frustration range (-1 to 1).

Overall, these results suggest that the human MHC evolves to maintain its structural fold, as evidenced by the dominance of conserved core positions showing minimal frustration within the HLA peptide binding groove. Moreover, the presence of relatively higher frustration at or near TCR and tapasin contact positions on α-1 and α-2 helices may also provide a biophysical basis for protein interactions of pMHC.

Clustering of HLA alleles into distinct groups based on frustration data

Next, we performed a hierarchical clustering analysis of allele-specific SRFI profiles. For simplification, we excluded positions with less than 0.5 SRFI variation from the analysis. Clustering results are shown in Fig 9A, and complete lists of alleles in each cluster are given in S1 Table. Strikingly, alleles from different gene loci were clustered into three separate groups based on frustration data of only 14 positions (45, 66, 67, 69, 74, 76, 79, 80, 82, 116, 131, 152, 156, 163). An exception was the HLA-B*46 group, which was clustered along with the HLA-C alleles (S1 Table. This is not surprising, as alleles in this group share the KYRV motif at 66, 67, 69, and 76 with HLA-C [85].

Fig 9. Clustering of allele-specific SRFI profiles based on 14 binding groove positions.

Fig 9

(A) SRFI cluster heatmap. Clustering was performed for all 1436 alleles included, yet not all of these alleles are indicated on axis label for clarity. (B) SRFI of three identified clusters corresponding to three HLA gene loci mapped onto the binding groove. Coloring as in Fig 5. (C) Sequence logos of clustering positions to highlight amino acid differences between allele in different clusters. The logos were generated using Two Sample Logo server [86].

Clustering of HLA alleles from distinct loci into separate groups has been previously demonstrated based on peptide binding pocket similarities [43], peptide-HLA contacts [42], surface electrostatics [44] and peptide binding repertoires [87]. Here, HLA-A, -B and–C alleles were clustered into distinct groups from a different perspective using local frustration data. Moreover, the structural energetics aspect provides an additional level of detail.

Unlike data used in previous studies, the SRFI may explain previously reported differences between HLA alleles in terms of complex stability and hence, the cell surface expression levels. Compared to HLA-A and HLA-B, lower cell surface expression levels were previously observed for HLA-C alleles [8891]. Moreover, peptide repertoires of HLA-C alleles are known to be more limited [88,92]. KYRV motif was also shown to be responsible for intrinsic instability of HLA-C [90]. Relatively higher frustration in HLA-C group, especially at binding pocket positions of 66, 74 and 80, is in line with these findings: higher frustration may introduce a destabilizing effect into the binding pockets, lead to restrictions in peptide binding, and thereby reduce cell surface expression levels. Among these positions, 66 is a member of the B pocket and increased frustration in this position may indicate a less stable B pocket in HLA-C. Likewise, position 80 is a member of F pocket and a similar effect may be valid for the F pocket as well. Around position 80, positions 79 and 82 also display high frustration as well, further contributing to F pocket frustration.

HLA-C additionally differs from HLA-B and HLA-A in terms of interactions with KIRs of NK cells [93,94]. The alleles in this group contain either the C1 or C2 epitopes with an arginine or lysine at position 80, respectively [95]. Moreover, HLA-C (and HLA-B46, which is clustered alongside HLA-C) is distinguished by a KYRV motif at positions 66, 67, 69 and 76 [92]. Previous crystal structures of HLA-C and KIR clearly show that the KIR contacts residues on α-1 helix of HLA near peptide C terminus [96,97]. High levels of frustration in this contact area of HLA-C (Fig 9B) may explain why HLA-C better interacts with the KIR than HLA-A and HLA-B alleles.

All in all, these results support the view that HLA-C is intrinsically less stable on the cell-surface via a post-translational mechanism [88,90]. This mechanism may simply involve a less than optimal packing in the binding groove of HLA-C. Our results may also explain the limited diversity of HLA-C peptide ligands: higher frustration in ligand binding sites may indicate a lower peptide binding capability.

Effect of peptide binding on local frustration profiles

Peptide-free (empty) MHC does not have a well-defined 3D structure (no crystal structure of a peptide-free MHC could be obtained so far). Instead, empty MHC continuously switches between alternative conformations until a sufficiently stable peptide-HLA interaction is achieved [21]. Peptide binding to HLA occurs via six pockets in the binding groove named A, B, C, D, E and F [22]. For many alleles, A, B and F pockets are decisive for peptide binding (A/B and F pockets binding N- and C-terminus of peptide ligands, respectively). Structural integrity of the F pocket of some alleles was previously shown to be more sensitive to peptide truncation in MD simulations than those of A and B pockets [41,98100]. This implies that contacts between peptide C-terminus and HLA F pocket are highly important for molecular stability. We reasoned that, by comparing peptide-free and -loaded MHC frustration profiles, the positions that depend on peptide contacts least/most for stability can be identified. Thus, we additionally quantified local frustration in peptide-free homology models, and calculated changes in SRFI upon binding of the peptide ligands to each allele. For each residue, we computed the difference between the median SRFI from peptide-loaded structures and the SRFI value in a single peptide-free structure of each allele. A single SRFI difference value was then obtained for each allele, and the distributions of these differences are shown in Fig 10. In line with the findings of previous studies, SRFI increase (hence reduction in frustration) upon peptide binding was highest near the F-pocket (positions 81, 84, 95, 142, and 147). Nevertheless, we should emphasize that our approach here involved a straightforward comparison with simple removal of peptides from the binding groove to generate peptide-free HLA. The number of peptides for each allele was also very limited. Further studies, possibly involving conformational differences between peptide-loaded and -free forms, are necessary for a more proper investigation of frustration changes upon peptide-binding in HLA.

Fig 10. Box-plots of SRFI change upon peptide binding.

Fig 10

Positions are ordered from left to right according to decreasing change in median SRFI values. Note that the plot only includes top 10 positions showing the highest increase in SRFI levels upon peptide binding.

Local frustration in the context of TAPBPR-HLA interaction

We also investigated the existence of a possible relationship between local frustration data and dependence of different HLA alleles on cofactors for structural assembly of the complex. For this purpose, we used the results of a recent study by Ilca et al. [101] as reference, where the authors reported a stronger preference of the TAPBPR for HLA-A molecules (especially those that belong to A2 and A24 HLA supertypes [87]) than for the HLA-B and HLA-C molecules. Here, local frustration may be relevant with respect two aspects of TAPBPR-HLA interaction. First, the local frustration profiles of peptide-binding pocket positions may describe relatively higher or lower stability of peptide binding grooves, and hence influence TAPBPR dependency. Second, the interface of HLA with the TAPBPR molecule may exhibit differential frustration between TAPBPR binders and non-binder alleles. Similar to the clustering analysis described above, we performed SRFI-based clustering of either peptide-binding pocket or TAPBPR interface residues of 30 top TAPBPR binder HLA allotypes reported by Ilca et al. The cluster maps are shown in Fig 11.

Fig 11. Clustering of allele-specific SRFI profiles based on.

Fig 11

(A) peptide-binding pocket and (B) TAPBPR interface residues of 30 TAPBPR binders and non-binders as reported by Ilca et al. [101]. Position 116 was also included in clustering shown in B. TAPBPR interface residues were based on a previously reported structure of a TAPBPR-MHC I complex. Top TAPBPR binders are highlighted with dashed-line rectangles.

We found that the top 8 TAPBPR binder HLA alleles, A*68:02, A*02:03, A*02:01, A*02:06, A*23:01, A*24:02 and A*24:03 were grouped together. They were further clustered into separate groups including (1) A*68:02, A*02:03, A*02:01, A*02:06 and (2) A*23:01, A*24:02, A*24:03. Ilca et al. highlighted the presence of H114/Y116 pair within the F pocket as a pre-requisite for strong TAPBPR interaction. Fig 11A indeed shows that the frustration level of 116 in TAPBPR binders is indeed distinct from those in non TAPBPR binders. On the other hand, non TAPBPR binders also featured different SRFI levels in this position, ranging from minimal frustration in several HLA-C and HLA-B alleles to high frustration in many other HLA-A alleles which featured D116. As such, there was no correlation between the SRFI level of position 116 and TAPBPR binding preference. There were several other positions within the binding pockets that exhibited distinct frustration levels in TAPBPR binders, including 97, 80, and 84. Y84 within the TAPBPR-MHC I complex was found to interact with the E102 of TAPBPR, instead of K146 of α-2-1 helix region and C-terminus of the bound peptide as in the unbound state [102]. Increased frustration at this position in TAPBPR binders, except for A*23:01, A*24:02, and A*24:03, may contribute to their higher binding tendency (Fig 11A and 11B). Interestingly, the increase in frustration was not due to an amino acid difference, as both TAPBPR binders and non-binders included Y in this position. This may be due to an allosteric effect from the bottom of the F pocket: the presence of Y at 116 may influence peptide-HLA contact patterns, which may cause a slight change in local frustration of Y84. Nevertheless, the effect on Y84 may be insufficient to not explain differences in TAPBPR association, as three TAPBPR binders (A*23:01, A*24:02, and A*24:03) exhibited lower frustration than other binders. On the other hand, all TAPBPR binders showed lower frustration for residue 128 within the binding interface (Fig 11B). Such perturbations may influence the bond-making potentials of interface residues, and may provide possible directions to investigate chaperone dependence using computational biophysics methods in future studies.

Material and methods

Sequence variation analysis

Aligned amino acid sequences of the α-1 and α-1 domains (i.e. the binding groove) of the HLA were retrieved from the IMGT/HLA database [27]. The sequences included in this file were converted to FASTA format for further analysis using Biopython [103]. Seq2Logo 2.0 web server was used to generate sequence logos to indicate conservation/variation at specific sites [104] using Shannon’s Information Content (IC) [105] as follows:

I=apalog2pa/qa

Here, I denotes information content, pa is the probability of observing amino acid a at the respective sequence position (calculated from input multiple sequence alignment) and qa is the pre-defined (background) probability of observing amino acid at the given position. An equal probability was used for each amino acid type (1/20).

The Evolutionary Trace (ET) method is an improvement over the classical IC approach described above [56,57]. In this method, the IE is calculated after constructing a phylogenetic tree of sequences and for each branch (node) of the tree. Then, the conservation/variation level at a given position is calculated using the following equation:

ρi=1+n=1N11ng=1n{apalog2pa/qa}

Here, N is the number of sequences in the alignment and N−1 is the number of possible nodes in the phylogenetic tree. The final score ρi is also termed “real-value Evolutionary Trace” score (rvET), and denotes the rank of a position (i) with respect to all other positions. Hence, lower rvET values indicate a higher evolutionary importance and vice-versa.

Prediction of peptide binders to class I HLA alleles and homology modeling of pMHC structures

Due to the limited number of HLA alleles with structure information at the Protein Data Bank (PDB), homology modeling was used to generate pMHC structures for 1436 alleles selected as follows. For selecting peptide ligands for homology modeling, a data-driven peptide-binding prediction tool (netMHCpan 3.0) was used [64] (stand-alone version of netMHCpan 3.0 was downloaded from http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?netMHCpan). First, 25000 random nonamer peptide sequences were generated using equal probability for each of the twenty naturally occurring amino acids at each peptide position. Then, binding affinities of all generated peptide sequences were predicted for each HLA allele recognized by netMHCpan 3.0 and with identified α-3 domain sequence. The results were then filtered to extract up to 10 peptides with the highest binding affinities (i.e. lowest binding free energies) and classified as strong binders by netMHCpan 3.0 for each allele. These peptides were then homology modelled in the context of HLA alleles as well as β2m using Modeller version 9.19 [106]. An X-ray structure of the HLA-B*53:01 allele was used as template (PDB ID: 1A1M).

Local frustration analysis

Local frustration analysis was performed on all produced homology models using frustratometer2 tool [67] (Stand-alone version available from https://github.com/gonzaparra/frustratometer2 was used). This tool uses the AWSEM (Associative Memory, Water Mediated, Structure and Energy Model) coarse-grained potential [107] to calculate residue-residue interaction energies. A sequence separation of 3 was used to calculate local amino acid densities, as defined by AWSEM. In addition to the interactions predicted by the AWSEM, long-range electrostatic interactions were also included—as offered by frustratometer2—using a Debye-Hückel potential:

VDH=Kelecti<jqiqjϵrerijlD

where qi and qj are charges of residues i and j, rij is the distance between residues i and j, ϵr is the dielectric constant of the medium (in vacuum this value is 1) and lD is the Debye-Hückel screening length, which is calculated using physiological values of temperature (25°C), dielectric constant of water (80) and ionic strength of water (0.1 M), yielding a lD of 10 Å. Here, it is possible to define an electronic strength of the protein system using a parameter called electrostatic constant (k):

k=Kelectϵr

Here, a k value of 4.15, which corresponds to an aqueous solution, was used.

Once the energies are computed using the AWSEM potential, frustratometer2 assesses the local frustration present in each residue-residue contact. Here, the contacts between two residues are compared to contacts in decoy structures or molten globule configurations. The tool can produce decoys by simultaneously mutating both residues to all other amino acids. A normalization is applied in order to compare the energies of the native structure to decoys. A “mutational frustration index” (Fijm) then captures how favorable the contacts of the residues present in the native structure to decoys as follows:

Fijm=Ei,jT,NEi,jT,U1/Nk=1n(Ei,jT,Uei,jT,U)2

where Ei,jT,N is the total energy of native protein and Ei,jT,U is the average energy of decoy structures. By changing the amino acid identity at both positions simultaneously, not only the contact between the respective two residues are changed, but those contacts made by the two residues with other residues are changed as well.

This index can also be calculated by changing the amino acid type of only one of the residues instead of applying simultaneous mutations at both positions. The index then represents how favorable the contacts made by a given amino acid are at a given position in the structure. When applied this way, the index is termed “Single Residue Frustration Index” (SRFI).

Frustration-based clustering of HLA class I alleles

Upon application of local frustration analysis on pMHC structures, SRFI profiles averaged over peptides per HLA allele are obtained. This is represented in the form of an “SRFI matrix” with rows corresponding to positions (residues) in pMHC structure and columns corresponding to HLA alleles. A column-wise agglomerative clustering was applied on this matrix to identify similarities and differences between different HLA alleles in terms of their local structural energetics. Distances between clusters were defined by the “single linkage” criteria, in which inter-cluster distances were taken as the shortest distance between any two points of a pair of clusters:

L(r,s)=min(D(xri,xsj))

where L(r,s) is the inter-cluster distance between clusters r and s and D(xri,xsj) is the distance between points xri and xsj of the two clusters. Here, each data point represents the SRFI profile of each HLA allele. The distance between data points were computed using Manhattan distance:

D(xri,xsj)=kn|xri,kxsj,k|

Here, the summation is performed over rows of data (k → n), which are actually positions in the pMHC structure.

Supporting information

S1 Table. HLA I alleles for which homology models were generated.

Clusters are also indicated with numbers (1, 2, or 3). HLA core alleles (taken from Robinson et al. [54]) are shown in bold.

(DOCX)

S2 Table. Median SRFI and rvET values of selected positions in HLA I peptide binding groove and α-3 domain as shown in Fig 5 and Fig 7.

The rows are ordered according to increasing rvET.

(DOCX)

Acknowledgments

The numerical calculations reported in this paper were partially performed at TUBITAK ULAKBIM, High Performance and Grid Computing Center (TRUBA resources). OS would like to acknowledge access to computing resources at RTEU Samsung Center of Excellence (SMM) for data analysis.

Data Availability

All allele sequences and local frustration analysis results will be available from Dryad Digital Repository after acceptance with DOI: https://doi.org/10.5061/dryad.gmsbcc2hx.

Funding Statement

Support from Marmara University BAP FEN-A-101018-0526 to PO is acknowledged. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Liberles DA, Teichmann SA, Bahar I, Bastolla U, Bloom J, Bornberg-Bauer E, et al. The interface of protein structure, protein biophysics, and molecular evolution Protein Science. Wiley-Blackwell; 2012. pp. 769–785. 10.1002/pro.2071 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Bastolla U, Dehouck Y, Echave J. What evolution tells us about protein physics, and protein physics tells us about evolution Current Opinion in Structural Biology. Elsevier Current Trends; 2017. pp. 59–66. 10.1016/j.sbi.2016.10.020 [DOI] [PubMed] [Google Scholar]
  • 3.Bahar I, Jernigan RL, Dill KA. Protein actions: principles and modeling. 2017. Available: https://www.crcpress.com/Protein-Actions-Principles-and-Modeling/Bahar-Jernigan-Dill/p/book/9780815341772 [Google Scholar]
  • 4.Kimura M, Ohta T. On some principles governing molecular evolution. Proc Natl Acad Sci U S A. 1974;71: 2848–52. Available: http://www.ncbi.nlm.nih.gov/pubmed/4527913 10.1073/pnas.71.7.2848 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Parra RG, Espada R, Verstraete N, Ferreiro DU. Structural and Energetic Characterization of the Ankyrin Repeat Protein Family. Orengo CA, editor. PLoS Comput Biol. 2015;11: e1004659 10.1371/journal.pcbi.1004659 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Abriata LA, Merijn ML, Tomatis PE. Sequence-function-stability relationships in proteins from datasets of functionally annotated variants: The case of TEM β-lactamases. FEBS Lett. 2012;586: 3330–3335. 10.1016/j.febslet.2012.07.010 [DOI] [PubMed] [Google Scholar]
  • 7.Dean AM, Neuhauser C, Grenier E, Golding GB. The Pattern of Amino Acid Replacements in α/β-Barrels. Mol Biol Evol. 2002;19: 1846–1864. 10.1093/oxfordjournals.molbev.a004009 [DOI] [PubMed] [Google Scholar]
  • 8.Shahmoradi A, Sydykova DK, Spielman SJ, Jackson EL, Dawson ET, Meyer AG, et al. Predicting Evolutionary Site Variability from Structure in Viral Proteins: Buriedness, Packing, Flexibility, and Design. J Mol Evol. 2014;79: 130–142. 10.1007/s00239-014-9644-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Shahmoradi A, Wilke CO. Dissecting the roles of local packing density and longer-range effects in protein sequence evolution. Proteins Struct Funct Bioinforma. 2016;84: 841–854. 10.1002/prot.25034 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Echave J, Wilke CO. Biophysical Models of Protein Evolution: Understanding the Patterns of Evolutionary Sequence Divergence. Annu Rev Biophys. 2017;46: 85–103. 10.1146/annurev-biophys-070816-033819 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Echave J, Jackson EL, Wilke CO. Relationship between protein thermodynamic constraints and variation of evolutionary rates among sites. Phys Biol. 2015;12: 025002 10.1088/1478-3975/12/2/025002 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Wilke CO. Bringing Molecules Back into Molecular Evolution. Kosakovsky Pond SL, editor. PLoS Comput Biol. 2012;8: e1002572 10.1371/journal.pcbi.1002572 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Sikosek T, Chan HS. Biophysics of protein evolution and evolutionary protein biophysics. J R Soc Interface. 2014;11: 20140419–20140419. 10.1098/rsif.2014.0419 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Fay JC, Wyckoff GJ, Wu C-I. Positive and Negative Selection on the Human Genome. Genetics. 2001;158. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Echave J, Spielman SJ, Wilke CO. Causes of evolutionary rate variation among protein sites Nature Reviews Genetics Nature Publishing Group; February 19, 2016. pp. 109–121. 10.1038/nrg.2015.18 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Meyer D, Vitor VR, Bitarello BD, Débora DY, Nunes K. A genomic perspective on HLA evolution Immunogenetics. Springer; 2018. pp. 5–27. 10.1007/s00251-017-1017-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Crux NB, Elahi S. Human Leukocyte Antigen (HLA) and Immune Regulation: How Do Classical and Non-Classical HLA Alleles Modulate Immune Response to Human Immunodeficiency Virus and Hepatitis C Virus Infections? Front Immunol. 2017;8: 832 10.3389/fimmu.2017.00832 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Vandiedonck C, Knight JC. The human Major Histocompatibility Complex as a paradigm in genomics research. Brief Funct Genomic Proteomic. 2009;8: 379–94. 10.1093/bfgp/elp010 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Martínez-Borra J, López-Larrea C. The emergence of the Major Histocompatilibility Complex. Adv Exp Med Biol. 2012;738: 277–89. 10.1007/978-1-4614-1680-7_16 [DOI] [PubMed] [Google Scholar]
  • 20.Kurimoto E, Kuroki K, Yamaguchi Y, Yagi-Utsumi M, Igaki T, Iguchi T, et al. Structural and functional mosaic nature of MHC class I molecules in their peptide-free form. Mol Immunol. 2013;55: 393–399. 10.1016/j.molimm.2013.03.014 [DOI] [PubMed] [Google Scholar]
  • 21.Theodossis A. On the trail of empty MHC class-I. Mol Immunol. 2013;55: 131–4. 10.1016/j.molimm.2012.10.012 [DOI] [PubMed] [Google Scholar]
  • 22.Saper MA, Bjorkman PJ, Wiley DC. Refined structure of the human histocompatibility antigen HLA-A2 at 2.6 Å resolution. J Mol Biol. 1991;219: 277–319. 10.1016/0022-2836(91)90567-p [DOI] [PubMed] [Google Scholar]
  • 23.Liu J, Gao GF. Major Histocompatibility Complex: Interaction with Peptides. eLS. Chichester, UK: John Wiley & Sons, Ltd; 2011. 10.1002/9780470015902.a0000922.pub2 [DOI] [Google Scholar]
  • 24.Charles A Janeway J, Travers P, Walport M, Shlomchik MJ. The major histocompatibility complex and its functions. Garland Science; 2001. Available: http://www.ncbi.nlm.nih.gov/books/NBK27156/ [Google Scholar]
  • 25.Rangarajan S, Mariuzza RA. Natural Killer Cell Receptors. Struct Biol Immunol. 2018; 101–125. 10.1016/B978-0-12-803369-2.00004-8 [DOI] [Google Scholar]
  • 26.Rossjohn J, Gras S, Miles JJ, Turner SJ, Godfrey DI, McCluskey J. T Cell Antigen Receptor Recognition of Antigen-Presenting Molecules. Annu Rev Immunol. 2015;33: 169–200. 10.1146/annurev-immunol-032414-112334 [DOI] [PubMed] [Google Scholar]
  • 27.Robinson J, Halliwell JA, Hayhurst JD, Flicek P, Parham P, Marsh SGE. The IPD and IMGT/HLA database: Allele variant databases. Nucleic Acids Res. 2015;43: D423–D431. 10.1093/nar/gku1161 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Spurgin LG, Richardson DS. How pathogens drive genetic diversity: MHC, mechanisms and misunderstandings. Proc R Soc B Biol Sci. 2010;277: 979–988. 10.1098/rspb.2009.2084 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Sommer S. The importance of immune gene variability (MHC) in evolutionary ecology and conservation Frontiers in Zoology. BioMed Central; 2005. p. 16 10.1186/1742-9994-2-16 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Parham P. Function and polymorphism of human leukocyte antigen-A,B,C molecules. Am J Med. 1988;85: 2–5. 10.1016/0002-9343(88)90369-5 [DOI] [PubMed] [Google Scholar]
  • 31.Hughes AL, Nei M. Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature. 1988;335: 167–170. 10.1038/335167a0 [DOI] [PubMed] [Google Scholar]
  • 32.Sette A, Sidney J. Nine major HLA class I supertypes account for the vast preponderance of HLA-A and -B polymorphism. Immunogenetics. 1999;50: 201–212. 10.1007/s002510050594 [DOI] [PubMed] [Google Scholar]
  • 33.Dendrou CA, Petersen J, Rossjohn J, Fugger L. HLA variation and disease Nature Reviews Immunology. Nature Publishing Group; 2018. pp. 325–339. 10.1038/nri.2017.143 [DOI] [PubMed] [Google Scholar]
  • 34.Wieczorek M, Abualrous ET, Sticht J, Álvaro-Benito M, Stolzenberg S, Noé F, et al. Major histocompatibility complex (MHC) class I and MHC class II proteins: Conformational plasticity in antigen presentation. Frontiers in Immunology. 2017. p. 292 10.3389/fimmu.2017.00292 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Rossjohn J, Gras S, Miles JJ, Turner SJ, Godfrey DI, McCluskey J. T Cell Antigen Receptor Recognition of Antigen-Presenting Molecules. Annu Rev Immunol. 2015;33: 169–200. 10.1146/annurev-immunol-032414-112334 [DOI] [PubMed] [Google Scholar]
  • 36.Antunes DA, Rigo MM, Freitas M V., Mendes MFA, Sinigaglia M, Lizée G, et al. Interpreting T-Cell Cross-reactivity through Structure: Implications for TCR-Based Cancer Immunotherapy. Front Immunol. 2017;8: 1–16. 10.3389/fimmu.2017.00001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Li Y, Mariuzza RA. Structural basis for recognition of cellular and viral ligands by NK cell receptors. Front Immunol. 2014;5: 123 10.3389/fimmu.2014.00123 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Blees A, Januliene D, Hofmann T, Koller N, Schmidt C, Trowitzsch S, et al. Structure of the human MHC-I peptide-loading complex. Nature. 2017;551: 525–528. 10.1038/nature24627 [DOI] [PubMed] [Google Scholar]
  • 39.Fisette O, Wingbermühle S, Tampé R, Schäfer L V. Molecular mechanism of peptide editing in the tapasin-MHC I complex. Sci Rep. 2016;6: 19085 10.1038/srep19085 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Hafstrand I, Sayitoglu EC, Apavaloaei A, Josey BJ, Sun R, Han X, et al. Successive crystal structure snapshots suggest the basis for MHC class i peptide loading and editing by tapasin. Proc Natl Acad Sci U S A. 2019;116: 5055–5060. 10.1073/pnas.1807656116 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Fisette O, Wingbermühle S, Schäfer L V. Partial Dissociation of Truncated Peptides Influences the Structural Dynamics of the MHCI Binding Groove. Front Immunol. 2017;8: 408 10.3389/fimmu.2017.00408 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Harjanto S, Ng LFP, Tong JC. Clustering HLA class I superfamilies using structural interaction patterns. Kobe B, editor. PLoS One. 2014;9: e86655 10.1371/journal.pone.0086655 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Mukherjee S, Warwicker J, Chandra N. Deciphering complex patterns of class-I HLA-peptide cross-reactivity via hierarchical grouping. Immunol Cell Biol. 2015;93: 522–32. 10.1038/icb.2015.3 [DOI] [PubMed] [Google Scholar]
  • 44.Mumtaz S, Nabney IT, Flower DR. Scrutinizing human MHC polymorphism: Supertype analysis using Poisson-Boltzmann electrostatics and clustering. J Mol Graph Model. 2017;77: 130–136. 10.1016/j.jmgm.2017.07.033 [DOI] [PubMed] [Google Scholar]
  • 45.Onuchic JN, Luthey-Schulten Z, Wolynes PG. THEORY OF PROTEIN FOLDING: The Energy Landscape Perspective. Annu Rev Phys Chem. 1997;48: 545–600. 10.1146/annurev.physchem.48.1.545 [DOI] [PubMed] [Google Scholar]
  • 46.Ferreiro DU, Komives EA, Wolynes PG. Frustration in biomolecules Quarterly Reviews of Biophysics. Cambridge University Press; 2014. pp. 285–363. 10.1017/S0033583514000092 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Ferreiro DU, Komives EA, Wolynes PG. Frustration, function and folding. Curr Opin Struct Biol. 2018;48: 68–73. 10.1016/j.sbi.2017.09.006 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Ferreiro DU, Hegler JA, Komives EA, Wolynes PG. Localizing frustration in native proteins and protein assemblies. Proc Natl Acad Sci U S A. 2007;104: 19819–24. 10.1073/pnas.0709915104 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Freiberger MI, Brenda Guzovsky A, Wolynes PG, Gonzalo Parra R, Ferreiro DU. Local frustration around enzyme active sites. Proc Natl Acad Sci U S A. 2019;116: 4037–4043. 10.1073/pnas.1819859116 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Onuchic JN, Luthey-Schulten Z, Wolynes PG. THEORY OF PROTEIN FOLDING: The Energy Landscape Perspective. Annu Rev Phys Chem. 1997;48: 545–600. 10.1146/annurev.physchem.48.1.545 [DOI] [PubMed] [Google Scholar]
  • 51.Tripathi S, Waxham MN, Cheung MS, Liu Y. Lessons in Protein Design from Combined Evolution and Conformational Dynamics. Sci Rep. 2015;5: 14259 10.1038/srep14259 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Tripathi S, Wang Q, Zhang P, Hoffman L, Waxham MN, Cheung MS. Conformational frustration in calmodulin-target recognition. J Mol Recognit. 2015;28: 74–86. 10.1002/jmr.2413 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Espada R, Ferreiro DU, Parra R G. The Design of Repeat Proteins: Stability Conflicts with Functionality. Biochem Mol Biol J. 2017;03 10.21767/2471-8084.100031 [DOI] [Google Scholar]
  • 54.Robinson J, Guethlein LA, Cereb N, Yang SY, Norman PJ, Marsh SGE, et al. Distinguishing functional polymorphism from random variation in the sequences of >10,000 HLA-A, -B and -C alleles. Keating BJ, editor. PLoS Genet. 2017;13: e1006862 10.1371/journal.pgen.1006862 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.van Deutekom HWM, Keşmir C. Zooming into the binding groove of HLA molecules: which positions and which substitutions change peptide binding most? Immunogenetics. 2015;67: 425–36. 10.1007/s00251-015-0849-y [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Mihalek I, Reš I, Lichtarge O. A Family of Evolution-Entropy Hybrid Methods for Ranking Protein Residues by Importance. J Mol Biol. 2004;336: 1265–1282. 10.1016/j.jmb.2003.12.078 [DOI] [PubMed] [Google Scholar]
  • 57.Wilkins A, Erdin S, Lua R, Lichtarge O. Evolutionary trace for prediction and redesign of protein functional sites. Methods Mol Biol. 2012;819: 29–42. 10.1007/978-1-61779-465-0_3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Wilkins AD, Lua R, Erdin S, Ward RM, Lichtarge O. Sequence and structure continuity of evolutionary importance improves protein functional site discovery and annotation. Protein Sci. 2010;19: 1296–1311. 10.1002/pro.406 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Madabushi S, Yao H, Marsh M, Kristensen DM, Philippi A, Sowa ME, et al. Structural clusters of evolutionary trace residues are statistically significant and common in proteins. J Mol Biol. 2002;316: 139–154. 10.1006/jmbi.2001.5327 [DOI] [PubMed] [Google Scholar]
  • 60.Mukherjee S, Warshel A. Electrostatic origin of the mechanochemical rotary mechanism and the catalytic dwell of F1-ATPase. Proc Natl Acad Sci U S A. 2011;108: 20550–5. 10.1073/pnas.1117024108 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Omasits U, Knapp B, Neumann M, Steinhauser O, Kobler R, Schreiner W, et al. Analysis of key parameters for molecular dynamics of pMHC molecules. Mol Simul. 2008;34: 781–793. 10.1080/08927020802256298 [DOI] [Google Scholar]
  • 62.Serçinoğlu O, Ozbek P. Computational characterization of residue couplings and micropolymorphism-induced changes in the dynamics of two differentially disease-associated human MHC class-I alleles. J Biomol Struct Dyn. 2018;36: 724–740. 10.1080/07391102.2017.1295884 [DOI] [PubMed] [Google Scholar]
  • 63.Vita R, Overton JA, Greenbaum JA, Ponomarenko J, Clark JD, Cantrell JR, et al. The immune epitope database (IEDB) 3.0. Nucleic Acids Res. 2014;43: D405–D412. 10.1093/nar/gku938 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Nielsen M, Andreatta M. NetMHCpan-3.0; improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length datasets. Genome Med. 2016;8: 33 10.1186/s13073-016-0288-x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Menegatti Rigo M, Amaral Antunes D, Vaz de Freitas M, Fabiano de Almeida Mendes M, Meira L, Sinigaglia M, et al. DockTope: a Web-based tool for automated pMHC-I modelling. Sci Rep. 2015;5: 18413 10.1038/srep18413 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Kyeong H-H, Choi Y, Kim H-S. GradDock: rapid simulation and tailored ranking functions for peptide-MHC Class I docking. Bioinformatics. 2018;34: 469–476. 10.1093/bioinformatics/btx589 [DOI] [PubMed] [Google Scholar]
  • 67.Parra RG, Schafer NP, Radusky LG, Tsai MY, Guzovsky AB, Wolynes PG, et al. Protein Frustratometer 2: a tool to localize energetic frustration in protein molecules, now with electrostatics. Nucleic Acids Res. 2016;44: W356–W360. 10.1093/nar/gkw304 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Warburton RJ, Matsui M, Rowland-Jones SL, Gammon MC, Katzenstein GE, Wei T, et al. Mutation of the α2 domain disulfide bridge of the class I molecule HLA-A*0201 Effect on maturation and peptide presentation. Hum Immunol. 1994;39: 261–271. 10.1016/0198-8859(94)90269-0 [DOI] [PubMed] [Google Scholar]
  • 69.Dib L, Salamin N, Gfeller D. Polymorphic sites preferentially avoid co-evolving residues in MHC class I proteins. PLoS Comput Biol. 2018;14: e1006188 10.1371/journal.pcbi.1006188 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Haliloglu T, Gul A, Erman B. Predicting important residues and interaction pathways in proteins using gaussian network model: Binding and stability of HLA proteins. Nussinov R, editor. PLoS Comput Biol. 2010;6: 20 10.1371/journal.pcbi.1000845 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Courtney AH, Lo WL, Weiss A. TCR Signaling: Mechanisms of Initiation and Propagation. Trends in Biochemical Sciences. 1 February 2017: 108–123. 10.1016/j.tibs.2017.11.008 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.La Gruta NL, Gras S, Daley SR, Thomas PG, Rossjohn J. Understanding the drivers of MHC restriction of T cell receptors. Nature Reviews Immunology. 10 April 2018: 1–12. 10.1038/s41577-018-0007-5 [DOI] [PubMed] [Google Scholar]
  • 73.Zinkernagel RM, Doherty PC. Immunological surveillance against altered self components by sensitised T lymphocytes in lymphocytes choriomeningitis. Nature. 1974;251: 547–548. 10.1038/251547a0 [DOI] [PubMed] [Google Scholar]
  • 74.Rossjohn J, Gras S, Miles JJ, Turner SJ, Godfrey DI, McCluskey J. T Cell Antigen Receptor Recognition of Antigen-Presenting Molecules. Annu Rev Immunol. 2015;33: 169–200. 10.1146/annurev-immunol-032414-112334 [DOI] [PubMed] [Google Scholar]
  • 75.Beringer DX, Kleijwegt FS, Wiede F, van der Slik AR, Loh KL, Petersen J, et al. T cell receptor reversed polarity recognition of a self-antigen major histocompatibility complex. Nat Immunol. 2015;16: 1153–1161. 10.1038/ni.3271 [DOI] [PubMed] [Google Scholar]
  • 76.Gras S, Chadderton J, Del Campo CM, Farenc C, Wiede F, Josephs TM, et al. Reversed T Cell Receptor Docking on a Major Histocompatibility Class I Complex Limits Involvement in the Immune Response. Immunity. 2016;45: 749–760. 10.1016/j.immuni.2016.09.007 [DOI] [PubMed] [Google Scholar]
  • 77.Adams JJ, Narayanan S, Liu B, Birnbaum ME, Kruse AC, Bowerman NA, et al. T Cell Receptor Signaling Is Limited by Docking Geometry to Peptide-Major Histocompatibility Complex. Immunity. 2011;35: 681–693. 10.1016/j.immuni.2011.09.013 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Garcia KC, Adams JJ, Feng D, Ely LK. The molecular basis of TCR germline bias for MHC is surprisingly simple. Nat Immunol. 2009;10: 143–147. 10.1038/ni.f.219 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Marrack P, Scott-Browne JP, Dai S, Gapin L, Kappler JW. Evolutionarily Conserved Amino Acids That Control TCR-MHC Interaction. Annu Rev Immunol. 2008;26: 171–203. 10.1146/annurev.immunol.26.021607.090421 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Tynan FE, Burrows SR, Buckle AM, Clements CS, Borg NA, Miles JJ, et al. T cell receptor recognition of a “super-bulged” major histocompatibility complex class I-bound peptide. Nat Immunol. 2005;6: 1114–1122. 10.1038/ni1257 [DOI] [PubMed] [Google Scholar]
  • 81.Fleischmann G, Fisette O, Thomas C, Wieneke R, Tumulka F, Schneeweiss C, et al. Mechanistic Basis for Epitope Proofreading in the Peptide-Loading Complex. J Immunol. 2015;195: 4503–4513. 10.4049/jimmunol.1501515 [DOI] [PubMed] [Google Scholar]
  • 82.Thomas C, Tampé R. Structure of the TAPBPR–MHC I complex defines the mechanism of peptide loading and editing. Science (80-). 2017;358: 1060–1064. 10.1126/science.aao6001 [DOI] [PubMed] [Google Scholar]
  • 83.Jiang J, Natarajan K, Boyd LF, Morozov GI, Mage MG, Margulies DH. Crystal structure of a TAPBPR–MHC I complex reveals the mechanism of peptide editing in antigen presentation. Science (80-). 2017;358: 1064–1068. 10.1126/science.aao5154 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.Natarajan K, Jiang J, Margulies DH. Structural aspects of chaperone-mediated peptide loading in the MHC-I antigen presentation pathway. Crit Rev Biochem Mol Biol. 2019;54: 164–173. 10.1080/10409238.2019.1610352 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Hilton HG, McMurtrey CP, Han AS, Djaoud Z, Guethlein LA, Blokhuis JH, et al. The Intergenic Recombinant HLA-B*46:01 Has a Distinctive Peptidome that Includes KIR2DL3 Ligands. Cell Rep. 2017;19: 1394–1405. 10.1016/j.celrep.2017.04.059 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86.Vacic V, Iakoucheva LM, Radivojac P. Two Sample Logo: a graphical representation of the differences between two sets of sequence alignments. Bioinformatics. 2006;22: 1536–1537. 10.1093/bioinformatics/btl151 [DOI] [PubMed] [Google Scholar]
  • 87.Sidney J, Peters B, Frahm N, Brander C, Sette A. HLA class I supertypes: a revised and updated classification. BMC Immunol. 2008;9: 1 10.1186/1471-2172-9-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Neisig A, Melief CJ, Neefjes J. Reduced cell surface expression of HLA-C molecules correlates with restricted peptide binding and stable TAP interaction. J Immunol. 1998;160: 171–9. Available: http://www.ncbi.nlm.nih.gov/pubmed/9551969 [PubMed] [Google Scholar]
  • 89.Neefjes JJ, Ploegh HL. Allele and locus‐specific differences in cell surface expression and the association of HLA class I heavy chain with β2‐microglobulin: differential effects of inhibition of glycosylation on class I subunit association. Eur J Immunol. 1988;18: 801–810. 10.1002/eji.1830180522 [DOI] [PubMed] [Google Scholar]
  • 90.Sibilio L, Martayan A, Setini A, Monaco E Lo, Tremante E, Butler RH, et al. A single bottleneck in HLA-C assembly. J Biol Chem. 2008;283: 1267–1274. 10.1074/jbc.M708068200 [DOI] [PubMed] [Google Scholar]
  • 91.Apps R, Meng Z, Del Prete GQ, Lifson JD, Zhou M, Carrington M. Relative Expression Levels of the HLA Class-I Proteins in Normal and HIV-Infected Cells. J Immunol. 2015;194: 3594–3600. 10.4049/jimmunol.1403234 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92.Zemmour J, Parham P. Distinctive polymorphism at the HLA-C locus: implications for the expression of HLA-C. J Exp Med. 1992;176: 937–950. 10.1084/jem.176.4.937 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Parham P, Moffett A. Variable NK cell receptors and their MHC class i ligands in immunity, reproduction and human evolution Nature Reviews Immunology. Nature Publishing Group; 2013. pp. 133–144. 10.1038/nri3370 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94.Winter CC, Long E. A single amino acid in the p58 killer cell inhibitory receptor controls the ability of natural killer cells to discriminate between the two groups of HLA-C allotypes. J Immunol. 1997;142: 142–144. 10.4049/jimmunol.166.12.7260 [DOI] [PubMed] [Google Scholar]
  • 95.Winter CC, Long EO. A Single Amino Acid in the p58 Killer Cell Inhibitory Receptor Controls the Ability of Natural Killer Cells to Discriminate between the Two Groups of HLA-C Allotypes. J Immunol. 1997;158: 4026–4028. Available: http://www.ncbi.nlm.nih.gov/pubmed/9126959 [PubMed] [Google Scholar]
  • 96.Boyington JC, Sun PD. A structural perspective on MHC class I recognition by killer cell immunoglobulin-like receptors Molecular Immunology. Pergamon; 2002. pp. 1007–1021. 10.1016/s0161-5890(02)00030-5 [DOI] [PubMed] [Google Scholar]
  • 97.Fan QR, Long EO, Wiley DC. Crystal structure of the human natural killer cell inhibitory receptor KIR2DL1–HLA-Cw4 complex. Nat Immunol. 2001;2: 452–460. 10.1038/87766 [DOI] [PubMed] [Google Scholar]
  • 98.Abualrous ET, Saini SK, Ramnarayan VR, Ilca FT, Zacharias M, Springer S. The Carboxy Terminus of the Ligand Peptide Determines the Stability of the MHC Class I Molecule H-2Kb: A Combined Molecular Dynamics and Experimental Study. PLoS One. 2015;10: e0135421 10.1371/journal.pone.0135421 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99.Abualrous ET, Fritzsche S, Hein Z, Al-Balushi MS, Reinink P, Boyle LH, et al. F pocket flexibility influences the tapasin dependence of two differentially disease-associated MHC Class I proteins. Eur J Immunol. 2015;45: 1248–57. 10.1002/eji.201445307 [DOI] [PubMed] [Google Scholar]
  • 100.Hein Z, Uchtenhagen H, Abualrous ET, Saini SK, Janßen L, Van Hateren A, et al. Peptide-independent stabilization of MHC class I molecules breaches cellular quality control. J Cell Sci. 2014;127: 2885–97. 10.1242/jcs.145334 [DOI] [PubMed] [Google Scholar]
  • 101.Ilca FT, Drexhage LZ, Brewin G, Peacock S, Boyle LH. Distinct Polymorphisms in HLA Class I Molecules Govern Their Susceptibility to Peptide Editing by TAPBPR. Cell Rep. 2019;29: 1621–1632.e3. 10.1016/j.celrep.2019.09.074 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102.Jiang J, Natarajan K, Boyd LF, Morozov GI, Mage MG, Margulies DH. Crystal structure of a TAPBPR-MHC I complex reveals the mechanism of peptide editing in antigen presentation. Science. 2017;358: 1064–1068. 10.1126/science.aao5154 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103.Cock PJA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25: 1422–3. 10.1093/bioinformatics/btp163 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104.Thomsen MCF, Nielsen M. Seq2Logo: A method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion. Nucleic Acids Res. 2012;40: W281–W287. 10.1093/nar/gks469 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105.Shannon CE. A Mathematical Theory of Communication. Bell Syst Tech J. 1948;27: 379–423. 10.1002/j.1538-7305.1948.tb01338.x [DOI] [Google Scholar]
  • 106.Webb B, Sali A. Comparative Protein Structure Modeling Using MODELLER. Curr Protoc Bioinformatics. 2014;47: 5.6.1–5.6.32. 10.1002/0471250953.bi0506s47 [DOI] [PubMed] [Google Scholar]
  • 107.Davtyan A, Schafer NP, Zheng W, Clementi C, Wolynes PG, Papoian GA. AWSEM-MD: Protein structure prediction using coarse-grained physical potentials and bioinformatically based local structure biasing. J Phys Chem B. 2012;116: 8494–8503. 10.1021/jp212541y [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision Letter 0

Antony Nicodemus Antoniou

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

4 Feb 2020

PONE-D-19-35019

Sequence-structure-function relationships in class I MHC: a local frustration perspective

PLOS ONE

Dear Dr Ozbek

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

All reviewers were of the opinion that the manuscript was of interest to the field of MHC biology but have raised points that need to be addressed. Please address all the point raised with especial consideration to those raised by reviewer 3.

We would appreciate receiving your revised manuscript by Mar 20 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Antony Nicodemus Antoniou, PhD

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

1. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Partly

Reviewer #4: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: Yes

Reviewer #3: I Don't Know

Reviewer #4: I Don't Know

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors investigated the frustration in sequences and models of HLA-A, B and C alleles to understand the relationship between sequence, structure and function. They found higher frustration near TCR, KIR and tapasin contact positions on alpha-1 and alpha-2 helices and suggested is as a basis for the interaction. The authors also determined frustration of the complex with and without peptide. They found that the reduction in frustration upon peptide binding was highest near the F-pocket, confirming the importance of F pocket for stable peptide binding. They also found higher frustration in the HLA-C group, compared with HLA-A and B groups, especially at binding pocket positions of 66, 74 and 80 and proposed that it may affect peptide binding by HLA-C, resulting in lower stability and reduced cell surface expression. This is a comprehensive analysis of frustration in MHC class I alleles that adds to our understanding of the function of micropolymorphisms in MHC class I molecules.

Reviewer #2: The manuscript by Sercinoglu and Ozbek have described a comprehensive study using local frustration analysis to dissect the structural-functional relationship of the highly polymorphic MHC class I molecules. The study appears technically sound and adds further details to the structural characteristics of MHC class I molecules. In addition the analysis does appear to confirm much published experimental work such as the sequences which are in high frustration areas being associated with various interaction sites such TCR, KIR and tapasin, whilst at the same time presenting data across thousands of alleles and subtypes.

There are a few issues that I feel the authors need to address to clarify their manuscript.

Figure 1; a more detailed figure legend is required which illustrates the exact molecule and where the structure was obtained from.

Figure 2; to the non-expert it would be useful to generate a table highlighting the binding groove positions and the variable, hypervariable and conserved positions, especially those listed within the text. Also a table highlighting perhaps those positions with the highest and lowest rvET scores would be useful and would extract some of the pertinent findings from the analysis. Such tables may also complement Figure 3.

Figure 3; on the version I have the numerals are not clear as with most figures.

Figure 5; could the authors highlight the HLA-B*46 group of alleles within the figure as it is not obvious to the non-expert as to what alleles are being referred to.

Figure 6; a more extensive figure legend and explanation of the data within the text is recommended.

There were a few minor typographical and grammatical errors which are highlighted below;

Tapasin - small t

Line 154- 'tree' not 'three'

Line 170- sentence needs changing

Line 210-a brief definition of what the authors mean by 'energically active'

Line 211-212- sentence needs changing

Line 303- remove an ‘also’

Reviewer #3: Review of the manuscript “ Sequence-structure-function relationships in class I MHC: a local frustration perspective” by Onur Serçinoğlu and Pemra Ozbek

In this manuscript, authors Sercinoglu and Ozbek are reporting on a comparative analysis of various HLA Class I alleles, based on amino acid sequence variation, calculated local energetic frustration and evolutionary importance per each position within alpha 1 and alpha 2 domains. They claim that lesser variation at certain positions correlates with low local frustration. Such residues are likely to contribute to the formation of a conserved “MHC Class I fold” structure and consequently carry high evolutionary importance. Many other residues present intermediate to high local frustration and suggested by the authors to be involved in binding to peptide ligands, intracellular chaperones as well as receptor molecules from immune cells that survey antigen presentation by MHC Class I.

I think that this study is a great attempt to compare a vast number of HLA I sequences and to extract valuable information that offers a great potential for generating testable hypotheses that will allow investigation of MHC Class I molecules by biochemical, immunological or cell biological methods. Nevertheless, I have some major concerns, which I think should be addressed by the authors before the manuscript can be published.

1. Authors acknowledge that HLA Class I molecules are the most polymorphic set of gene products and the polymorphisms are concentrated in alpha 1 and alpha 2 domains. This is reflected in the analysis of sequence variations within those domains (Figure 2). Authors mention that the positions with high sequence variation also yield high rvET scores and vice versa. The findings propose that the two analyses of sequence variation and evolutionary trace correlate. It is not clear which residues qualify as evolutionarily conserved/important. Apart from the two cysteines that are essential for folding MHC Class I molecules, which other residues are important for forming the conserved “MHC fold”? Authors should explain clearly how their data supports their reasoning of MHC Class I structure conservation through evolution and which positions are required for folding and which others are required for MHC I function/interactions with other proteins.

2. Based on the assumptions of the authors, lower level of variance and higher evolutionary conservation is expected to be seen among alpha 3 domains of HLA I. Alpha 3 domain sequence comparisons should be included to be able to demonstrate the proposed sequence conservation during evolution for residues that are involved in the core architecture of MHC Class I.

3. For the first round of analyses performed and shown in Figure 2, I suggest that the authors compare HLA-A, HLA-B and HLA-C allele sequences within three separate groups (they only apply this separation to local frustration analysis), as those allotypes are products of three different gene loci and sequence evolution might have occurred in different positions in time. This might increase the resolution of the sequence variation analysis greatly.

4. Authors mention high levels of local frustration for residues that might be contacting TCR or tapasin. While it is conceivable that parts of MHC Class I that interact with TCR are still mobile and energetically active, it is not clear why residues that are on the tapasin interface are highly frustrated. Authors model peptide-bound MHC Class I structures where, residues contacting peptides show lesser or neutral frustration profile. After acquiring a strongly bound ligand, MHC Class I molecules are not expected to interact with tapasin so it remains to be clarified why peptide-loaded MHC Class I molecules remain frustrated in alpha2-1 region.

5. One way of calculating the contribution of peptide binding to reducing local frustration would be comparing the local frustration of the residues in alpha 1 and alpha 2 domains in the absence versus the presence of peptide in the models. Authors perform this analysis and show some results in Figure 6 but it is not possible to understand the changes in SRFI values are statistically significant. Also, the choice of residues is not clearly explained, why are residues such as 114 and 116 not included in the analysis?

Minor points:

1. The abbreviation of beta 2-microglobulin should not include any dashes (β2m).

2. In line 75, the term “HLA chain of MHC” should be revised. It is not clear what is meant by this phrase.

3. In line 84, term “protein alleles” should be revised as allelic forms of genes but proteins exist.

4. Although it is conceivable that the abbreviations “HLA” or “HLA I” or “MHC” are used by the authors to indicate HLA Class I, I advise the consistent use of one single abbreviation for the proteins.

a. Clustering data (fig5) already indicates that alleles from different loci have distinct properties and frustration profiles.

5. It is not clear in Figure 4, which position the SRFI value is derived from. Why is there a decreasing trend?

6. Figure 5A – labels on top and left of the figure are missing.

Reviewer #4: This study by Sercinoglu and Ozbek takes an interesting and unique theoretical approach based on the analysis of single residue frustration analysis (SRFI), to address the structure-function question relating to the highly diverse and evolutionarily non-conserved MHC-I protein. The paper is thought-provoking and reasonably well presented, but the following points need to be addressed before it is ready for publication:

1. In order to improve the access of this article to a more biological readership, the authors need to clarify what local frustration analysis is, what it measures and what its significance is with respect to MHC diversification during evolution. This clarification should begin within the body of the abstract (expand on ll22), and continue in the introduction and discussion.

2. A significant yet unavoidable limitation of the study is the large number of homology-modelled structures. More attention should be paid to this limitation in the text.

3. Around l176: Is it likely that higher frustration in the binding pockets (especially but not exclusively the F pocket) leads to more dependence on cofactors for assembly? That is to say, is there a correlation between frustration index and tapasin or TAPBPR dependence among the different alleles? This issue should be discussed.

4. Figure 6: Similarly, when the authors undertookSRFI on “empty” MHC alleles, they found frustration to increase. Does the degree of change in SRFI correlate with tapasin or TAPBPR dependence? This issue should be discussed.

Minor points

5. Ll30: MHC refers to the genetic locus, not the “protein complex”

6. Ll31 “of genetics” not “on genetics”

7. Ll170 necessary [to] model

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes: Zeynep Hein

Reviewer #4: Yes: Timothy J. Elliott

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 May 18;15(5):e0232849. doi: 10.1371/journal.pone.0232849.r002

Author response to Decision Letter 0


1 Apr 2020

We are grateful for the time and energy you expanded on our behalf. In the following sections you will find our responses to each of your points and suggestions.

RESPONSES TO REVIEWER #1

Reviewer #1: The authors investigated the frustration in sequences and models of HLA-A, B and C alleles to understand the relationship between sequence, structure and function. They found higher frustration near TCR, KIR and tapasin contact positions on alpha-1 and alpha-2 helices and suggested is as a basis for the interaction. The authors also determined frustration of the complex with and without peptide. They found that the reduction in frustration upon peptide binding was highest near the F-pocket, confirming the importance of F pocket for stable peptide binding. They also found higher frustration in the HLA-C group, compared with HLA-A and B groups, especially at binding pocket positions of 66, 74 and 80 and proposed that it may affect peptide binding by HLA-C, resulting in lower stability and reduced cell surface expression. This is a comprehensive analysis of frustration in MHC class I alleles that adds to our understanding of the function of micropolymorphisms in MHC class I molecules.

We appreciate these positive comments by the Reviewer regarding the significance of our study. We are also grateful for the time and energy expanded on our behalf.

RESPONSES TO REVIEWER #2

Reviewer #2: The manuscript by Sercinoglu and Ozbek have described a comprehensive study using local frustration analysis to dissect the structural-functional relationship of the highly polymorphic MHC class I molecules. The study appears technically sound and adds further details to the structural characteristics of MHC class I molecules. In addition the analysis does appear to confirm much published experimental work such as the sequences which are in high frustration areas being associated with various interaction sites such TCR, KIR and tapasin, whilst at the same time presenting data across thousands of alleles and subtypes.

There are a few issues that I feel the authors need to address to clarify their manuscript.

Figure 1; a more detailed figure legend is required which illustrates the exact molecule and where the structure was obtained from.

We’d like to apologize for any confusion this may have caused. Necessary information is now provided in the legend.

Figure 2; to the non-expert it would be useful to generate a table highlighting the binding groove positions and the variable, hypervariable and conserved positions, especially those listed within the text. Also a table highlighting perhaps those positions with the highest and lowest rvET scores would be useful and would extract some of the pertinent findings from the analysis. Such tables may also complement Figure 3.

We’d like to thank Reviewer for this suggestion. We have constructed a Supplementary Table (Suppl. Table 2), and referred to this Table for convenience in the revised manuscript text. This table includes rvET and SRFI (median) values for binding groove as well as the alpha-3 domain residues for which data is shown in Fig 5 and Fig 7 in the revised manuscript.

Figure 3; on the version I have the numerals are not clear as with most figures.

We’d like to apologize if this caused any invonvenience. The resolution of all the Figures were increased to overcome this issue.

Figure 5; could the authors highlight the HLA-B*46 group of alleles within the figure as it is not obvious to the non-expert as to what alleles are being referred to.

Figure 6; a more extensive figure legend and explanation of the data within the text is recommended.

We’d like to thank Reviewer for highlighting this issue. We’ve used the python package seaborn for constructing the clustermaps. The allele names on this clustermap figure (Fig 9 now) do not contain all alleles we have studied for visualization reasons. This is automatically taken care of by the seaborn package, and without really modifying the source code of this package, we are unable to highlight the row of a specific allele or allele groups.

Those HLA-B*46 group alleles that are clustered along with HLA-C alleles can be seen in Supplementary Table 1.

There were a few minor typographical and grammatical errors which are highlighted below;

Tapasin - small t

Line 154- 'tree' not 'three'

Line 170- sentence needs changing

We’d like to thank Reviewer for capturing these mistakes. They have been corrected, and necessary changes were made to the sentence on Line 170 (which Is on line 200 in the revised manuscript).

Line 210-a brief definition of what the authors mean by 'energically active'

We have chosen to remove this statement to avoid confusion, as even a brief explanation would involve the description of the computational method in question, and would divert the attention of the reader.

Line 211-212- sentence needs changing

Line 303- remove an ‘also’

Necessary changes were made.

RESPONSES TO REVIEWER #3

In this manuscript, authors Sercinoglu and Ozbek are reporting on a comparative analysis of various HLA Class I alleles, based on amino acid sequence variation, calculated local energetic frustration and evolutionary importance per each position within alpha 1 and alpha 2 domains. They claim that lesser variation at certain positions correlates with low local frustration. Such residues are likely to contribute to the formation of a conserved “MHC Class I fold” structure and consequently carry high evolutionary importance. Many other residues present intermediate to high local frustration and suggested by the authors to be involved in binding to peptide ligands, intracellular chaperones as well as receptor molecules from immune cells that survey antigen presentation by MHC Class I.

I think that this study is a great attempt to compare a vast number of HLA I sequences and to extract valuable information that offers a great potential for generating testable hypotheses that will allow investigation of MHC Class I molecules by biochemical, immunological or cell biological methods. Nevertheless, I have some major concerns, which I think should be addressed by the authors before the manuscript can be published.

1. Authors acknowledge that HLA Class I molecules are the most polymorphic set of gene products and the polymorphisms are concentrated in alpha 1 and alpha 2 domains. This is reflected in the analysis of sequence variations within those domains (Figure 2). Authors mention that the positions with high sequence variation also yield high rvET scores and vice versa. The findings propose that the two analyses of sequence variation and evolutionary trace correlate. It is not clear which residues qualify as evolutionarily conserved/important. Apart from the two cysteines that are essential for folding MHC Class I molecules, which other residues are important for forming the conserved “MHC fold”? Authors should explain clearly how their data supports their reasoning of MHC Class I structure conservation through evolution and which positions are required for folding and which others are required for MHC I function/interactions with other proteins.

We’d like to thank the Reviewer for this question.

We have indeed used rvET as a measure of sequence conservation (with lower rvET values indicating higher conservation) as well as evolutionary importance. We’d like to refer to our response to your third comment below for an explanation of how the rvET identifies evolutionarily important positions.

On the other hand, a residue at a specific position may be conserved for different reasons, e.g. for maintaining stability, catalysis, protein/ligand interactions, etc. We used the local frustration analysis to quantify how energetically favorable interactions of the respective residue are in the context of the given structure. Here, minimal frustration (i.e. high SRFI) usually indicates a residue that is important for protein stability. Finally, we identified the residues/positions that are essential for folding MHC based on their SRFI values and rvET scores. Accordingly, minimally-frustrated and conserved residues should be functionally important for formation of the conserved MHC formation and stability. In contrast, highly frustrated residues (i.e. low SRFI) may be involved in protein-protein interactions.

We have now revised the text under “Integrating biophysics into HLA I evolution” heading of Results section to include further explanatory statements regarding how SRFI profiles and rvET (sequence conservation) values were interpreted to reflect the functional importance of HLA residues. For clarity, we also included an additional Figure (Fig 8) to better reflect how minimally-frustrated and conserved residues are physically connected within the HLA structure.

2. Based on the assumptions of the authors, lower level of variance and higher evolutionary conservation is expected to be seen among alpha 3 domains of HLA I. Alpha 3 domain sequence comparisons should be included to be able to demonstrate the proposed sequence conservation during evolution for residues that are involved in the core architecture of MHC Class I.

We’d like to thank the Reviewer for highlighting this issue.

We largely focused on the HLA peptide-binding groove, as this is the region where the highest level of polymorphism level is observed. Moreover, the rvET values are highly dependent on the input sequence alignments, and including as many different allele sequences in this analysis as possible is important to obtain more reliable scores. Since there is a large gap between the number of alleles with and without alpha-3 domain sequences (8696 versus 1436), we had included only binding-groove sequences in our initial analysis.

Based on your suggestion, we performed the same sequence variation and real-value Evolutionary Trace analysis for the alpha 3 domain as well. The results are given in Figure 3, and necessary additions were made in the manuscript text. Note that the sequence variation analysis involved only the 1436 HLA alleles for which homology models could be generated (as opposed to the 8696 HLA allele sequences for which the binding groove sequence is available), since only this many alleles featured the sequence of their alpha-3 domains.

3. For the first round of analyses performed and shown in Figure 2, I suggest that the authors compare HLA-A, HLA-B and HLA-C allele sequences within three separate groups (they only apply this separation to local frustration analysis), as those allotypes are products of three different gene loci and sequence evolution might have occurred in different positions in time. This might increase the resolution of the sequence variation analysis greatly.

We’d like to thank the Reviewer for pointing out this aspect of the sequence variation analysis.

Your suggestion is quite valid: the evolution of HLA allele sequences should indeed be considered in the sequence variation analysis. The sequence variation analysis in our study may appear not to take this into account, yet the metric we have chosen to identify conserved and "evolutionarily important" residues/positions within the HLA sequence, the "real-value Evolutionary Trace" (rvET) is actually based on both variation level of each position in sequence and where exactly the variation occurs along a phylogenetic tree that is constructed from input sequences. As such, the rvET should be particulary suitable for studying sequence variation in HLA genes.

Here is an example from the reference paper of the evolutionary trace method (Fig 1 from Wilkins et al., 2012). Suppose we have the following sequence alignment, along with the associated phylogenetic tree:

Figure 1 from Wilkins et al. (2012), Figure taken from PubMed Central (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4892863/)

If we were to quantify sequence variation/conservation only by using a more “conventional” metric such as Shannon’s entropy, and attribute equal weight to each amino acid type, positions 3 and 4 (which predominantly feature Q and V amino acids) should be considered equivalent in terms of how conserved they are. The Evolutionary Trace method however will prioritize position 3 over position 4 in this case, as “the basic hypothesis behind the method is that “residues that vary among widely divergent branches of evolution are more likely to have a larger functional impact than other residues that vary even among closely related species”. Even though here the main aim is to enable a better comparison of sequences from different species, the same logic could be applied to comparisons of HLA alleles sequences that are included in different gene loci.

The main idea/purpose of our study is based on a comparison of the conserved MHC fold and HLA sequence variation. Our knowledge of the MHC protein complex structures suggests that this fold is conserved across different allele groups, hence there is a single well-defined shape of the complex. We also aimed to identify positions that are functionally important (i.e. to maintain the shape/stability as well as protein interactions) within this fold. Therefore, we used a single metric to quantify how important each position is for this conserved fold. In that sense, we believe that the choice of using rvET based on an analysis of HLA binding groove sequences of all HLA-A, HLA-B and HLA-C allele groups, is justified.

On the other hand, one may argue that the amino acid differences between allele groups that contribute to the clustering pattern based on SRFI profiles remained unclear in the manuscript text. We have supplemented the Figure showing the SRFI-based clustering results (Figure 10) with sequences logos that highlight differences between amino acids included in clustering positions of different SRFI clusters (14 positions shown in the SRFI clustermap in Figure 10A). For this purpose, we used the Two Sample Logos server (twosamplelogo.org, Vacic et al. (2006)).

Wilkins, A., Erdin, S., Lua, R., & Lichtarge, O. (2012). Evolutionary trace for prediction and redesign of protein functional sites. Methods in molecular biology (Clifton, N.J.), 819, 29–42. https://doi.org/10.1007/978-1-61779-465-0_3

Vacic V, Iakoucheva LM, Radivojac P. Two Sample Logo: a graphical representation of the differences between two sets of sequence alignments. Bioinformatics. 2006;22: 1536–1537. doi:10.1093/bioinformatics/btl151

4. Authors mention high levels of local frustration for residues that might be contacting TCR or tapasin. While it is conceivable that parts of MHC Class I that interact with TCR are still mobile and energetically active, it is not clear why residues that are on the tapasin interface are highly frustrated. Authors model peptide-bound MHC Class I structures where, residues contacting peptides show lesser or neutral frustration profile. After acquiring a strongly bound ligand, MHC Class I molecules are not expected to interact with tapasin so it remains to be clarified why peptide-loaded MHC Class I molecules remain frustrated in alpha2-1 region.

We’d like to thank the Reviewer for bringing our attention to this issue regarding peptide-binding related changes in local frustration.

Our current knowledge indeed tells us that Tapasin functions as a chaperone to keep the peptide-free MHC in a peptide-receptive state, and that the Tapasin-MHC interaction is broken once a high-affinity peptide is loaded by the peptide-binding groove. This has also been demonstrated to involve a molecular “tug-of-war” mechanism between the peptide C-terminus and Tapasin through their interactions with the alpha-2-1 region (Fisette et al., 2016).

How do we link high frustration in the alpha2-1 region of peptide-loaded MHC with Tapasin interaction, when logic tells us the frustration should be reduced upon peptide-binding? Here, we’d like to highlight what exactly the local frustration metric we used, the SRFI, tells us, and further clarify why it is relevant for sites involved in binding to Tapasin. We used the frustratometer2 tool in our study. Frustratometer2 computers the SRFI metric via “randomizing the identity of every single amino acid within the decoy structure set, and thus evaluates every possible mutation at every site in a well-defined native structure” (R.Parra et al. (2016)). In other words, the SRFI is a relative measure, and answers only the following question: “Among all other amino acids at a specific location within a protein structure, how favorable/unfavorable the present amino acid is terms of its contacts with surrounding residues?”.

In the context of Tapasin-MHC interaction, we may re-phrase this question as “Among all other amino acids that may be present in the alpha2-1 region, how favorable are the given amino acids in our homology models in terms of contacting Tapasin molecule?”. Considering the related literature information on patches of residues with increased frustration on protein surfaces involved in binding (Freiberger et al., 2019; Ferreiro et al., 2014), we may argue that our observation of increased frustration at Tapasin contacts sites in our homology models indicates that human MHC, the HLA, evolves to maintain contacts with Tapasin.

Hence, the frustration analysis we present only indicates that the amino acids in the alpha2-1 region and other sites contacting Tapasin are “optimal” for protein interaction. However, even though it is an important prerequisite, the residue identity is not enough to explain protein-protein interaction, as these interactions are usually transient (as in the case of TCR-pMHC interactions). An investigation of whether HLA and Tapasin will remain in contact upon peptide-binding should consider conformational changes and protein dynamics into account. Furthermore, other appropriate measures that are not relative measures such as SRFI should be used for this purpose. For example, configurational frustration may be used to describe local frustration after simulating peptide-free and -bound MHC structures and obtaining relevant conformations in such a study.

Parra RG, Schafer NP, Radusky LG, Tsai MY, Guzovsky AB, Wolynes PG, et al. Protein Frustratometer 2: a tool to localize energetic frustration in protein molecules, now with electrostatics. Nucleic Acids Res. 2016;44: W356–W360. doi:10.1093/nar/gkw304

Fisette O, Wingbermühle S, Tampé R, Schäfer L V. Molecular mechanism of peptide editing in the tapasin-MHC I complex. Sci Rep. 2016;6: 19085. doi:10.1038/srep19085

Freiberger MI, Brenda Guzovsky A, Wolynes PG, Gonzalo Parra R, Ferreiro DU. Local frustration around enzyme active sites. Proc Natl Acad Sci U S A. 2019;116: 4037–4043. doi:10.1073/pnas.1819859116

Ferreiro DU, Komives EA, Wolynes PG. Frustration in biomolecules. Quarterly Reviews of Biophysics. Cambridge University Press; 2014. pp. 285–363. doi:10.1017/S0033583514000092

5. One way of calculating the contribution of peptide binding to reducing local frustration would be comparing the local frustration of the residues in alpha 1 and alpha 2 domains in the absence versus the presence of peptide in the models. Authors perform this analysis and show some results in Figure 6 but it is not possible to understand the changes in SRFI values are statistically significant. Also, the choice of residues is not clearly explained, why are residues such as 114 and 116 not included in the analysis?

We thank the Reviewer for highlighting this issue regarding the statistical significance of SRFI change upon peptide-binding.

The residues shown in this Figure (in the current version, Fig 10) are the top 10 residues that show the highest increase in SRFI upon peptide binding (please see the following to see how exactly the SRFI change was computed). In fact, all residues, including those at positions 114 and 116 were included in the analysis, they were just not among those in top 10 (position 116 ranked 220th, and 114 205th, respectively).

A direct approach to compute SRFI changes upon peptide-binding here may involve simply computing differences between SRFI values of each position in peptide-bound and -free structures, ignoring allelic polymorphism. This approach, however, may be misleading, as here it is assumed the frustration profiles within the structures of each group, peptide-free and -bound states, are equivalent to each other, and hence a statistical significance test can be applied. In fact, they are not, since SRFI is a relative measure (as we describe above, and highly dependent on the allele sequence).

Therefore, we took an alternative approach and computed SRFI differences by subtracting the median SRFI in peptide-bound HLA models (up to 10 structures) and a single SRFI value in a single peptide-free HLA structure for each allele. The distributions in Figure 10 thus indicate distribution of SRFI change values (a single value from each allele for each position).

Nevertheless, it could still be possible to assess statistical significance of SRFI change for each allele. However, this would involve multiple comparisons of ten SRFI values against a single one. Considering this limitation, and the simple fact removing the peptide from pMHC models does not lead to representative structure of the peptide-free state of MHC, we choose to refrain from performing a statistical analysis. Our findings in this section are therefore highly exploratory and should be validated or revised in more extensive and proper studies.

We have included explanatory statements to clarify these aspects of our analysis in the manuscript.

Minor points:

1. The abbreviation of beta 2-microglobulin should not include any dashes (β2m).

Necessary changes were made in the manuscript text.

2. In line 75, the term “HLA chain of MHC” should be revised. It is not clear what is meant by this phrase.

We’d like to apologize for any confusion this might have caused. We have made the necessary change.

3. In line 84, term “protein alleles” should be revised as allelic forms of genes but proteins exist.

We’d like to apologize for any confusion this might have caused. We have removed “protein” from this term to avoid confusion.

4. Although it is conceivable that the abbreviations “HLA” or “HLA I” or “MHC” are used by the authors to indicate HLA Class I, I advise the consistent use of one single abbreviation for the proteins.

We’d like to apologize for any confusion this might have caused. We have made relevant changes for consistent use of abbreviations throughout the text.

a. Clustering data (fig5) already indicates that alleles from different loci have distinct properties and frustration profiles.

We’d like to thank the Reviewer for this feedback. Do you mean that it may not be necessary to map SRFI values on binding groove structures?

5. It is not clear in Figure 4, which position the SRFI value is derived from. Why is there a decreasing trend?

We’d like to thank the Reviewer for this question. In this Figure, (Fig 5 in the update manuscript), we delibaretly ordered positions according to decreasing SRFI values. The main idea here was to find out which positions were minimally, highly, and neutrally frustrated, and this is visually much more easily identified when the positions are ordered as such. Fig 5A highlights minimally and highly frustrated positions, whereas Fig 5B highlights the neutral frustration residues.

6. Figure 5A – labels on top and left of the figure are missing.

We’d like to thank the Reviewer for this comment. This figure (Fig 9A in the revised manuscript) has been revised to include labels (Clusters).

RESPONSES TO REVIEWER #4

This study by Sercinoglu and Ozbek takes an interesting and unique theoretical approach based on the analysis of single residue frustration analysis (SRFI), to address the structure-function question relating to the highly diverse and evolutionarily non-conserved MHC-I protein. The paper is thought-provoking and reasonably well presented, but the following points need to be addressed before it is ready for publication:

1. In order to improve the access of this article to a more biological readership, the authors need to clarify what local frustration analysis is, what it measures and what its significance is with respect to MHC diversification during evolution. This clarification should begin within the body of the abstract (expand on ll22), and continue in the introduction and discussion.

We appreciate this valuable feedback.

We have provided a more extensive explanation of local frustration, and why it is particularly useful for studying HLA polymorphism in the introduction section. Our changes are highlighted in red color. Kindly refer to the changes in the revised manuscript for more detail.

2. A significant yet unavoidable limitation of the study is the large number of homology-modelled structures. More attention should be paid to this limitation in the text.

We’d like to thank the Reviewer for this comment. The section “Homology modeling of HLA alleles in the context of pMHC” have been included exactly to highlight why homology modeling was necessary in the study. We have not inserted additional two sentences describing why homology modeling was used to model peptides as well.

3. Around l176: Is it likely that higher frustration in the binding pockets (especially but not exclusively the F pocket) leads to more dependence on cofactors for assembly? That is to say, is there a correlation between frustration index and tapasin or TAPBPR dependence among the different alleles? This issue should be discussed.

We’d like to thank the Reviewer for bringing our attention to dependence of HLA alleles on cofactors for peptide loading.

We used a recent study by Ilca et al. (Ilca et al. (2019)) as reference to investigate this issue further with respect to our frustration data. In this study, Ilca et al. compared a number of HLA alleles in terms of their TAPBPR binding levels, and found out that the TAPBPR prefers binding to HLA-A allotypes particularly in the A2 and A24 supertypes rather than HLA-B and HLA-C allotypes.

We have inspected local frustration profiles of 30 alleles for which TAPBPR binding levels were reported by these authors based on either peptide-binding pocket residues or TAPBPR binding interface residues. We have summarized our findings under a new heading “Local frustration in the context of TAPBPR-HLA interaction”. Kindly refer to the respective section of the update manuscript for further details.

4. Figure 6: Similarly, when the authors undertookSRFI on “empty” MHC alleles, they found frustration to increase. Does the degree of change in SRFI correlate with tapasin or TAPBPR dependence? This issue should be discussed.

We’d like to thank the Reviewer for this question.

A significant limitation here is the simple fact that peptide-free models in our study are certainly not representative of the actual structural ensembles, as it is known that in its peptide-free state, the complex does not have a well-defined three dimensional structure. Our peptide-free models were generated by simply “deleting” the peptide from the binding groove. A comparison of peptide-bound and -free models in terms of SRFI profiles here may give clues regarding which residues are immediately affected from peptide removal. However, any further investigation based on peptide-free MHC regarding the behavior of the whole complex, such as the dependency on chaperone for assembly, should necessarily involve conformation changes related to peptide removal to be modelled. Therefore, while it is perfectly reasonable and even necessary from a biological perspective, we refrained from performing this analysis regarding the SRFI change between peptide-free and -bound homology models for the above reasons.

Minor points

5. Ll30: MHC refers to the genetic locus, not the “protein complex”

We’d like to apologize for confusion here. We have revised the relevant section of the abstract as:

“The protein complex encoded by Major Histocompatibility Complex (MHC) genes in our body cells plays a critical role in our fight against pathogens via presentation of antigenic peptides to receptor molecules of our immune system cells”

6. Ll31 “of genetics” not “on genetics”

7. Ll170 necessary [to] model

We’d like to thank the Reviewer for highlighting these mistakes. These have been corrected in the revised manuscript.

Attachment

Submitted filename: Response_to_Reviewers.docx

Decision Letter 1

Antony Nicodemus Antoniou

23 Apr 2020

Sequence-structure-function relationships in class I MHC: a local frustration perspective

PONE-D-19-35019R1

Dear Dr. Ozbek,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Antony Nicodemus Antoniou, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: I Don't Know

Reviewer #3: I Don't Know

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: The authors have addressed all issues that I have raised. There are some points, mainly typographical issues that need addressing and only minor changes to figure legends.

L59; protein surfaces

L81; the cell surface

L136; What do the TEM abbreviations mean

L303; Do the authors mean Fig 5?

L314; The analysis of the a3 domain has highlighted that p203 and 259 are conserved reflecting that these positions form a conserved disulphide bond. The authors could perhaps briefly comment as they did previously for C101-C164 pairing.

Figure Legends

Fig 4; the different views of the molecular structure should be mentioned within the figure legend.

Fig 8; the legend just needs to be a bit more specific i.e. 'highlighting interactions with HLA I heavy chain"

There is a formatting issue with supplementary table 1.

Reviewer #3: I believe that the authors understood and have made considerable effort to adress the questions that I have raised during the review process. I find that the additional figures, sections and more detailed explanation of the method have contributed to the clarity and understability of the manuscript and the novel approach in the field of MHC class I bioinformatics.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

Reviewer #3: Yes: Zeynep Hein

Acceptance letter

Antony Nicodemus Antoniou

5 May 2020

PONE-D-19-35019R1

Sequence-structure-function relationships in class I MHC: a local frustration perspective

Dear Dr. Ozbek:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Antony Nicodemus Antoniou

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. HLA I alleles for which homology models were generated.

    Clusters are also indicated with numbers (1, 2, or 3). HLA core alleles (taken from Robinson et al. [54]) are shown in bold.

    (DOCX)

    S2 Table. Median SRFI and rvET values of selected positions in HLA I peptide binding groove and α-3 domain as shown in Fig 5 and Fig 7.

    The rows are ordered according to increasing rvET.

    (DOCX)

    Attachment

    Submitted filename: Response_to_Reviewers.docx

    Data Availability Statement

    All allele sequences and local frustration analysis results will be available from Dryad Digital Repository after acceptance with DOI: https://doi.org/10.5061/dryad.gmsbcc2hx.


    Articles from PLoS ONE are provided here courtesy of PLOS

    RESOURCES