The ClusPro AbEMap web server for the prediction of antibody epitopes

Israel T Desta; Sergei Kotelnikov; George Jones; Usman Ghani; Mikhail Abyzov; Yaroslav Kholodov; Daron M Standley; Dmitri Beglov; Sandor Vajda; Dima Kozakov

doi:10.1038/s41596-023-00826-7

Author manuscript; available in PMC: 2024 Feb 27.

Published in final edited form as: Nat Protoc. 2023 May 15;18(6):1814–1840. doi: 10.1038/s41596-023-00826-7

PMCID: PMC10898366 NIHMSID: NIHMS1965875 PMID: 37188806

Abstract

Antibodies play an important role in the immune system by binding to molecules called antigens at their respective epitopes. These interfaces or epitopes are relational and structural entities, making them ideal problems to address with docking programs. Since the advent of high-throughput antibody sequencing, epitope mapping with as little prior information as the sequence of the antibody is becoming a necessity. ClusPro, a leading protein-protein docking server, together with its template-based modeling version, ClusPro-TBM, has been re-purposed to map epitopes for specific antibody-antigen interactions using the Antibody Epitope Mapping server (AbEMap). ClusPro-AbEMap offers three different modes for users depending on the information available on the antibody as follows: (1) X-ray structure, (2) computational/predicted model of the structure, or (3) only the amino acid sequence. The AbEMap server presents a likelihood score for each antigen residue of being part of the epitope. We provide detailed information on the server’s capabilities for the three options, and discuss how to obtain the best results. In light of the recent introduction of AlphaFold2 (AF2), we also show how one of the modes allows users to use their AF2-generated antibody models as input. The protocol describes the relative advantages of the server compared to other epitope mapping tools, its limitations, and potential areas of improvement. No special skills or experience are required for users other than preparing PDB input files for the target antigen and the sequence in FASTA format or PDB input for the antibody if present. The server may take 45–90 minutes depending on the size of the proteins.

EDITORIAL SUMMARY:

AbEMap generates large ensembles of docked antigen-antibody structures based on the structure of an antigen and either the structure or the sequence of an antibody. For each antigen residue, a likelihood score of being part of the epitope is obtained.

TWEET:

#EpitopeMappingWithPIPER the AbEMap web server predicts the likelihood of antigen residues interacting with a specific antibody based on the structure or the sequence of the antibody.

Teaser:

AbEMap predicts antibody-specific epitopes

INTRODUCTION

Antibodies form one of the key arms of the adaptive immune system in vertebrates. They target solvent-exposed proteins called antigens on the surfaces of pathogens. After recognition and contact, the antibodies mediate the humoral immune response to the attached pathogen¹. The diversity and specificity of antibodies are the reason why harnessing their unique features is paramount in the pharmaceutical industry. Understanding and accurately predicting atomic-level details of the antibody-antigen interface are crucial for utilizing antibodies². Finding the antigen residues in the interface, henceforth called epitope mapping, can be useful for the design of monoclonal antibodies³, for developing vaccines⁴, and for investigating immune responses⁵.

The development of methods for predicting antibody-antigen interactions and for antibody-based drug discovery was traditionally handicapped by the difficulty of obtaining high numbers of antibody sequences. However, due to advances in high-throughput single-cell and Variable-Diversity-Joining (VDJ) sequencing of the B-cell receptor repertoire^6,7, the availability of antibody sequences is no longer an issue in the race towards developing antibody-based drugs. Fast and accurate prediction of the epitopes for these antibody targets has become the new bottleneck⁸. Currently, epitope mapping efforts are dominated by experimental techniques such as X-ray crystallography, mutagenesis (for instance, alanine scanning), and phage display. X-ray crystallography is laborious and expensive, whereas mutagenesis and phage display generally do not provide atomic-level details⁹. Importantly, none of these experimental methods can be used in a high-throughput manner. In view of these limitations, substantial efforts have been devoted to the development of computational epitope mapping methods^10-15. However, epitope prediction for a given antigen and a given antibody is a difficult computational problem that requires further development to improve the accuracy and reliability of the predictions¹⁶. Part of the difficulty is due to the paucity of nonredundant structural data on antibody-antigen interactions since, as reported by Jespersen et al. in 2017, fewer than 25% of the antibody-antigen complexes found in the Protein Data Bank (PDB) are unique when applying a 70% sequence identity cutoff for the antigen¹⁷.

The challenge of epitope mapping can be partially addressed by finding residues on the antigen’s surface that are most likely to interact with a generic antibody (as opposed to a specific antibody)^10,12,17,18. Some examples of such an approach are implemented in the servers Spatial Epitope Prediction for Protein Antigens (SEPPA)^10,12 and BEpro (formerly known as PEPITO)¹⁸. SEPPA uses a logistic regression algorithm with features such as antigen residue surface accessibility and propensity of unit-triangle patches (3 residue-groups on the antigen’s surface) among other factors to score the surface residues^10-12. BEpro adds amino-acid propensity scale and side-chain orientations besides other features¹⁸. Despite the achievements of the antibody-agnostic approach, it is crucial to highlight that epitopes are, by definition, relational entities and that epitope mapping ought to be antibody-specific. This is evidenced by several antigens with particular affinities to different antibodies at different interfaces. A well-studied example is hen egg lysozyme (HEL) which is crystallized with four different antibodies in the PDB structures 1BVK, 1DQJ, 2I25, and 1MLC with little overlap^19-22. Therefore, consideration of both the antibody and the antigen in epitope mapping is not only appropriate but generally also increases the accuracy since more information can be gleaned from the antibody side⁴.

This relational nature of epitopes makes it especially attractive to approach the epitope mapping problem using docking, a computational method that conventionally predicts the binding mode of two biological units²³. One fairly successful example of such methods is ClusPro^24-26. ClusPro is a web server that directly docks two interacting proteins when given their X-ray structures. The server is freely available to those in non-profit organizations and is used by over 20,000 scientists worldwide. It runs on a rigid-body docking program called PIPER, which uses a fast Fourier transform (FFT) correlation approach²⁷. The interaction energy, which is composed of van der Waals (vdW) energy terms (repulsive and attractive), electrostatic energy (Coulombic and Born approximations), and a structure-based pairwise statistical potential, is used for ranking the docked models. In 2012, a special antibody-antigen version of the pairwise statistical potential was introduced, vastly improving antibody-antigen docking accuracy²⁸. In a recent comparative study, ClusPro was reported as the best server for antibody-antigen docking²⁹. Hua et al. utilized the top 30 models predicted by ClusPro and combined them with site-directed mutagenesis to successfully localize epitopes in several case studies⁹. However, they also stated that docking alone or machine learning-based methods did not provide unique epitope positions and had to be followed by experiments⁹. Krawczyk and colleagues used a docking method in their epitope mapping server EpiPred³⁰. They used ZDOCK, an FFT-based rigid-body docking method, to generate models to score putative epitope patches determined by geometric fitting^30,31. More recently, Sikora and colleagues performed rigid-body Monte Carlo docking of a monoclonal antibody against glycosylated SARS-CoV-2 spike protein configurations to check for accessibility of potential epitope candidates³².

One challenge that both Krawczyk et al. and Hua et al. faced, using the EpiPred and ClusPro servers respectively, was the servers’ inability to work with sequences^9,30. Working with sequences requires a separate modeling step (which is not offered by EpiPred and ClusPro) if the antibody’s X-ray structure is not available. However, as mentioned above, antibody sequencing has made major advances over the past few years, whereas the technology of determining the X-ray structure of antibodies has not substantially improved^33,34. This limitation implies that epitope mapping tools should ideally include the ability to model the antibodies from their sequences in addition to mapping the epitopes on the given antigen structure. As a response to this need, we present in this work an end-to-end epitope mapping server based on ClusPro’s docking protocol, ClusPro Antibody-based Epitope Mapping (ClusPro-AbEMap), that offers template-based modeling of the antibody if an X-ray structure is not available. The ClusPro-AbEMap server (https://abemap.cluspro.org/) integrates our template-based modeling method³⁵ and antigen-antibody contact prediction via docking^36,37 to identify the epitope of a given antigen structure from an antibody sequence or X-ray structure.

If the X-ray structure of the antibody is known, thousands of low-energy antibody-antigen models predicted by PIPER (the rigid-body docking program ClusPro is based on) are used to score the antigen surface residues. If the X-ray structure of the antibody is unknown, AbEMap builds multiple homology models of the antibody that are used for docking instead. The consensus of the docked complexes based on all the antibody models and antigen structure templates is used to score the antigen residues. For homology-modeled antibodies, special care should be taken not to penalize possible steric clashes; this is done by reducing the weight of the vdW component of the interaction energy. Although epitope prediction remains a difficult computational problem and clearly more method development is required, we present the protocol for AbEMap, which performs better than the popular peer servers SEPPA, BEpro, and EpiPred³⁸.

Finally, we explore the potential use of the deep neural network-based method AlphaFold2 for antibody structure prediction as well as epitope mapping. It was demonstrated in the CASP14 experiment, and is now well established, that AlphaFold2 substantially improved the accuracy of predicting the structure of most monomeric proteins^39-41. We show that AlphaFold2-modeled antibodies perform nearly as well as our ensemble of template-based models. However, according to our results, using the linker-based approach to predicting protein-protein interactions, recently proposed by several groups^42-45, does not improve the accuracy of AbEMap in finding antibody epitopes.

The AbEMap algorithm and server overview

The server has two modes of running: the first requires an antibody structure as input, which can be an X-ray structure or a precomputed homology model, and the second can perform epitope prediction starting from the amino acid sequence of the antibody, assuming that appropriate antibody homologous structures are available in the PDB. The antigen structure is assumed to be known in both modes. The protocol performing the second mode includes both homology modeling and docking (Fig. 1A-1F). It builds multiple (if applicable) antibody models that form an ensemble. Once structures are available for both antibody and antigen, their mutual conformational space is sampled using PIPER²⁶, the docking engine of the ClusPro server, and the 1000 lowest energy complex poses are identified. When ClusPro is used for protein-protein docking, the 1000 structures are clustered and the centers of the most populated clusters are selected as models of the complex. However, for epitope prediction the 1000 structures are instead used to calculate the frequency of each antigen surface atom’s occurrence in the antibody-antigen interface. As will be shown, in order to map an epitope AbEMap defines the atomic epitope likelihood score as the Boltzmann weighted atomic interface occurrence frequency averaged over the ensemble of antibody structures.

Figure 1 ∣ — Outline of the AbEMap protocol using an antigen structure and an antibody sequence as inputs, examples of complex structures generated by PIPER, four examples of results, and comparisons to other servers. **(A)** User inputs the solved crystal structure of the antigen (shown as PyMOL stick figures in purple) and the antibody sequence (shown as purple text) if the antibody structure is unavailable. **(B)** The antibody sequence is used to find close homologues using BLAST for each of its heavy and light chains. A sample multiple sequence alignment of close homologues is shown for the monoclonal murine antibody 1FGN. L1 & H1 (green), L2 & H2 (blue), and L3 & H3 (red) regions of the complementarity determining regions (CDRs) are highlighted. The list of homologues is filtered using sequence identity and sequence similarity of L3 and H3 regions to the query sequence. **(C)** The structures for the selected sequences are modelled individually using MODELLER. Aligned regions of the backbone are copied from the template, whereas non-aligned regions are modelled. **(D)** The residues with the highest likelihood of being in the epitope are highlighted in red in the results page of the server. **(E)** Billions of antibody-antigen complex conformations are generated by PIPER for the given antibody structure or for each antibody model. The antibody is shown as translucent cartoon and the antigen is shown as cyan surface. **(F)** The bar plot shows the number of poses in the top 100 models generated by PIPER that are within different RMSD thresholds. For example, 3 models (in the top 100) have RMSD ≤ 2 Å, and 21 models have RMSD between 10 and 12 Å. **(G)** As examples of visualizing the results, modelled murine anti-tissue factor (PDB ID 1FGN) and tissue factor (PDB ID 1TFH) are shown as surfaces with residues colored from blue to red based on increasing predicted epitope likelihood score. 19 of the 26 epitope residues are in the 30 top ranked residues. **(H)** Modelled humanized Fab D3h44 (PDB ID 1JPT) and tissue factor (PDB ID 1TFH) shown as surfaces with residues colored from purple to gold based on increasing predicted epitope likelihood score. 20 of the 24 epitope residues are in the 30 top ranked residues. **(I)** Modelled anti-CCL2 neutralizing antibody (PDB ID 4DN3) and monocyte chemoattractant protein (PDB ID 1DOL) are shown as surfaces with residues colored from orange to green based on a decreasing predicted epitope likelihood score. All 14 epitope residues are in the 30 top ranked residues. **(J)** Modelled anti-shh chimera Fab fragment (PDB ID 3MXV) and sonic hedgehog N-terminal domain (PDB ID 3M1N) shown as surfaces with residues colored from yellow to red based on increasing predicted epitope likelihood score. 15 of the 24 epitope residues are in the 30 top ranked residues. **(K)** The distribution of the ROC AUC scores of 28 unbound antibody-antigen complexes for two of the top epitope-predicting servers (SEPPA and BEpro) is compared to that of AbEMap. AbEMap outperforms both in terms of average (red dot), median (middle line), as well as the 25th and 75th percentiles. **(L)** The $F1$ and $MCC$ scores of three different methods are compared for homology modelled antibodies when only homologues with <80% sequence identity are used as templates. ClusPro AbEMap takes the ensemble average residue scores from 5 to 10 of the best homologues, and EpiPred is used for epitope prediction with the best homology modelled antibody. AbEMap outperforms EpiPred before and after ensemble averaging the likelihood scores.

When given only amino acid sequences of the heavy and light chains of the antibody in the more general second mode, ClusPro-AbEMap starts with a BLAST search for homologous structures in the PDB (Fig. 1B). It restricts sequence identity to be above 20% and the e-value to be below 1e-40; if no templates are found, the e-value threshold is relaxed to 1e-20. Once the search is complete, only templates with both heavy and light chains that meet the sequence constraints are retained. The resulting templates are then ranked based on both sequence identity and sequence similarity of CDR3s in the heavy and light chains. For complementarity-determining region (CDR) detection we use the same tools as in the ClusPro server⁴⁶. We take the five highest-ranked structures based on CDR3 sequence identity and the five highest-ranked structures based on CDR3 sequence similarity, and use the union of the two sets as antibody templates. If the CDRs of the antibody cannot be identified or if there is no CDR (as in single domain antigen receptors), then the five top candidates ranked by global sequence identity are selected. The second step is constructing homology models of the antibody based on the selected templates (Fig. 1C). MODELLER tools⁴⁷ are used to realign the antibody sequences, taking into account the template structural information. The program models the backbone atoms of the non-aligned residues and all side chains, while the backbone atoms of the aligned residues are kept fixed at the template coordinates. The single best model proposed by MODELLER for each template is retained for the next step in the epitope mapping process; thus, AbEMap generally retains multiple antibody models.
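The template selection rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the server's code: the `Template` record, its field names, and the functions `filter_hits` and `select_templates` are hypothetical, and the identity and similarity values are assumed to have been computed beforehand (for example from BLAST output and a CDR annotation step).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Hypothetical record for one PDB hit with both heavy and light chains."""
    pdb_id: str
    identity: float         # global sequence identity to the query, in percent
    evalue: float           # BLAST e-value
    cdr3_identity: float    # CDR3 sequence identity, in percent
    cdr3_similarity: float  # CDR3 sequence similarity, in percent


def filter_hits(hits, min_identity=20.0, evalue_cutoffs=(1e-40, 1e-20)):
    """Apply the identity floor; relax the e-value cutoff only if nothing passes."""
    for cutoff in evalue_cutoffs:
        kept = [h for h in hits if h.identity > min_identity and h.evalue < cutoff]
        if kept:
            return kept
    return []


def select_templates(hits, n=5, have_cdrs=True):
    """Union of the top-n hits by CDR3 identity and the top-n by CDR3 similarity."""
    if not have_cdrs:
        # Fallback described in the text: rank by global sequence identity.
        return sorted(hits, key=lambda h: h.identity, reverse=True)[:n]
    by_identity = sorted(hits, key=lambda h: h.cdr3_identity, reverse=True)[:n]
    by_similarity = sorted(hits, key=lambda h: h.cdr3_similarity, reverse=True)[:n]
    selected = {}
    for hit in by_identity + by_similarity:
        selected.setdefault(hit.pdb_id, hit)  # keep each template once
    return list(selected.values())
```

Because the two top-five lists can overlap, the union contains between 5 and 10 templates, which agrees with the "5 to 10 of the best homologues" mentioned in the caption of Fig. 1L.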

When given an antibody X-ray structure or antibody models that have been already constructed, the next step of AbEMap is global antibody-antigen docking by the PIPER program²⁷ that directly docks two protein structures (Fig. 1E). PIPER uses the fast Fourier transform (FFT) correlation approach⁴⁸, which represents the interaction energy of the complex as a weighted sum of correlations between the fixed receptor and rotationally and translationally mobile ligand grids. Together with the FFT method, this representation makes exhaustive conformational sampling of the 6-dimensional energy landscape computationally feasible. The standard level of discretization used in PIPER is 70000 rotations from the Sukharev quasi-uniform grid sequence⁴⁹ (approximately 5 degrees by Euler angular step) and a translational grid step size of 1 Å. The energy function $E$ includes terms representing repulsive and attractive components of the vdW energy (denoted as $E_{rep}$ and $E_{attr}$, respectively), a Coulombic term describing the electrostatic interaction energy ($E_{Coul}$), a generalized Born type polar solvation energy term ($E_{Born}$), and another solvation term, the structure-based statistical potential $E_{DARS}$, derived with the Decoys As the Reference State (DARS) approach⁵⁰. A special antibody-antigen asymmetric version of the DARS potential has been developed, significantly improving ClusPro’s antibody-antigen docking accuracy²⁸. This antibody-antigen specific potential takes advantage of the fact that aromatic residues dominate the paratope but not necessarily the epitope, whereas the epitope generally has a higher level of hydrophobicity than the paratope.
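To make the phrase "weighted sum of correlations" concrete, the toy sketch below scores every translation of a ligand grid against a fixed receptor grid by direct summation on a cyclic one-dimensional grid. It is deliberately not an FFT implementation and not PIPER itself: PIPER works on three-dimensional grids, repeats the scan for each rotation, and uses the FFT to obtain the same correlation sums much faster. The function names and the grid values are hypothetical.

```python
def correlation_energy(receptor_grids, ligand_grids, weights, shift):
    """Weighted sum over energy terms p of sum_x R_p[x] * L_p[x - shift] (cyclic grid)."""
    total = 0.0
    for r, l, w in zip(receptor_grids, ligand_grids, weights):
        n = len(r)
        # The ligand grid translated by `shift` has value l[(x - shift) % n] at cell x.
        total += w * sum(r[x] * l[(x - shift) % n] for x in range(n))
    return total


def best_shift(receptor_grids, ligand_grids, weights):
    """Exhaustively scan all translations and return (shift, energy) of the minimum."""
    n = len(receptor_grids[0])
    energies = [
        correlation_energy(receptor_grids, ligand_grids, weights, s) for s in range(n)
    ]
    s_best = min(range(n), key=energies.__getitem__)
    return s_best, energies[s_best]
```

For example, with one energy term, a receptor grid `[0, 0, -1, 0]` (a favorable pocket at cell 2) and a ligand grid `[1, 0, 0, 0]`, the scan places the ligand at shift 2. Retaining only the best translation for each rotation is what yields up to 70000 poses in the full protocol.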

To sample antibody-antigen interactions, the known antigen structure is docked to either the known antibody X-ray structure or to the ensemble of antibody homology models obtained in the previous step from the antibody sequence data. Following our recently published protocol⁴⁶, we mask all antibody residues except for CDRs. The server shows results for the energy function currently used in ClusPro for antibody-antigen docking ($E = 0.5 E_{rep} - 0.2 E_{attr} + 300 E_{Coul} + 30 E_{Born} + 0.2 E_{DARS}$). If the input is a computationally predicted or homology modelled structure or just the antibody sequence, then results for two additional weight sets are provided to the user by default: the option “No vdW”, which means that the weights for both vdW contributions are set to zero, and the option “Reduced attractive vdW”, which implies that the weight for the attractive vdW term $E_{attr}$ is halved. These additional weight sets avoid penalizing possible steric clashes. However, it should be noted that the commonly used maximum repulsive and minimum attractive vdW thresholds are still in place for all coefficients. Reducing the vdW potential’s weights notably increased epitope prediction accuracy using the AbEMap protocol when only the sequence of the antibody or homology-modelled structure is given (Supplementary Fig. 1). As in the ClusPro server, the best-scored pose per rotation is retained, resulting in a total of up to 70000 docked poses for further analysis.
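The three weight sets can be written down explicitly. The sketch below is an illustration only: the dictionary keys and the functions `weight_sets` and `pose_energy` are hypothetical names, the numerical weights are those quoted in the text, and the per-term energies are assumed to be supplied by the docking program.

```python
# Weights of the standard antibody-antigen energy function quoted in the text:
# E = 0.5 E_rep - 0.2 E_attr + 300 E_Coul + 30 E_Born + 0.2 E_DARS
STANDARD = {"rep": 0.5, "attr": -0.2, "coul": 300.0, "born": 30.0, "dars": 0.2}


def weight_sets(base=STANDARD):
    """Return the standard weights plus the two reduced-vdW variants."""
    return {
        "Standard": dict(base),
        # "No vdW": both vdW weights are set to zero.
        "No vdW": dict(base, rep=0.0, attr=0.0),
        # "Reduced attractive vdW": only the attractive vdW weight is halved.
        "Reduced attractive vdW": dict(base, attr=base["attr"] / 2),
    }


def pose_energy(terms, weights):
    """Weighted sum of the per-term energies of one docked pose."""
    return sum(weights[name] * terms[name] for name in weights)
```

With all five term values set to 1.0, the standard set gives 330.5, the "No vdW" set 330.2, and the "Reduced attractive vdW" set 330.6, which shows how little of the total is changed and that only the vdW contributions differ between the sets.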

Once the PIPER docking poses and energies are obtained for the antibody structure or for each antibody model $i$ in the ensemble of homology models (which can include one or several models, depending on the number of suitable templates), the top 1000 lowest energy poses are selected, and for each such pose $j$ the number $l_{ij}$ of antigen surface atoms⁵¹ that are in contact with the corresponding antibody is counted. More precisely, any heavy atom on the antigen surface found to be within the 5 Å threshold from any of the antibody surface heavy atoms is considered to be in contact with the antibody. For each antigen atom on the interface, we calculate a Boltzmann-weighted normalized contact “occurrence” as follows:

$$v_{ij}^{atom} = \frac{e^{-\frac{ε_{ij} - ε_{io}}{T}}}{l_{ij}}$$

where $ε_{ij}$ and $ε_{io}$ are the $j$th and the best PIPER energy scores of the $i$th antibody structure in the ensemble, and a value of 100 was used for $T$ (‘temperature’) to scale the relative energy scores. After summing the atomic contributions shown above over $j$ (the docked structures) and averaging over $i$ (the different antibody models), an epitope likelihood score that indicates how often the atom participates in the antibody-antigen interface of low-energy models predicted by PIPER is obtained. The total number of considered docked structures and the ‘temperature’ factor were optimally selected using the ROC AUC score obtained from the likelihood scores of each residue (the definition of ROC AUC score will be given in the ‘Performance measures’ section below). The top 1000 lowest energy poses and a $T$ value of 100 give the best results (Supplementary Fig. 2). Since PIPER generally increases the number of docked structures around the native interface, it is expected and observed that atoms predicted to be in the epitope more frequently by the docked structures are more likely to be in the true epitope. This likelihood score is shown as the B-factor value in the final PDB file given to the user (Fig. 1D) and it helps to visually highlight plausible epitope regions. For evaluating epitope prediction accuracy, we convert the atom likelihoods to residue likelihoods by summing up the atomic contributions for each residue. Although adding atomic likelihood values implies that bigger residues with more surface accessible atoms are scored better, the residue likelihood values are not corrected for size, and hence this bias may have to be accounted for by the user.
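The scoring scheme described in this section can be summarized in a short Python sketch. It is a minimal illustration under several assumptions rather than the server's implementation: atoms are assumed to be pre-filtered to surface heavy atoms, contacts are found by a brute-force distance scan, and the data layout and the function names `interface_atoms`, `atom_likelihoods`, and `residue_likelihoods` are hypothetical.

```python
import math
from collections import defaultdict


def interface_atoms(antigen_atoms, antibody_atoms, cutoff=5.0):
    """IDs of antigen atoms within `cutoff` angstroms of any antibody atom.

    antigen_atoms: dict atom_id -> (x, y, z); antibody_atoms: list of (x, y, z).
    """
    return [
        atom_id
        for atom_id, xyz in antigen_atoms.items()
        if any(math.dist(xyz, other) <= cutoff for other in antibody_atoms)
    ]


def atom_likelihoods(models, temperature=100.0, top_n=1000):
    """Boltzmann-weighted contact occurrence, summed over poses, averaged over models.

    models: one list per antibody model i, holding (energy, contact_atom_ids)
    tuples for the docked poses j of that model.
    """
    totals = defaultdict(float)
    for poses in models:
        best = sorted(poses, key=lambda pose: pose[0])[:top_n]
        if not best:
            continue
        e_best = best[0][0]  # lowest (best) energy of this model
        for energy, contacts in best:
            if not contacts:
                continue
            # exp(-(e_ij - e_io) / T) / l_ij, shared by every atom in contact
            weight = math.exp(-(energy - e_best) / temperature) / len(contacts)
            for atom_id in contacts:
                totals[atom_id] += weight
    return {atom_id: s / len(models) for atom_id, s in totals.items()}


def residue_likelihoods(atom_scores, atom_to_residue):
    """Sum atomic likelihoods per residue (no correction for residue size)."""
    per_residue = defaultdict(float)
    for atom_id, score in atom_scores.items():
        per_residue[atom_to_residue[atom_id]] += score
    return dict(per_residue)
```

As a worked example, consider a single antibody model with two poses: one with energy -200 contacting atoms 1 and 2, and one with energy -100 contacting atom 2 only. With $T = 100$, atom 1 receives $1/2$ and atom 2 receives $1/2 + e^{-1}$, so the residue containing both atoms scores $1 + e^{-1}$.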

Protein datasets used for testing AbEMap

A set of 40 antibody-antigen complexes found in the widely accepted protein-protein docking benchmark version 5.0 (BM5)⁵² from the Weng lab was used to test our protocol (Fig. 1-3). To ensure non-redundancy, the authors selected an antibody-antigen complex only if the antigen was not in the same SCOP (Structural Classification of Proteins)⁵³ family as that of another complex and it did not share more than 80% of the interface residues with another⁵². The BM5 set contains 12 antibody-antigen complexes with the antibody crystallized only in complex with the respective antigen but not on its own (termed unbound-bound targets). For the other 28 complexes, the X-ray structure of the antibody has been determined both on its own and in complex with the antigen (termed unbound-unbound targets). In both cases, the antigen was independently crystallized in addition to its form in a complex with the respective antibodies. We compared the performance of AbEMap to that of SEPPA, BEpro, and EpiPred using this set of antibody-antigen complexes. However, EpiPred did not work for one of the unbound targets (PDB ID 2I25), most likely because it contains a single-chain shark antigen receptor without any CDR rather than a traditional antibody. Therefore, the target 2I25 was excluded from figures that compare AbEMap with EpiPred. In addition, we wanted to test the protocol on the 23 antibody-antigen complexes that were recently added to the docking benchmark set (denoted BM5.5)²⁹. However, AbEMap was unable to provide antibody models for two of the complexes, and hence our discussion is restricted to the remaining 21 targets.

Figure 3 ∣ — Examples of AbEMap’s applications. **(A)** The complex of birch pollen Bet V1 (blue surface) with the bound monoclonal antibody (magenta), PDB ID 1FSK. The true epitope residues are highlighted in red, and three of the homology models of the antibody are shown in green. The CDR3 of the heavy chain on the native antibody is highlighted in cyan. **(B)** Integrin alpha-L 1 domain (blue surface) with true epitope residues (red) is shown with the different poses of the Efalizumab FAB fragment predicted by PIPER. The cluster centers of the top antibody clusters are shown as gray pseudo-atoms. The top cluster’s representative is shown as a cartoon (green). **(C)** VEGF protein (blue surface) with the true epitope residues (red) is shown with the different poses of the FAB fragment of a neutralizing antibody predicted by PIPER. Similar to (B), the top cluster centers are shown as gray pseudo-atoms and the top ranked cluster representative is shown as a cartoon. **(D)** ROC plots of AbEMap’s performance with X-ray and homology modeled structures of the antibodies as inputs for the 28 unbound antibody-antigen complexes in the BM5 set. As shown, the use of homology models provides essentially the same accuracy as using the separately solved X-ray structures of antibodies.

Performance measures

The prediction performance for each antibody-antigen target was evaluated based on the ground truth obtained from the bound structures of the complexes. The true epitopes are simply the residues of the antigen that are within 5 Å from the nearest antibody heavy atom in the native complex^54,55. It appears that the most widely used performance measure among epitope mapping servers^10-12,18 is the Area Under the Receiver Operating Characteristic curve (ROC AUC), and hence it was also used to compare the performance of ClusPro-AbEMap with that of other probabilistic servers such as SEPPA and BEpro (we recall that a ROC curve plots the true positive rate ($TP$) versus the false positive rate ($FP$), and ROC AUC is the area under the ROC curve). We also show the $F1$ score (the harmonic mean of precision and recall) and the Matthews correlation coefficient ($MCC$) score at different residue rank cut-offs as used in other studies^17,30. The true positives ($TP$), false positives ($FP$), true negatives ($TN$), and false negatives ($FN$) at the selected cut-off values were considered for each antibody-antigen target and used to calculate the $F1$ and $MCC$ scores:

$$F1 = \frac{TP}{TP + \frac{1}{2}(FP + FN)}$$

$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

These metrics give a balanced view of recall and precision, which are the essential metrics for classifiers. Since there is no accepted likelihood cutoff value to decide whether a residue is predicted to be in the epitope, for comparison we consider the top 10, 20, 30, 40, and 50 residues and count the number of true epitope residues among them. Indeed, most epitope lengths fall within that range⁵⁶, and the average epitope length for the antibody-antigen targets in BM5 was 21 residues. For all peer servers, the performance data were obtained by running epitope mapping jobs for all 40 targets in BM5. For evaluating homology modeling, we assessed ClusPro-AbEMap’s performance using homology models of the antibodies as generated by the server. The epitopes on the corresponding antigens were mapped by the antibody-specific server EpiPred for comparison. Previously used homology benchmark sets were built with more relaxed sequence identity thresholds^30,57,58 (< 90% instead of the < 80% used for this manuscript) and thus were not included in our study.
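The performance measures of this section can be computed from per-residue likelihood scores and a list of true epitope labels as sketched below. This is an illustrative sketch, not the evaluation code used for the benchmark: the function names are hypothetical, ROC AUC is computed through its rank interpretation (the probability that a randomly chosen epitope residue outscores a randomly chosen non-epitope residue, with ties counted as one half), and the top-$k$ residues by score are treated as the predicted epitope.

```python
import math


def roc_auc(scores, is_epitope):
    """Rank-based ROC AUC over all (epitope, non-epitope) residue pairs."""
    pos = [s for s, y in zip(scores, is_epitope) if y]
    neg = [s for s, y in zip(scores, is_epitope) if not y]
    if not pos or not neg:
        raise ValueError("need both epitope and non-epitope residues")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_mcc_at_k(scores, is_epitope, k):
    """F1 and MCC when the k top-ranked residues are predicted as the epitope."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    predicted = set(order[:k])
    tp = sum(1 for i in predicted if is_epitope[i])
    fp = len(predicted) - tp
    fn = sum(1 for i, y in enumerate(is_epitope) if y and i not in predicted)
    tn = len(scores) - tp - fp - fn
    f1 = tp / (tp + 0.5 * (fp + fn)) if (tp + fp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 when undefined
    return f1, mcc
```

For five residues with scores 0.9, 0.8, 0.3, 0.2, 0.1, of which the first and third are true epitope residues, the ROC AUC is 5/6; at a cut-off of $k = 2$ the counts are $TP = 1$, $FP = 1$, $FN = 1$, $TN = 2$, giving $F1 = 0.5$ and $MCC = 1/6$.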

Applications of the method

There are three major application modes of AbEMap, depending on the availability of prior information on the antibody, which can be provided as (1) an X-ray crystal structure, (2) a predicted structural model, or (3) only the amino acid sequence. Inspired by ClusPro-TBM^59,60 and in order to address a potentially wider community, AbEMap provides the third option to start from antibody sequence and antigen structure, which is not available with servers providing the other two application modes. For users with the X-ray structure of the antibody or the resources for antibody structure prediction, the AbEMap server provides simplified options for modes (1) and (2) that require uploading PDB structures, and then, based on ClusPro docking results, maps the epitope residues on the antigen. In what follows, we demonstrate the application of the three different modes of AbEMap to some targets from the widely used protein-protein docking benchmark set BM5^29,52. The 28 unbound antibody crystal structures from this benchmark set as well as their sequences were used for epitope mapping entirely through AbEMap (demonstrating modes 1 and 3). To showcase the application with independently predicted antibody structures (mode 2), we used AlphaFold2 to model the antibodies in the set using the program and parameters currently made public^39,40.

Epitope mapping starting from an antibody X-ray structure.

This is the simplest option provided by AbEMap. The server was applied to the 28 unbound antibody-antigen targets in BM5 (Supplementary Table 1). We show $F1$ and $MCC$ scores when considering the true epitope residues among the top 10 up to the top 50 residues ranked using the predicted epitope likelihood score and averaging the obtained scores over all 28 cases (Fig. 2). AbEMap obtained an average ROC AUC score of 0.738 (Fig. 1K) and $F1$ and $MCC$ scores of 0.304 and 0.249, respectively. For the 12 bound antibody-antigen complexes in BM5, where the antibodies were crystallized in complex with the respective target antigens and only the antigens were crystallized separately, AbEMap’s ROC AUC score increased to 0.822, whereas the $F1$ and $MCC$ scores, 0.297 and 0.249, respectively, did not substantially change. We also compare the performance of AbEMap to that of the epitope prediction methods SEPPA, EpiPred and BEpro (Fig. 2). The AbEMap results are better than the ones obtained by these alternative methods.

Figure 2 ∣ — Epitope mapping performance of four different servers tested on 28 unbound-unbound antibody-antigen complexes in the benchmark set BM5. $F1$ and $MCC$ scores for ClusPro-AbEMap, SEPPA, EpiPred and BEpro at different cut-off thresholds when the antigen residues are ranked by the obtained scores. The measures are averaged over the 28 complexes. The AbEMap results are slightly better than the ones obtained by SEPPA, and substantially better than the ones obtained by BEpro and EpiPred.

Epitope mapping starting from modeled antibody structures (provided by the user).

This AbEMap application is meant for users who have access to specialized antibody homology modeling software or have their own modeling programs. As noted in our earlier paper²⁶, the ClusPro server is often used for docking homology models of protein complex components. Widely used antibody modeling programs include Rosetta⁶¹, PIGSPro⁶², LYRA⁶³, Repertoire Builder⁶⁴, DaReUS-Loop⁶⁵ and SAbPred’s ABodyBuilder⁶⁶. As noted by Marks and Deane, apart from the heavy chain’s CDR3 loop (H3 loop), most antibody homology modeling programs can generate antibody models within 3Å root mean square deviation (RMSD) from the native structure⁶⁷. Therefore, users might choose to use one of the programs listed above. However, such models tend to have more steric clashes than X-ray structures, so AbEMap uses a special PIPER coefficient set that reduces the steric penalty to account for this.

The recent introduction of AlphaFold2⁴⁰ has quite justifiably excited the field. Due to the ability of AlphaFold2 to predict very accurate structures from sequence for most proteins⁴¹, we expected that the method can also be used for modeling antibodies. Therefore, we tested AlphaFold2-modeled antibodies for mapping epitopes and compared the results to those obtained for X-ray structures and from antibody sequences using AbEMap’s built-in homology protocol, obtaining ROC AUC scores for the 40 antibody-antigen targets in the BM5 set, including the 28 unbound and 12 bound targets (Table 1). AlphaFold2 was used to model antibodies with and without templates. Adding templates to predict antibody structures did not improve the performance of epitope prediction for bound antibody-antigen targets, but yielded a slight improvement for unbound targets. Results obtained from antibody sequences by AbEMap (Table 1), to be discussed in the next section, demonstrate that AlphaFold2 is not necessarily the best method for modeling antibodies⁴¹. Lastly, a correct bound structure of the antibody increases the accuracy of the epitope prediction notably as shown in the 11% improvement from the unbound crystal structure and a 6.5% improvement from the internally homology-modelled antibodies (Table 1). So, input of an antibody structure closer to the native structure yields better epitope prediction accuracy.

TABLE 1 ∣.

ROC AUC scores for AbEMap with four different types of input for the antibody^a

ROC AUC scores	X-ray structure	Internal homology	AlphaFold2	AlphaFold2 with templates
Bound-unbound (12)	0.822	0.772	0.695	0.695
Unbound-unbound (28)	0.738	0.736	0.726	0.732
All (40)	0.763	0.747	0.717	0.721

Results are averaged on the 12 bound-unbound and 28 unbound-unbound antibody-antigen complexes in the benchmark set BM5
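As a consistency check on Table 1, the "All (40)" row appears to be the case-weighted mean of the two subsets, and the relative improvements quoted in the text follow from the same numbers. The sketch below uses only values copied from Table 1; the helper names `pooled_auc` and `relative_gain` are illustrative.

```python
# ROC AUC values copied from Table 1 as (bound-unbound, unbound-unbound).
TABLE1 = {
    "X-ray structure": (0.822, 0.738),
    "Internal homology": (0.772, 0.736),
    "AlphaFold2": (0.695, 0.726),
    "AlphaFold2 with templates": (0.695, 0.732),
}
N_BOUND, N_UNBOUND = 12, 28  # number of targets in each subset

def pooled_auc(bound, unbound):
    """Mean over all 40 targets, weighting each subset by its number of cases."""
    return (N_BOUND * bound + N_UNBOUND * unbound) / (N_BOUND + N_UNBOUND)

def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old

pooled = {name: round(pooled_auc(*values), 3) for name, values in TABLE1.items()}
gain_vs_unbound = relative_gain(0.822, 0.738)    # bound X-ray vs unbound X-ray
gain_vs_homology = relative_gain(0.822, 0.772)   # bound X-ray vs internal homology
```

The pooled values agree with the last row of the table to three decimals, and the two gains round to the 11% and 6.5% cited in the text.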

Epitope mapping starting from antibody sequences.

This application is an adaptation of the ClusPro-TBM server^35,60 introduced for round 13 of the CASP/CAPRI protein docking experiment⁶⁸. Users need structural data for the antigen, but only sequence data for the antibody. To account for the uncertainties from the templates or the inaccurate modeling, AbEMap uses an ensemble of models as described. Results obtained using these models are similar to the ones obtained when using the separately solved X-ray structures of the antibodies (Table 1). Four examples of AbEMap’s predictions from the table are shown (Fig. 1G-1J): modelled murine anti-tissue factor (PDB ID 1FGN) and tissue factor (PDB ID 1TFH), modelled humanized Fab D3h44 (PDB ID 1JPT) and tissue factor (PDB ID 1TFH), modelled anti-CCL2 neutralizing antibody (PDB ID 4DN3) and monocyte chemoattractant protein (PDB ID 1DOL), and modelled anti-shh chimera Fab fragment (PDB ID 3MXV) and sonic hedgehog N-terminal domain (PDB ID 3M1N), respectively. For these four cases, AbEMap predicts 73.07%, 83.3%, 100% and 62.5% of the true epitope residues within the 30 top ranking residues, respectively.

When using antibody models rather than X-ray structures, the placement of the H3 loop is particularly important for the success of epitope mapping.⁶⁹ An example of how incorrect modeling of this loop can skew the epitope prediction is the epitope mapping of the major birch pollen allergen Bet v 1 with the monoclonal IgG antibody (PDB ID 1FSK) (Fig. 3A). In this example, an ensemble of eight templates was used to model the antibody. The native antibody pose, shown in purple, has its heavy chain CDR3 highlighted in cyan. The templates (three of the eight shown in green) are aligned to the native antibody. The native antigen is represented as a blue surface with the true epitope residues highlighted in red. The modeling of the antibody’s heavy chain CDR3 loop forces a clash with the antigen, which prevents the loop from going past the protruding structure of the Glu-Gly-Asn segment of the antigen (Fig. 3A). As a result, the ROC AUC for 1FSK is 0.918 when using the crystal structure but only 0.729 when using homology modeling. Furthermore, whereas AbEMap captured a true epitope residue as the top ranked residue when using the crystal structure, with the homology modeling approach the top 20 ranked residues are needed to capture the first three true epitope residues.

Homology modeling of the antibody does not always perform worse than using the X-ray structure of the separately crystallized antibody. An example of how the ensemble approach can compensate for the uncertainty of the antibody structure is shown in the results for the antibody-antigen complex 3EOA (Fig. 3B). Homology-based models of the Fab fragment of Efalizumab (PDB ID 3EO9) and the crystal structure of Lymphocyte function-associated antigen 1 (PDB ID 3F74) are shown. The antigen is shown as a blue surface with the true epitope residues colored in red. For each docking result from five selected templates, the centers of mass of all the docking cluster centers are shown as grey pseudo-atoms around the antigen. The top cluster representatives of each of the five templates are shown in cartoon representation. Four of the five top conformations place the antibody almost entirely over the epitope, which improved the ranking of epitope residues. The ROC AUC score went from 0.566 for the X-ray structure input of the unbound antibody to 0.71 when using homology modeling, an increase of nearly 27%.

An interesting case is when homology modeling improves even upon the results obtained using the bound Fab fragment (Fig. 3C). The target complex 1BJ1 is a neutralizing antibody crystallized with vascular endothelial growth factor (VEGF). The BLAST search followed by the selection filtering described earlier produced only a single template for docking. The top cluster representative of the docking result is shown in cartoon representation, while the rest of the cluster centers are shown as gray pseudo-atoms. The top conformation overlaps with the true epitope, shown as a red surface, and some of the cluster centers are also in that vicinity, capturing different parts of the epitope. With a ROC AUC score of 0.736 for the 28 unbound antibody-antigen complexes, the accuracy of the epitope mapping is not far behind the results obtained for unbound X-ray structures (Fig. 3D). The results in Table 1 and Supplementary Table 1 suggest that the use of homology models provides very similar accuracy to that of using separately solved antibody structures, indicating that, due to the flexibility of the CDRs, information on the unbound structure of the antibody does not provide a substantial advantage over antibody models.

Comparison with existing methods

SEPPA 3.0 and BEpro were chosen for comparison as they were shown to be the best two epitope mapping servers in the recent publication by Zhou and colleagues¹². Note that these two servers are antibody agnostic, meaning they only accept the antigen structure as input. ClusPro-AbEMap outperforms the other probabilistic servers using the well-accepted ROC AUC measure for the 28 unbound-unbound cases in BM5 (Fig. 1K; Fig. 2). The ROC AUC scores were 0.738, 0.703, and 0.655 for ClusPro-AbEMap, SEPPA 3.0, and BEpro, respectively. The added structural information on the antibody provides AbEMap with valuable information on the antibody-antigen interface that gives a 4.9% improvement over SEPPA 3.0 and a 12.7% improvement over BEpro, despite AbEMap not being reinforced with machine learning components like the other two servers. Another server chosen for comparison was EpiPred, which is not antibody-agnostic and outputs a deterministic prediction of three localized epitope patches. The performances of the above three servers and that of EpiPred were compared using $F_1$ and $MCC$ scores for 27 of the unbound complexes from BM5 (as mentioned, EpiPred did not work for PDB ID 2I25). When taking the 20 top ranked residues from each server, AbEMap improves the $F_1$ scores by 10% and 60% compared to SEPPA and BEpro, respectively, and more than doubles that of EpiPred (Fig. 2). In terms of the $MCC$, AbEMap’s improvement upon SEPPA and BEpro is 14% and 97%, respectively, while a nearly 3-fold improvement (2.7 times) on EpiPred is observed. For 23 of the 27 cases analyzed, AbEMap was able to predict at least one true epitope residue in its top 20 ranked residues. Both SEPPA and BEpro were able to get at least one epitope residue accurately in the top 20 for 24 of the 27 cases, while EpiPred failed to produce any true positives in the top 20 for 13 of the 27.
It should be noted that EpiPred predicts three possible non-overlapping epitope patches on the antigen and does not give a likelihood or probability score for residues. Therefore, we gave the same high scores to all the residues in the top-ranked epitope followed by the next high score to the second-ranked epitope residues and so on. The numbers of true positives in the top 20 ranked residues were compared for AbEMap, SEPPA, BEpro, and EpiPred for each of the 40 cases in BM5 (with no EpiPred results for 2I25) (Supplementary Table 2).
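The conversion of ranked patches into per-residue scores, and the resulting need to handle tied scores when computing ROC AUC, can be sketched as follows. This is a minimal illustration under stated assumptions, not the scripts used in the comparison: the actual score values assigned to each patch are not given in the text, so descending integers are assumed, residues outside all patches are assumed to score 0, and the function names are hypothetical.

```python
def patches_to_scores(ranked_patches, all_residues):
    """Give every residue of the i-th ranked patch the same score.

    Assumption: the best patch gets the highest score (len(ranked_patches)),
    the next patch one less, and residues in no patch get 0.
    """
    n = len(ranked_patches)
    scores = {res: 0.0 for res in all_residues}
    for rank, patch in enumerate(ranked_patches):
        for res in patch:
            scores[res] = max(scores[res], float(n - rank))
    return scores

def roc_auc(scores, true_epitope):
    """Probability that a random epitope residue outscores a random
    non-epitope residue, counting ties as one half."""
    pos = [s for res, s in scores.items() if res in true_epitope]
    neg = [s for res, s in scores.items() if res not in true_epitope]
    if not pos or not neg:
        raise ValueError("need at least one epitope and one non-epitope residue")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy antigen with eight residues and three made-up patches.
residues = list(range(1, 9))
patch_scores = patches_to_scores([[1, 2], [3, 4], [5]], residues)
auc = roc_auc(patch_scores, true_epitope={1, 3, 6})
```

Because all residues within a patch share one score, a tie-aware definition of the AUC is needed for patch-based predictions of this kind.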

In order to assess how ClusPro-AbEMap compares to peer servers that are also antibody specific when using internal homology modeling by the server, the 28 unbound-unbound cases from BM5 were modeled from templates with no more than 80% global sequence identity (GSI). Recent papers on epitope prediction used templates up to 90% GSI,⁷⁰ which is too close in our view. We compared AbEMap restricted to a single template with EpiPred using the same template (Fig. 1L). The template for modeling with MODELLER was chosen as the homologue with the highest CDR3 sequence identity. The resulting models were entered into AbEMap and EpiPred for comparison. When taking the top ranking 20 residues, AbEMap with just one template improves the $F_1$ score by 75% relative to EpiPred. When AbEMap uses the union of the 5 best ranking models by CDR3 sequence identity and the 5 best ranking models by CDR3 sequence similarity as templates, the average $F_1$ and $MCC$ scores are more than double those obtained by EpiPred. The best $F_1$ score of 0.306 was obtained when considering the top 30 residues predicted by AbEMap based on the top five templates.
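The template selection rule described above (a GSI cap followed by the union of the best templates by CDR3 identity and by CDR3 similarity) can be sketched as below. The record fields, the function name and the template labels are hypothetical, and the server's actual implementation may differ in tie-breaking and filtering details.

```python
def select_templates(candidates, n=5, max_gsi=80.0):
    """Union of the n best templates by CDR3 identity and the n best by
    CDR3 similarity, after discarding templates above the GSI cap."""
    eligible = [c for c in candidates if c["gsi"] <= max_gsi]
    by_identity = sorted(eligible, key=lambda c: c["cdr3_identity"], reverse=True)[:n]
    by_similarity = sorted(eligible, key=lambda c: c["cdr3_similarity"], reverse=True)[:n]
    selected, seen = [], set()
    for cand in by_identity + by_similarity:
        if cand["template"] not in seen:      # keep each template only once
            seen.add(cand["template"])
            selected.append(cand)
    return selected

# Made-up candidates; "T1".."T5" are placeholder labels, not PDB IDs.
toy_candidates = [
    {"template": "T1", "gsi": 95.0, "cdr3_identity": 90.0, "cdr3_similarity": 92.0},
    {"template": "T2", "gsi": 78.0, "cdr3_identity": 70.0, "cdr3_similarity": 60.0},
    {"template": "T3", "gsi": 75.0, "cdr3_identity": 65.0, "cdr3_similarity": 80.0},
    {"template": "T4", "gsi": 60.0, "cdr3_identity": 50.0, "cdr3_similarity": 85.0},
    {"template": "T5", "gsi": 55.0, "cdr3_identity": 40.0, "cdr3_similarity": 45.0},
]
chosen = [c["template"] for c in select_templates(toy_candidates, n=2)]
```

With n=5 the union contains between 5 and 10 templates, depending on how much the two rankings overlap.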

The Weng lab recently updated the docking benchmark set with a new set of antibody-antigen complexes that are not found in BM5, resulting in an extended benchmark set BM5.5²⁹. The performance of AbEMap was tested on 21 new rigid body cases from the BM5.5 set and compared with that of EpiPred. We report the $F_1$ and $MCC$ scores of AbEMap with the unbound structures as input and with the sequences as input, together with EpiPred’s results with the unbound structures (Fig. 4). Interestingly, AbEMap performs slightly better with the template-based approach than with the unbound crystals, 0.204 versus 0.196 when considering the top ranked 30 residues. Using the unbound antibody structures, AbEMap identified at least one true epitope residue among the 30 top ranked residues for 20 of the 21 targets, while it predicted more than 10 true epitope residues for only one of the 21 targets. The homology modeling approach, on the other hand, predicted at least one true epitope residue for only 15 of the 21 targets, but more than 10 true epitope residues for 5 of the 21 targets. This further emphasizes that, at least for the targets in BM5.5, the homology modeling approach helps to enhance prediction accuracy when good homologues are found. When considering the top 40 and top 50 residues, however, using the unbound X-ray structure of the antibody performs slightly better than the template-based approach in terms of $F_1$ and $MCC$ scores. Using either antibody structures or homology models, AbEMap outperforms EpiPred at all ranking cutoffs studied. At the top 30 cutoff, AbEMap with unbound structures and AbEMap with homology modeling perform ~72% and ~78% better, respectively, than EpiPred using the crystal structures.

Figure 4 ∣ Comparing AbEMap’s performance on X-ray and homology-modeled antibodies as inputs for 21 new antibody-antigen targets in the benchmark set BM5.5 with EpiPred’s predictions based on X-ray structure inputs alone. The prediction metrics are averaged to obtain the $F_1$ and $MCC$ scores shown. Interestingly, AbEMap performs slightly better with the template-based approach than with the unbound crystals when considering the top ranked 10, 20, or 30 residues. This emphasizes that the homology modeling approach may enhance prediction accuracy when good homologues are found. However, when considering the top 40 and top 50 residues, using the unbound X-ray structure of the antibody performs slightly better than the template-based approach.

Limitations

The major limitations of ClusPro-AbEMap are as follows:

1) Candidate templates must be homologous to both the heavy and the light chains. In some cases the homologs found for the heavy chain and those found for the light chain do not match. When they do not match, even if the homologs are highly similar to one or both of the individual chains, the server cannot determine a suitable relative orientation of the two chains, so these potentially helpful homologs are not used for template-based modeling.

2) Rigid-body docking of the antibody and antigen structures may limit the accuracy of results. It is known that the CDR3 of the heavy chain is one of the most flexible loops of the antibody. Since the underlying docking program, PIPER, uses a rigid-body docking approach, the conformational change upon complex formation is not taken into account. Regular docking using ClusPro includes local minimization of the energy of the docked structures, which removes the clashes and may introduce some induced conformational changes. Since it retains and analyzes a much larger set of models, AbEMap does not perform any local energy minimization. However, the negative effects of this limitation are tempered by the aforementioned coefficient set, which removes the stringent vdW terms during docking.

3) AbEMap does not include any systematic clustering of the predicted epitope residues to identify a localized epitope patch unlike some peer servers. This can be a disadvantage as some residues with high likelihood scores might be irrelevant if they are dispersed in an isolated manner, whereas clustering of such residues can signal true positives. Observing such clusters can help the user to make better selection of true epitope residues. However, we were unable to obtain consistent improvement of the results in the general case, and hence our protocol does not include clustering of the predicted epitope residues.

MATERIALS

EQUIPMENT

A computer with internet access and a web browser.
Atomic resolution structure of the antigen target. The PDB ID can be used to directly fetch the structure or the structure may be uploaded from the computer.
Atomic resolution structure or sequence of the antibody. In the case of structural information, the PDB ID can be used to directly fetch the structure or the structure may be uploaded from the computer.
Access to PyMOL or similar structure viewing software is recommended but not required. PyMOL can be downloaded from www.pymol.org. Alternatively, you can use any molecular viewer that supports the visualization of multiple structures in one PDB file.

PROCEDURE

Entering the basic inputs – TIMING ~1-2 min

CRITICAL STEP The status updates for AbEMap runs are described in Box 1.

Box 1 ∣. Status updates for AbEMap runs.

Processing pdb files.

Downloading the PDB file from rcsb, processing files and selecting chains specified by user.

Pre-docking minimization.

Running CHARMM to add missing atoms and hydrogen, minimizing added atoms and if running homology mode, creating homology models.

Copying to supercomputer.

Copying working files to the computer cluster

Held on supercomputer.

The working files have been copied to the computer cluster, the job hasn't started yet

In queue on supercomputer.

The job has been submitted and is in the queue

Running on supercomputer.

The job is running

Calculating epitope residues.

Calculating the epitope residue scores

Finalizing job.

Preparing the working files to be transferred back to local computer

Copying to local computer.

Copying working files to local computer

Finished.

Job is complete

Error on local system.

Error processing PDB files, check error messages

Error on supercomputer.

Error running job, check error messages.

CRITICAL STEP A list of potential error messages that may be encountered and the reasons they are generated are given in Box 2.

Box 2 ∣. Error messages and their meanings.

XXXX not found in PDB.

The pdb XXXX is downloaded from https://www.rcsb.org/. This process will fail if the PDB file is not present on the site or, occasionally, if the site is down.

Unknown residue XXX in antigen. Please remove.

XXX is the three letter code for a residue in the antigen. If ClusPro does not recognize the residue it will fail to dock. Please replace it with a standard residue.

Unknown residue XXX in antibody. Please remove.

XXX is the three letter code for a residue in the antibody. If ClusPro does not recognize the residue it will fail to dock. Please replace it with a standard residue.

Antigen chains must be fewer than 20 characters.

Chain specification is incorrect.

Antigen chains must be white space separated alphanumeric characters.

Incorrect or missing antigen chain specification.

Antigen PDB ID must be 4 alphanumeric characters.

Invalid PDB code.

Antigen file too large.

PDB file exceeds maximum allowed size.

Antigen file only partially uploaded.

Network error during upload, or the PDB file is too large.

Antibody chains must be fewer than 20 characters.

Chain specification is incorrect.

Antibody chains must be white space separated alphanumeric characters.

Incorrect or missing antibody chain specification.

Antibody PDB ID must be 4 alphanumeric characters.

Invalid PDB code.

Antibody file too large.

PDB file exceeds maximum allowed size.

Antibody file only partially uploaded.

Network error during upload or the PDB file is too large.

Copy of antigen failed.

File did not transfer to the computer cluster.

Copy of antibody failed.

File did not transfer to the computer cluster.

Processing failed on antigen.

Error processing antigen file. This issue usually occurs in the minimization process if steric clashes or nonphysical bonds cause this process to fail.

Processing failed on antibody.

Error processing antibody file. This issue usually occurs in the minimization process if steric clashes or nonphysical bonds cause this process to fail.

Job ran out of memory on server.

Antigen and/or antibody are too large.

Repulsion must be in whitespace separated chain-residue format.

Incorrect format for the Repulsion list.
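Several of the errors in Box 2 concern the format of text fields and can be caught before submission. The sketch below mirrors the rules stated in those messages; it is not the server's own validation code, the function name is hypothetical, and the exact patterns (for example, that residue numbers are plain non-negative integers without insertion codes) are assumptions.

```python
import re

PDB_ID = re.compile(r"[A-Za-z0-9]{4}")          # 4 alphanumeric characters
CHAIN = re.compile(r"[A-Za-z0-9]+")             # one whitespace-separated chain token
REPULSION = re.compile(r"[A-Za-z0-9]+-\d+")     # chain-residue number, e.g. a-27

def check_inputs(pdb_id, chains="", repulsion=""):
    """Return a list of problems; an empty list means the fields look well formed."""
    problems = []
    if not PDB_ID.fullmatch(pdb_id):
        problems.append("PDB ID must be 4 alphanumeric characters")
    if len(chains) >= 20:
        problems.append("chains must be fewer than 20 characters")
    if not all(CHAIN.fullmatch(tok) for tok in chains.split()):
        problems.append("chains must be white space separated alphanumeric characters")
    if not all(REPULSION.fullmatch(tok) for tok in repulsion.split()):
        problems.append("repulsion must be in whitespace separated chain-residue format")
    return problems

ok = check_inputs("1FGN", "H L", "a-27 b-103")
bad = check_inputs("1FG", "H,L", "a27")
```

An empty chains field passes the check, consistent with the server using all chains when none are specified.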

Access the server located at https://abemap.cluspro.org/ where you will be prompted to sign in, create an account, or use the server anonymously.
Sign in as usual if an account has already been created, or register to create a new account. Once the registration is complete, a password will be sent to the email address supplied. The password can be changed by clicking on the ‘Preferences’ option. To use the server without the benefits of an account, click on the option ‘Use the server without the benefits of your own account’.

CRITICAL STEP An annotated screenshot of the AbEMap initial job submit page is provided in Fig. 5.

CRITICAL STEP New users need an educational or governmental email address to create an account.

CRITICAL STEP All jobs submitted without an account are publicly accessible.
Select the ‘Epitope Mapping’ option to use the AbEMap functionality.
Enter in a job name for the submission (optional).

CRITICAL STEP If this option is left blank a unique name will be provided by the server.
Input the antigen structure in PDB format. The docking procedure will remove all HETATM atoms from the PDB input, including water molecules, cofactors and ligands. Only the 20 standard amino acids and nucleotides will be retained.
Enter the structure in one of two ways, using a PDB ID (A) or by uploading a PDB format file (B).

Figure 5 ∣ AbEMap initial job submit page. (A) The space to enter the job name. (B) The 4 character code that specifies the PDB entry is entered here. (C) The chains to be investigated go here. (D) The antibody sequence in FASTA format is entered into this textbox. (E) The user’s MODELLER key is entered here. (F) The PDB IDs to be excluded in forming the homology model go here. (G) Click here to select/deselect masking non-CDR regions. (H) Submit the job by clicking map.

(A). Import the antigen structure using PDB ID

The 4 character ID is used to automatically download the coordinates from rcsb.org

(B). Upload the antigen structure using a PDB format file

Click on browse and select the file to be uploaded

CRITICAL STEP If there are multiple structures in the given PDB file then our procedure will only consider the first structure.
CRITICAL STEP Nonstandard amino acid residues are not supported and will result in an error when submitting.
1. In the chains field enter the chains on the antigen to be used for epitope mapping. The chain IDs should be separated by white space.
  
  CRITICAL STEP If no chains are specified in this field all the chains will be used.
2. Enter the antibody input type using either of available formats: as a sequence in FASTA format or as a structure in PDB format. The choice of the input format will determine which submission options will be visible and active for users on the webpage. As a result, the following steps for submission are broken into a sequence based (FASTA format) procedure (A) and a structure based (PDB format) procedure (B). Once the format type is selected the user will follow the corresponding steps for that procedure.

(A). FASTA format sequence-based procedure

Select the FASTA input type.
Once the decision to use the FASTA format is made there are two options for entering the sequence as follows:
- Either enter the antibody sequence in FASTA format into the provided text box. An example is provided in the box;
- Or upload the antibody sequence as a FASTA file using the upload FASTA option.
  
  CRITICAL STEP The sequence in FASTA format should include the chain IDs.
Enter a MODELLER key into the provided box. A MODELLER key (a passcode that allows users to continue with modeling the antibody) can be obtained from https://salilab./modeller/registration.html.

CRITICAL STEP MODELLER key is a passcode obtained by users after registering on the above website. It allows users to utilize MODELLER which is a crucial tool used in the antibody modeling step. The key is not specific to the antibody, but rather to the user.

CRITICAL STEP No sequence based epitope prediction can be performed if the user does not have a MODELLER key.
Select either of the two following advanced options that are currently present when using FASTA format as the input:
- Define an exclusion list. If there are structures that should not be used by MODELLER as templates for the construction of the antibody structure they should be listed. The models should be listed by PDB IDs separated by whitespace. This would be beneficial for testing the server and/or comparing predictions based on different homologue availability.
- Automatically mask non-CDRs. If selected, the server masks regions on the antibody that are not part of the CDRs, which reduces the area under consideration for docking and increases accuracy while reducing computational time. All areas of the antibody are considered if this option is not selected.
  
  CRITICAL STEP Selecting either advanced option is optional.
  
  CRITICAL STEP By default, the masking option is already selected for users since it was found to yield the best results.

(B). PDB format structure-based procedure

Select the PDB input type (as shown in Fig. 6). This option takes in the antibody structure in PDB format. The docking procedure will remove all HETATM atoms from the PDB input, including water, cofactors and ligands; only the 20 standard amino acids and nucleotides will be retained. If there are multiple models in the given PDB file, the docking procedure will only consider the first.
The structure can be entered in two ways:
- Either import the structure using the corresponding 4 character PDB ID that AbEMap uses to download the coordinates directly from rcsb.org.
- Or upload the PDB by clicking on browse and selecting the PDB file to be uploaded.

Figure 6 ∣ AbEMap job submit page after selecting the 'Use PDB' option for the antibody. (A) The 4 character code that specifies the antibody PDB entry is entered here. (B) If the structure is homology-based select this checkbox. (C) The option to automatically mask non-CDR regions is toggled here. (D) The residues for repulsion are entered here.

CRITICAL STEP If there are multiple structures in the given PDB file then our procedure will consider only the first structure.

CRITICAL STEP Nonstandard amino acid residues (labeled as ATOM in the PDB file) are not supported and will result in an error when submitting. Entering the PDB ID does not cause an error.

In the chains field enter the chains on the antibody to be used for epitope mapping. The chain IDs should be separated by white space. If no chains are specified in this field all the chains will be used.
Select if the antibody structure is a homology model. If the custom structure provided is a homology modeled antibody then select the option “My antibody structure is a modeled structure.”

CRITICAL STEP AbEMap provides a different set of weights and energy functions for homology modeled structures.
When using the PDB input options two optional advanced options are currently present to select: to automatically mask non-CDRs (A) and to define repulsion (B).

(A). Automatically mask non-CDRs

i) Select automatically mask non-CDRs to have the server mask regions on the antibody that are not part of the CDRs. All areas of the antibody are considered if this option is not selected.

CRITICAL STEP By default, the masking option is already selected for users since it was found⁴⁶ to yield the best results.

(B). Define repulsion

i) As an alternative to automatic masking, you may want to provide information about what antibody residues are not in the binding interface manually. To bias the docking against those residues being in the binding interface you can add a repulsion term to the residues you select. This can be achieved by providing a list of residues or a masking file and can be done in either of following two ways:

Enter a list of residues in the Repulsion text box.

CRITICAL STEP The residues should be separated by white space and have the form chain-residue number (e.g., a-27).
Generate a masking file by opening up the PDB file in PyMOL and using the sequence option to select the residues that should be excluded. Save this selection to a masking file in PDB format and upload.
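For users who prefer a script to the PyMOL selection, the two repulsion inputs can be derived from one another as sketched below. This assumes that the masking file is simply a PDB file containing the ATOM records of the residues to be excluded, and that chain letters in the repulsion list can be matched case-insensitively; both are assumptions, and all function names are hypothetical.

```python
def parse_repulsion(text):
    """Parse a list such as 'a-27 b-103' into {('A', 27), ('B', 103)}."""
    residues = set()
    for token in text.split():
        chain, resnum = token.split("-", 1)
        residues.add((chain.upper(), int(resnum)))
    return residues

def masking_records(pdb_lines, residues):
    """Return the ATOM records whose chain ID (column 22) and residue
    number (columns 23-26) match one of the selected residues."""
    kept = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        key = (line[21].upper(), int(line[22:26]))
        if key in residues:
            kept.append(line)
    return kept

def atom_line(serial, name, resn, chain, resi):
    """Build a minimal fixed-column ATOM record with dummy coordinates (demo only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resn:>3s} {chain}{resi:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{0.0:6.2f}")

demo = [
    atom_line(1, "CA", "GLY", "A", 26),
    atom_line(2, "CA", "SER", "A", 27),
    atom_line(3, "CA", "TYR", "B", 103),
]
mask = masking_records(demo, parse_repulsion("a-27 b-103"))
# "\n".join(mask) could then be written to a .pdb file for upload.
```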

Submitting the job and obtaining the results – TIMING ~45 – 90 minutes

Once all the desired options have been selected, submit the job using the map button. Monitor the job status using the Queue page (Fig. 7; see Box 1 to interpret the status updates).

CRITICAL STEP If the job has been submitted using an account an email will be sent on completion to notify the user the job is done.
Obtain the results for the job under the Results tab.

CRITICAL STEP The results will remain on the server for at least 2 months but after this time they may be removed.

Figure 7 ∣ AbEMap status page. The page shows the job number in the AbEMap queue, the job ID, the status of the job, the submission date and time, potential errors, and the number of rotations used during the docking stage. Further details on the job can be viewed by clicking the job number. Next, the page shows the input structures read by the program as cartoons, and the structures after pre-processing. Finally, the advanced options selected by the user are shown at the bottom of the page.

Analyzing the results – TIMING ~30-40 min

View the results by selecting the Results tab and selecting the job ID number. The results will differ depending on the input selection, FASTA (A) or PDB format (B).

CRITICAL AbEMap produces several types of results files depending on the input format selected by the user. These files come in three formats, PDB, PSE and Residue Scoring Files. Downloading and viewing the results are described in Box 3. The significance of these files and the means to access the information is provided in Box 4.

Box 3 ∣. Results for the different types of inputs.

(A). FASTA input

This job is run using 3 sets of coefficients as follows: (1) No VdW (neither van der Waals repulsion nor van der Waals attraction); (2) Reduced Attraction VdW; and (3) the parameters of the Antibody Mode of ClusPro. By default the results page displays the average model for the No VdW score. Access the average score for the other coefficients by selecting the specific option. To download the average model select the Average label.

Advanced mode allows the user to view all the models generated for the corresponding coefficients. Click on the Download All Models option to download all models.

Download individual models by selecting the model number label. The downloaded models are in the PDB format.

View the model scores by selecting the "View Model Scores" option. The scores are shown for the average model and the No VdW coefficient. To view the Reduced Attraction VdW or Antibody Mode select those options.

To view individual model scores, select the advanced option and then select the model. The scores represent the likelihood of the residue being in the epitope.

Download the scores in the csv format by selecting the "Download Residue Scores for this Coefficient" option.

(B). PDB format input

The result page for this option has the atom based average model displayed. Download it by selecting the Average option.

View the scores for this model by selecting the View Model Scores option. The scores represent the likelihood the residue will be present in the epitope.

Download these scores by selecting the Download Residue Scores for this Coefficient option; they will be downloaded in the csv format.

Box 4 ∣. Detailed description of result files.

PDB files

PDB files provide structural information for all atoms in the model; each line provides descriptors that specify the atom and its coordinates. The files can be opened as text files or viewed using molecular visualization software such as PyMOL. The models are named in this format: model.001.002.pdb. The first number (001 in this case) specifies the coefficient set used to create the model. The second number (here 002) specifies the model number for that coefficient set, corresponding to the rank of the model determined by PIPER, with the highest ranked model denoted by 001. For homology modeling using multiple templates, a model is given for each template used, along with an additional model that includes the average likelihood scores in place of the temperature factors, i.e., in columns 61-66 of ATOM records. If an X-ray structure is used as the input, two model files are still generated (model.000.001.pdb and model.000.002.pdb); the two PDB files are the same except that the second file includes the average likelihood scores. When modelled antibodies or homology modeling is used, the above output is repeated for each coefficient set (see below).
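The naming scheme and the placement of the likelihood scores in columns 61-66 can be read programmatically, as in the following sketch. The function names are hypothetical, and the sketch assumes that all atoms of a residue carry the same value; if they do not, the value of the first atom encountered is kept.

```python
import re

def parse_model_name(filename):
    """Split 'model.001.002.pdb' into (coefficient set, model number)."""
    match = re.fullmatch(r"model\.(\d{3})\.(\d{3})\.pdb", filename)
    if match is None:
        raise ValueError(f"unexpected model file name: {filename}")
    return int(match.group(1)), int(match.group(2))

def residue_scores_from_pdb(pdb_lines):
    """Map (chain, residue number) to the value in columns 61-66 of ATOM records."""
    scores = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and len(line) >= 66:
            key = (line[21], int(line[22:26]))
            scores.setdefault(key, float(line[60:66]))  # keep first atom's value
    return scores

def atom_line(serial, name, resn, chain, resi, bfactor):
    """Build a minimal fixed-column ATOM record with dummy coordinates (demo only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resn:>3s} {chain}{resi:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}")

demo = [
    atom_line(1, "N", "GLY", "A", 10, 0.85),
    atom_line(2, "CA", "GLY", "A", 10, 0.85),
    atom_line(3, "N", "SER", "A", 11, 0.10),
]
coef_set, model_number = parse_model_name("model.001.002.pdb")
likelihoods = residue_scores_from_pdb(demo)
```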

For modelled antibody inputs from the user or sequence inputs, results generated using 11 different coefficient sets (denoted as 000 to 010) can be downloaded. Each coefficient set specifies the coefficient values for the five energy terms included in the PIPER energy and the minimum attractive van der Waals energy allowed (Min. $E_{att}$), which ensures that the two proteins contact each other. The general form of the energy function for the coefficient set 000 is given by

$$E_{000} = 0\,E_{rep} - 0.2\,E_{att} + 300\,E_{Coul} + 30\,E_{Born} + 0.2\,E_{DARS}$$

and the coefficients for all sets are listed in Table 3.

Coefficient sets 003, 005, and 007, shown in boldface, represent the default options recommended for use with antibody models (see the Anticipated Results section).
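To make the role of the coefficient sets concrete, the sketch below evaluates the weighted sum of the five energy terms for a chosen set, using the coefficients listed in Table 3. The numerical values of the energy terms in the example are made up for illustration, the function name is hypothetical, and the Min. $E_{att}$ threshold is not modelled here.

```python
# Coefficients from Table 3, in the order
# (E_rep, E_att, E_Coul, E_Born, E_DARS).
COEFFICIENT_SETS = {
    "000": (0.0, -0.2, 300.0, 30.0, 0.2),
    "001": (0.1, -0.2, 300.0, 30.0, 0.2),
    "002": (0.3, -0.2, 300.0, 30.0, 0.2),
    "003": (0.5, -0.2, 300.0, 30.0, 0.2),
    "004": (0.5, -0.1, 300.0, 30.0, 0.2),
    "005": (0.5, -0.1, 300.0, 30.0, 0.2),
    "006": (0.5, 0.0, 300.0, 30.0, 0.2),
    "007": (0.0, 0.0, 300.0, 30.0, 0.2),
    "008": (0.0, 0.0, 300.0, 30.0, 0.2),
    "009": (0.5, 0.0, 300.0, 30.0, 0.2),
    "010": (0.0, -0.2, 300.0, 30.0, 0.2),
}


def weighted_energy(set_id, e_rep, e_att, e_coul, e_born, e_dars):
    """Weighted sum of the five energy terms for one coefficient set."""
    coefficients = COEFFICIENT_SETS[set_id]
    terms = (e_rep, e_att, e_coul, e_born, e_dars)
    return sum(c * t for c, t in zip(coefficients, terms))


# Hypothetical term values for a single docked pose.
pose = dict(e_rep=10.0, e_att=50.0, e_coul=0.01, e_born=0.02, e_dars=-100.0)
for set_id in ("003", "005", "007"):
    print(set_id, weighted_energy(set_id, **pose))
```

With set 007 both van der Waals terms drop out of the sum, so the same pose is ranked only by its electrostatic and DARS contributions.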

PSE files

PSE files are PyMOL-specific molecular visualization files. The average structures created by AbEMap are provided in this file type. The average files are named in the following manner: average_coef000_session.pse, where 000 specifies the coefficient set used to create the average structure.
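When many result files are downloaded, the naming conventions described above can be decoded programmatically. The following is a minimal sketch that assumes file names follow exactly the two patterns quoted in this box (model.XXX.YYY.pdb and average_coefXXX_session.pse); the function name and the returned dictionary keys are illustrative.

```python
import re

MODEL_RE = re.compile(r"^model\.(\d{3})\.(\d{3})\.pdb$")
AVERAGE_RE = re.compile(r"^average_coef(\d{3})_session\.pse$")


def describe_output_file(name):
    """Return the coefficient set (and model number, if any) encoded in a
    result file name, or None if the name matches neither pattern."""
    match = MODEL_RE.match(name)
    if match:
        return {
            "kind": "model",
            "coefficient_set": match.group(1),
            "model_number": int(match.group(2)),
        }
    match = AVERAGE_RE.match(name)
    if match:
        return {"kind": "average_session", "coefficient_set": match.group(1)}
    return None


print(describe_output_file("model.001.002.pdb"))
print(describe_output_file("average_coef000_session.pse"))
```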

Residue Scoring File

The Residue Scoring File is in the comma separated values (.csv) format. The scores are listed in four columns: Chain, Residue, Resn, and Score. Chain denotes the chain containing the residue, Residue the residue number, Resn the three-letter residue code, and Score the likelihood of the residue being in the epitope.
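A downloaded Residue Scoring File can be ranked with a few lines of Python. The sketch below assumes that the file starts with a header row containing exactly the column names Chain, Residue, Resn, and Score; the rows in the example are invented for illustration and do not come from a real job.

```python
import csv
import io


def top_residues(csv_text, n=20):
    """Return the n highest-scoring rows as (chain, residue, resn, score)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: float(row["Score"]), reverse=True)
    return [
        (row["Chain"], row["Residue"], row["Resn"], float(row["Score"]))
        for row in rows[:n]
    ]


# Invented example content in the four-column layout described above.
example_csv = """Chain,Residue,Resn,Score
A,18,ASP,0.42
A,19,ASN,0.77
A,20,TYR,0.13
"""
for entry in top_residues(example_csv, n=2):
    print(entry)
```

For a file on disk, the text can be obtained with `open(path, newline="").read()` before calling `top_residues`.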

TIMING

Steps 1-11, entering the basic inputs: 1-2 min

Steps 12-13, submitting the job and obtaining the results: 45-90 min

Step 14, analyzing the results: 30-40 min

TROUBLESHOOTING

Troubleshooting advice can be found in Table 2.

TABLE 2 ∣.

Troubleshooting table

Step (s)	Problem	Possible reason	Possible solution
6Ai / 8Bii	XXXX not found in the PDB	The PDB XXXX is downloaded from www.rcsb.org. This process will fail if the PDB file is not present on the site or the PDB ID is incorrectly entered or occasionally if the site is down.	Verify that the PDB IDs are accessible (and not obsolete) on www.rcsb.org
	[Antigen/Antibody] PDB ID must be 4 alphanumeric characters	Invalid PDB ID that has greater or less than 4 alphanumeric characters	Please make sure to enter valid PDB IDs for both the antibody and antigen accordingly
	Unknown residue XXX in [antigen/antibody]. Please remove	XXX is the three letter code for a non-conventional or modified residue in the antigen/antibody. If ClusPro does not recognize the residue it will fail to dock.	Please replace with standard residue
	Copy of [antigen/antibody] failed	The input files were not copied to the computer cluster due to some reason.	Check the input files
	[Antigen/Antibody] file too large	The PDB or FASTA file uploaded is too big due to the file type or the number of atoms is too big.	Use a smaller domain of the antigen and/or the FAB region of antibody that is most likely to be in the interface as input.
	[Antigen/Antibody] file only partially uploaded	There might have been an error during upload or the file might be too big.	Consider smaller regions of the input proteins
7 / 9	[Antigen/Antibody] chains must be fewer than 20 characters.	Chain specification is incorrect or you have submitted >20 chain IDs.	Please make sure to use less than 20 chains.
7 / 9	[Antigen/Antibody] chains must be white space separated alphanumeric characters	Chain inputs are not separated by white space	Make sure to separate chain IDs by space.
11Bi	Repulsion must be in white space separated chain-residue format	Incorrect format for the Repulsion list.	Make sure the repulsion list is in white space separated chain-residue format
12	Processing failed on [antigen/antibody]	Error processing antigen/antibody file. This issue usually occurs in the minimization process if steric clashes or nonphysical bonds cause this process to fail.	Take a closer look at the input structures and remove clashes then resubmit the edited structures
	Job ran out of memory on server	The antigen and/or the antibody are too large.	The best route is to consider smaller domains of the antigen protein and/or just the Fab fragment of the antibody
	Not enough lines in output file	This error occurs when our computer cluster experiences a lag and does not respond to the server’s request for an update.	We recommend reaching out to our team via https://abemap.cluspro.org/contact.php so that we can push along your job to avoid resubmission
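Several of the input errors in Table 2 can be caught before submission. The following minimal sketch checks a PDB ID and a chain specification against the rules implied by the error messages above (4 alphanumeric characters; white space separated alphanumeric chain IDs; fewer than 20 characters). The function names are hypothetical, the checks only approximate what the server enforces, and treating an empty chain field as valid is an assumption.

```python
import re


def check_pdb_id(pdb_id):
    """True if the ID consists of exactly 4 alphanumeric characters."""
    return re.fullmatch(r"[A-Za-z0-9]{4}", pdb_id) is not None


def check_chains(chains):
    """True if the chain field is shorter than 20 characters and contains
    only white space separated alphanumeric chain IDs."""
    if len(chains) >= 20:
        return False
    # Assumption: an empty chain field is accepted.
    return all(re.fullmatch(r"[A-Za-z0-9]+", token) for token in chains.split())


print(check_pdb_id("1VFB"), check_pdb_id("1VFB2"))
print(check_chains("H L"), check_chains("H,L"))
```

These checks do not verify that the PDB ID exists or is current on www.rcsb.org; that still has to be confirmed on the site itself.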


ANTICIPATED RESULTS

Mapping hen egg white lysozyme epitopes binding two different antibodies

We mapped epitope residues of hen egg white (HEW) lysozyme with two different antigen receptors: shark single domain antigen receptor 2I24 in the complex 2I25, and the D1.3 anti-HEW antibody fragment 1VFA in the complex 1VFB (see Supplementary Table 1). We first consider the case in which the antibodies are given by their X-ray structures. For the complex 2I25, the average mapping results of the HEW lysozyme (PDB ID 3LZT) in surface format (blue to red showing increasing likelihood score by AbEMap) are compared in Fig. 10A with the true orientation of the antigen receptor (PDB ID 2I24), shown as an orange cartoon. For comparison, the true placement and orientation of the D1.3 antibody (PDB ID 1VFA) in the other complex, 1VFB, on the same HEW antigen are shown as a semi-transparent green cartoon. For the 1VFB complex, the average mapping results of the HEW lysozyme (PDB ID 8LYZ) are shown in the same surface format in Fig. 10B, with the green cartoon representing the D1.3 antibody (PDB ID 1VFA) and the antigen receptor (PDB ID 2I24) shown as a semi-transparent orange cartoon. In the top 20 predicted residues, using the coefficient set 007 (the no vdW option), AbEMap finds 10 of the true 22 epitope residues for 3LZT and 11 of the true 19 epitope residues for 8LYZ (Supplementary Table 1).

Figure 10 ∣ Results of epitope mapping for two lysozyme-antibody complexes using each of the three epitope mapping modes in AbEMap (X-ray structure, model, or sequence based). All figures show the hen egg white (HEW) lysozyme in surface view. Blue to red shows increasing predicted likelihood scores. The orange cartoons show the shark single domain antigen receptor (PDB ID 2I24) in the complex 2I25, and the green cartoons show the D1.3 anti-HEW lysozyme antibody (PDB ID 1VFA) in the complex 1VFB. Whenever the cartoon is semi-transparent, the epitope mapping result on the HEW lysozyme is shown for the other antibody. Panels (A), (C) and (E) show, respectively, mapping results for the 2I24 antibody on 3LZT when the antibody 2I24 is defined by its X-ray structure, its AlphaFold2 generated model, and just its sequence. Panels (B), (D) and (F) show, respectively, mapping results for the 1VFA antibody on 8LYZ when the antibody 1VFA is defined by its X-ray structure, its AlphaFold2 generated model, and just its sequence. As shown, the results for the two complexes are similar when using the independently solved X-ray structures. In the 20 top ranked residues AbEMap finds about 50% of the true epitope residues in both cases. Using the AlphaFold2 models the results get slightly better for 2I25 but slightly worse for 1VFB. However, the predictions are poor for both complexes when using the internal homology modeling of AbEMap, as most predicted residues lie in a region of the lysozyme on the side opposite to the true antibody binding site.

We used the same two antigen receptor complex examples (2I25 and 1VFB) with the HEW lysozyme as the antigen to demonstrate the modelled antibody structure mode of epitope mapping with AbEMap. The results of epitope mapping using the AlphaFold2 generated models of antibodies 2I24 and 1VFA, respectively, are shown in Fig. 10C and 10D, where the antigen is viewed from the same angle as for the X-ray structure inputs above for easy comparison. In the top 20 predicted residues, mapping with AlphaFold2 models finds 11 of the true 22 epitope residues on 3LZT and 7 of the true 19 epitope residues on 8LYZ, which is slightly less for 8LYZ but actually slightly more for 3LZT than the number found based on the X-ray structures of the antibodies using coefficient set 007 (the no vdW option). For coefficient set 003 (the reduced attractive VdW option), the results change to 8 of the 22 and 10 of the 19, respectively, thus better compared to using X-ray structures of 8LYZ but worse compared to using X-ray structures for 3LZT.

When only the sequence of the antibody is available, the results of submitting the FASTA sequences of 2I24 and 1VFA, excluding the true complexes 2I25 and 1VFB, respectively, are shown in Fig. 10E and 10F. We note that for 1VFA, AbEMap found 7 templates that pass the criteria described earlier (i.e., sequence identity above 20% and e-value below 1e-40). Thus, 8 different models (one for each template and one for the average model) were generated for each coefficient set. Nevertheless, in these cases the homology models were not accurate enough for AbEMap to perform well, and only 2 and 5 true epitope residues, respectively, are captured in the top 20 predicted residues for 2I25 and 1VFB.
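The counts quoted above ("N of the true M epitope residues in the top 20") can be reproduced from any set of residue scores once the true epitope is known. The sketch below shows the calculation on invented data; the residue identifiers, scores, and function name are hypothetical, and the definition of the true epitope (for example, the distance cutoff used to define the interface) is assumed to be given.

```python
def true_positives_in_top_n(scores, true_epitope, n=20):
    """Count how many true epitope residues are among the n top-scoring ones.

    scores: dict mapping a residue identifier to its likelihood score.
    true_epitope: iterable of residue identifiers in the true epitope.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return len(set(ranked[:n]) & set(true_epitope))


# Invented example: six residues, three of which form the true epitope.
scores = {"A:18": 0.9, "A:19": 0.8, "A:45": 0.7, "A:62": 0.2, "A:63": 0.1, "A:75": 0.05}
true_epitope = {"A:18", "A:45", "A:63"}

hits = true_positives_in_top_n(scores, true_epitope, n=3)
print(f"{hits} of the true {len(true_epitope)} epitope residues in the top 3")
```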

Examples of successful mapping using homology models of antibodies

As shown above, starting from the sequences of 1VFA and 2I24 and using the internal homology modeling tool, AbEMap identified only a few epitope residues. However, this poor performance is the exception rather than the rule, and in most cases the accuracy of epitope prediction based on homology modeling is almost the same as that based on a separately crystallized X-ray structure of the antibody (Supplementary Table 1). Here we show results for two complexes. The first, 2W9E, includes the structure of ICSM 18, the Fab fragment of a therapeutic antibody, complexed with the fragment 119-231 of a human prion protein (Prp)⁷¹. X-ray structures are available both for the antigen (PDB ID 1QM1) and the separately solved antibody (PDB ID 2W9D); see Supplementary Table 1. The second complex, 3MXW, includes the crystal structure of sonic hedgehog bound to the 5E1 Fab fragment⁷². In this case we also have X-ray structures both for the antigen (3M1N) and the separately solved antibody (3MXV).

For the complex 2W9E, the average mapping results of the human prion protein (PDB ID 1QM1) in surface format (blue to red showing increasing likelihood score by AbEMap) are compared in Fig. 11A with the true orientation of the antibody (PDB ID 2W9D), shown as an orange cartoon. The average mapping results of the sonic hedgehog (PDB ID 3M1N) are shown in the same surface format in Fig. 11B, with the orange cartoon representing the antibody (PDB ID 3MXV). In the top 20 predicted residues, using the coefficient set 007 (the no vdW option), AbEMap finds 10 of the true 19 epitope residues for 2W9E and 14 of the true 24 epitope residues for 3MXW (Supplementary Table 1). The results of epitope mapping using the AlphaFold2 generated models of antibodies 2W9D and 3MXV, respectively, are shown in Fig. 11C and 11D. In the top 20 predicted residues, mapping with AlphaFold2 models generated without templates finds 13 epitope residues on 1QM1, thus almost the same as with the X-ray structure of the antibody, but only 4 epitope residues on 3M1N, representing a major drop in prediction accuracy. Using antibody templates in AlphaFold2, the results become slightly worse still, with only 12 and 3 epitope residues identified, respectively. Results of mapping based only on the FASTA sequences of 2W9D and 3MXV, excluding the true complexes 2W9E and 3MXW, respectively, are shown in Fig. 11E and 11F. Using internal homology models and the coefficient set 007 (the no vdW option), AbEMap finds 14 of the true 19 epitope residues for 2W9E and 9 of the true 24 epitope residues for 3MXW in the top 20 predicted residues (Supplementary Table 1). Thus, the results are better than those obtained with the antibody X-ray structure for 2W9E but worse for 3MXW (10 and 14 correct residue predictions, respectively). However, in both cases the results are better than those obtained using the AlphaFold2 models (13 and 4 correct residues, respectively).

Figure 11 ∣ Mapping the epitopes for complexes 2W9E and 3MXW. The first complex is a Fab fragment of a therapeutic antibody binding a fragment of a human prion protein. The second complex includes the crystal structure of sonic hedgehog bound to a Fab fragment. As for the lysozyme-antibody complexes shown in Fig. 10, using the X-ray structures AbEMap finds about 50% of the true epitope residues in the 20 top ranked residues (Panels A and B). Using the AlphaFold2 generated models of antibodies, AbEMap finds almost the same number of residues for 2W9E as with the X-ray structure of the antibody, but only four epitope residues for 3MXW, in the latter case representing a major drop in prediction accuracy (Panels C and D, respectively). Using internal homology models and considering the 20 top ranked residues, the results are better than with the antibody X-ray structure for 2W9E but worse for 3MXW (Panels E and F). However, the differences are moderate, emphasizing that the use of homology models may provide prediction accuracy similar to that obtained using X-ray structures.

Supplementary Material

Supplement

Supplementary Figure 1 ∣ Impact of loosening the penalty on shape complementarity on the ROC AUC score for the 40 antibody-antigen complexes in the benchmark set BM5. Based on the hypothesis that reducing the shape complementarity requirements during docking will increase epitope prediction accuracy when dealing with less accurate antibody models, the contribution of the van der Waals potential to the ranking of docking models was reduced. Coefficient set 003 is the normal antibody-antigen coefficient set; coefficient set 005 has its attractive van der Waals potential halved relative to that of C003; coefficient set 007 has weights of zero for both the attractive and repulsive van der Waals potentials. The general trend shows that reducing or removing the van der Waals components improves the results to varying degrees when using homology-modelled antibodies. While several combinations were tested, two coefficient sets that improved on the normal antibody coefficient set (C003) were (a) reduced attractive VdW (C005) and (b) No VdW (C007). For the former, the attractive van der Waals potential was halved relative to the normal antibody coefficient set used for crystal structure inputs. For the latter, the coefficients for both the attractive and repulsive van der Waals potentials were zero.

Supplementary Figure 2 ∣ Impact of the temperature factor $T$ and the number of decoys on the ROC AUC score for the 40 antibody-antigen complexes in the benchmark set BM5.

Supplementary Table 1 ∣ True positives ($TP$) among the top 10 up to the top 50 ranked residues when using unbound antigens and the crystal structure of the unbound antibody; results for internal AbEMap homology models are shown in parentheses.

Supplementary Table 2 ∣ True positives when considering the top 20 ranking residues for the 40 cases in the BM5 set.

Supplementary Table 3 ∣ ROC AUC comparison of AbEMap with SEPPA and BEpro.


Figure 8 ∣ AbEMap job results page. (A) To view the result scores for the selected model click here (see Figure 9). We recall that with the homology modeling option AbEMap generates multiple models, one for each template. (B) To view all the models for the job select the advanced option. (C) To view models obtained using different coefficient sets, select the coefficients one wishes to view (Antibody Mode, set 003; Reduced attractive VdW, set 005; No VdW, set 007). (D) To download the average PDB file with the likelihood scores in place of thermal factors click here. (E) The figure shows the PyMOL generated structure of the antigen in surface view. Blue to red shows increasing predicted likelihood scores.

Figure 9 ∣ AbEMap job results scores page. (A) To view scores for different coefficient sets, select the coefficients to be viewed here. (B) To download the scores for the selected coefficient click here. (C) The likelihood scores for each amino acid residue in the model are presented here. We recall that for epitope prediction AbEMap uses 1000 structures to calculate the frequency of each antigen surface atom’s occurrence in the antibody-antigen interface, and defines the atomic epitope likelihood score as the Boltzmann weighted atomic interface occurrence frequency averaged over the ensemble of antibody structures. The residue likelihood scores are obtained by summing up the atomic contributions for each residue.
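The scoring idea recalled in the caption of Figure 9 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Boltzmann weight is taken as exp(-E/T) normalized over the docked structures, the energies, atom names, and temperature value are invented, and the averaging over an ensemble of antibody structures is omitted. It is not the server's implementation.

```python
import math
from collections import defaultdict


def atomic_likelihood(decoys, temperature=1.0):
    """Boltzmann-weighted interface occurrence frequency per atom.

    decoys: list of (energy, set of antigen atoms in the interface).
    Assumption: weights are exp(-E/T), normalized to sum to 1.
    """
    e_min = min(energy for energy, _ in decoys)  # shift for numerical stability
    weights = [math.exp(-(energy - e_min) / temperature) for energy, _ in decoys]
    total = sum(weights)
    frequency = defaultdict(float)
    for weight, (_, atoms) in zip(weights, decoys):
        for atom in atoms:
            frequency[atom] += weight / total
    return dict(frequency)


def residue_likelihood(atom_scores, atom_to_residue):
    """Sum the atomic contributions belonging to each residue."""
    totals = defaultdict(float)
    for atom, score in atom_scores.items():
        totals[atom_to_residue[atom]] += score
    return dict(totals)


# Invented example: two equally ranked docked structures.
decoys = [(0.0, {"a1", "a2"}), (0.0, {"a2"})]
atom_to_residue = {"a1": "A:10", "a2": "A:10"}
atoms = atomic_likelihood(decoys)
print(atoms)
print(residue_likelihood(atoms, atom_to_residue))
```

Lowering the hypothetical `temperature` parameter concentrates the weight on the lowest-energy structures, whereas a high value approaches a plain occurrence count.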

TABLE 3 ∣.

Energy function coefficient sets

	Energy function coefficients
Coefficient Set	$E_{rep}$	$E_{att}$	$E_{Coul}$	$E_{Born}$	$E_{DARS}$	Min. $E_{att}$
000	0.0	−0.2	300	30	0.2	10
001	0.1	−0.2	300	30	0.2	100
002	0.3	−0.2	300	30	0.2	100
003	0.5	−0.2	300	30	0.2	100
004	0.5	−0.1	300	30	0.2	10
005	0.5	−0.1	300	30	0.2	100
006	0.5	0.0	300	30	0.2	100
007	0.0	0.0	300	30	0.2	100
008	0.0	0.0	300	30	0.2	10
009	0.5	0.0	300	30	0.2	10
010	0.0	−0.2	300	30	0.2	10


ACKNOWLEDGEMENTS

This investigation was supported by grants DBI 1759277 and AF 1645512 from the National Science Foundation, and R35GM118078, R21GM127952, and RM1135136 from the National Institute of General Medical Sciences.

Footnotes

COMPETING INTERESTS The PIPER docking program, on which the ClusPro AbEMap server is based, has been licensed by Boston University to Acpharis Inc. Acpharis, in turn, offers commercial sublicenses of PIPER. D.K. and S.V. consult for Acpharis and own stock in the company; D.B. is the acting CEO of the company. However, both the ClusPro and AbEMap servers are free for non-commercial use.

Lab home page: http://structure.bu.edu

Server home page: https://abemap.cluspro.org/

DATA AVAILABILITY

Data for all cases tested in the benchmark set used can be found in the following figshare link: https://doi.org/10.6084/m9.figshare.c.5963376.v1. Detailed results and direct links for the server results are included as Source Data.

CODE AVAILABILITY

AbEMap is available as a server at https://abemap.cluspro.org/ free of charge for non-commercial applications. The server can be used without registration, but in this case the results will be publicly accessible. The advantage of registering is that the job does not show up on the website, but this option is available only to users with educational or governmental email addresses. The server provides options to view the results online, but protein visualization tools allow for more convenient analyses. We use and recommend PyMOL, which was used to demonstrate the analysis of results in this Protocol.

REFERENCES

1. Montgomery RA, Cozzi E, West LJ & Warren DS Humoral immunity and antibody-mediated rejection in solid organ transplantation. Semin. Immunol 23, 224–234, doi: 10.1016/j.smim.2011.08.021 (2011).
2. Sela-Culang I, Kunik V & Ofran Y The structural basis of antibody-antigen recognition. Frontiers Immunol. 4, 302, doi: 10.3389/fimmu.2013.00302 (2013).
3. Danilov SM et al. Fine epitope mapping of monoclonal antibody 5F1 reveals anticatalytic activity toward the N domain of human angiotensin-converting enzyme. Biochemistry 46, 9019–9031, doi: 10.1021/bi700489v (2007).
4. Sela-Culang I. et al. Using a combined computational-experimental approach to predict antibody-specific B cell epitopes. Structure 22, 646–657, doi: 10.1016/j.str.2014.02.003 (2014).
5. Ehrhardt SA et al. Polyclonal and convergent antibody response to Ebola virus vaccine rVSV-ZEBOV. Nat. Medicine 25, 1589–1600, doi: 10.1038/s41591-019-0602-4 (2019).
6. Goldstein LD et al. Massively parallel single-cell B-cell receptor sequencing enables rapid discovery of diverse antigen-reactive antibodies. Com. Biol 2, 304, doi: 10.1038/s42003-019-0551-y (2019).
7. Horns F, Dekker CL & Quake SR Memory B Cell Activation, Broad Anti-influenza Antibodies, and Bystander Activation Revealed by Single-Cell Transcriptomics. Cell Rep. 30, 905–913 e906, doi: 10.1016/j.celrep.2019.12.063 (2020).
8. Kozlova EEG et al. Computational B-cell epitope identification and production of neutralizing murine antibodies against Atroxlysin-I. Sci. Rep 8, 14904, doi: 10.1038/s41598-018-33298-x (2018).
9. Hua CK et al. Computationally-driven identification of antibody epitopes. Elife 6, doi: 10.7554/eLife.29023 (2017).
10. Qi T. et al. SEPPA 2.0--more refined server to predict spatial epitope considering species of immune host and subcellular localization of protein antigen. Nucl. Acids Res 42, W59–63, doi: 10.1093/nar/gku395 (2014).
11. Sun J. et al. SEPPA: a computational server for spatial epitope prediction of protein antigens. Nucleic Acids Res. 37, W612–616, doi: 10.1093/nar/gkp417 (2009).
12. Zhou C. et al. SEPPA 3.0-enhanced spatial epitope prediction enabling glycoprotein antigens. Nucleic Acids Res. 47, W388–W394, doi: 10.1093/nar/gkz413 (2019).
13. Sweredoski MJ & Baldi P PEPITO: improved discontinuous B-cell epitope prediction using multiple distance thresholds and half sphere exposure. Bioinformatics 24, 1459–1460, doi: 10.1093/bioinformatics/btn199 (2008).
14. Rubinstein ND, Mayrose I, Martz E & Pupko T Epitopia: a web-server for predicting B-cell epitopes. BMC Bioinformatics 10, 287, doi: 10.1186/1471-2105-10-287 (2009).
15. Kulkarni-Kale U, Bhosle S & Kolaskar AS CEP: a conformational epitope prediction server. Nucleic Acids Res. 33, W168–171, doi: 10.1093/nar/gki460 (2005).
16. Hopp TP & Woods KR Prediction of protein antigenic determinants from amino acid sequences. Proc. Natl. Acad. Sci. USA 78, 3824–3828, doi: 10.1073/pnas.78.6.3824 (1981).
17. Jespersen MC, Peters B, Nielsen M & Marcatili P BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes. Nucl. Acids Res 45, W24–W29, doi: 10.1093/nar/gkx346 (2017).
18. Potocnakova L, Bhide M & Pulzova LB An Introduction to B-Cell Epitope Mapping and In Silico Epitope Prediction. J. Immun. Res 2016, 6760830, doi: 10.1155/2016/6760830 (2016).
19. Holmes MA, Buss TN & Foote J Conformational correction mechanisms aiding antigen recognition by a humanized antibody. J. Exp. Med 187, 479–485, doi: 10.1084/jem.187.4.479 (1998).
20. Li Y, Li H, Smith-Gill SJ & Mariuzza RA Three-dimensional structures of the free and antigen-bound Fab from monoclonal antilysozyme antibody HyHEL-63. Biochemistry 39, 6296–6309, doi: 10.1021/bi000054l (2000).
21. Stanfield RL, Dooley H, Verdino P, Flajnik MF & Wilson IA Maturation of shark single-domain (IgNAR) antibodies: evidence for induced-fit binding. J. Mol. Biol 367, 358–372, doi: 10.1016/j.jmb.2006.12.045 (2007).
22. Braden BC et al. Three-dimensional structures of the free and the antigen-complexed Fab from monoclonal anti-lysozyme antibody D44.1. J. Mol. Biol 243, 767–781, doi: 10.1016/0022-2836(94)90046-9 (1994).
23. Halperin I, Ma B, Wolfson H & Nussinov R Principles of docking: An overview of search algorithms and a guide to scoring functions. Proteins 47, 409–443, doi: 10.1002/prot.10115 (2002).
24. Comeau SR, Gatchell DW, Vajda S & Camacho CJ ClusPro: a fully automated algorithm for protein-protein docking. Nucl. Acids Res 32, W96–99, doi: 10.1093/nar/gkh354 (2004).
25. Comeau SR, Gatchell DW, Vajda S & Camacho CJ ClusPro: an automated docking and discrimination method for the prediction of protein complexes. Bioinformatics 20, 45–50 (2004).
26. Kozakov D. et al. The ClusPro web server for protein-protein docking. Nat. Prot 12, 255–278, doi: 10.1038/nprot.2016.169 (2017).
27. Kozakov D, Brenke R, Comeau SR & Vajda S PIPER: an FFT-based protein docking program with pairwise potentials. Proteins 65, 392–406, doi: 10.1002/prot.21117 (2006).
28. Brenke R. et al. Application of asymmetric statistical potentials to antibody-protein docking. Bioinformatics 28, 2608–2614, doi: 10.1093/bioinformatics/bts493 (2012).
29. Guest JD et al. An expanded benchmark for antibody-antigen docking and affinity prediction reveals insights into antibody recognition determinants. Structure 29, 606–621 e605, doi: 10.1016/j.str.2021.01.005 (2021).
30. Krawczyk K, Liu X, Baker T, Shi J & Deane CM Improving B-cell epitope prediction and its application to global antibody-antigen docking. Bioinformatics 30, 2288–2294, doi: 10.1093/bioinformatics/btu190 (2014).
31. Krawczyk K, Baker T, Shi J & Deane CM Antibody i-Patch prediction of the antibody binding site improves rigid local antibody-antigen docking. Protein Eng. Des. & Sel 26, 621–629, doi: 10.1093/protein/gzt043 (2013).
32. Sikora M. et al. Computational epitope map of SARS-CoV-2 spike protein. PLoS Comput. Biol 17, e1008790, doi: 10.1371/journal.pcbi.1008790 (2021).
33. Marks C & Deane CM How repertoire data are changing antibody science. J. Biol. Chem 295, 9823–9837, doi: 10.1074/jbc.REV120.010181 (2020).
34. Vajda S, Porter KA & Kozakov D Progress toward improved understanding of antibody maturation. Curr. Opin. Struct. Biol 67, 226–231, doi: 10.1016/j.sbi.2020.11.008 (2021).
35. Porter KA et al. Template-Based Modeling by ClusPro in CASP13 and the Potential for Using Co-evolutionary Information in Docking. Proteins, doi: 10.1002/prot.25808 (2019).
36. Padhorny D. et al. Protein-protein docking by fast generalized Fourier transforms on 5D rotational manifolds. Proc. Natl. Acad. Sci. USA 113, E4286–4293, doi: 10.1073/pnas.1603929113 (2016).
37. Ngan CH et al. FTSite: high accuracy detection of ligand binding sites on unbound protein structures. Bioinformatics 28, 286–287, doi: 10.1093/bioinformatics/btr651 (2012).
38. Desta IT et al. Mapping of antibody epitopes based on docking and homology modeling. Proteins, doi: 10.1002/prot.26420 (2022).
39. Jumper J. et al. Applying and improving AlphaFold at CASP14. Proteins, doi: 10.1002/prot.26257 (2021).
40. Jumper J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589, doi: 10.1038/s41586-021-03819-2 (2021).
41. Tunyasuvunakool K. et al. Highly accurate protein structure prediction for the human proteome. Nature 596, 590–596, doi: 10.1038/s41586-021-03828-1 (2021).
42. Evans R. et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv (2021).
43. Ghani U. et al. Improved Docking of Protein Models by a Combination of Alphafold2 and ClusPro. bioRxiv (2021).
44. Ko J & Lee J Can AlphaFold2 predict protein-peptide complex structures accurately? bioRxiv (2021).
45. Mirdita M, Ovchinnikov S & Steinegger M ColabFold - Making protein folding accessible to all. bioRxiv (2021).
46. Desta IT, Porter KA, Xia B, Kozakov D & Vajda S Performance and Its Limits in Rigid Body Protein-Protein Docking. Structure 28, 1071–1081 e1073, doi: 10.1016/j.str.2020.06.006 (2020).
47. Webb B & Sali A Comparative Protein Structure Modeling Using MODELLER. Curr. Prot. Prot. Sci 86, 2.9.1–2.9.37, doi: 10.1002/cpps.20 (2016).
48. Katchalski-Katzir E. et al. Molecular surface recognition: determination of geometric fit between proteins and their ligands by correlation techniques. Proc. Natl. Acad. Sci. USA 89, 2195–2199 (1992).
49. Lindemann SR, Yershova A & LaValle SM in Algorithmic Foundations of Robotics VI (eds Erdmann Michael, Overmars Mark, Hsu David, & van der Stappen Frank) 313–328 (Springer; Berlin Heidelberg, 2005).
50. Chuang GY, Kozakov D, Brenke R, Comeau SR & Vajda S DARS (Decoys As the Reference State) potentials for protein-protein docking. Biophys J 95, 4217–4227, doi: 10.1529/biophysj.108.135814 (2008).
51. Lee B & Richards FM The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol 55, 379–400, doi: 10.1016/0022-2836(71)90324-x (1971).
52. Vreven T. et al. Updates to the Integrated Protein-Protein Interaction Benchmarks: Docking Benchmark Version 5 and Affinity Benchmark Version 2. J. Mol. Biol 427, 3031–3041, doi: 10.1016/j.jmb.2015.07.016 (2015).
53. Fox NK, Brenner SE & Chandonia JM SCOPe: Structural Classification of Proteins--extended, integrating SCOP and ASTRAL data and classification of new structures. Nucleic Acids Research 42, D304–309, doi: 10.1093/nar/gkt1240 (2014).
54. Akbar R. et al. A compact vocabulary of paratope-epitope interactions enables predictability of antibody-antigen binding. Cell. Rep 34, 108856, doi: 10.1016/j.celrep.2021.108856 (2021).
55. Salamanca Viloria J, Allega MF, Lambrughi M & Papaleo E An optimal distance cutoff for contact-based Protein Structure Networks using side-chain centers of mass. Sci. Rep 7, 2838, doi: 10.1038/s41598-017-01498-6 (2017).
56. Stave JW & Lindpaintner K Antibody and antigen contact residues define epitope and paratope size and structure. J. Immunol 191, 1428–1435, doi: 10.4049/jimmunol.1203198 (2013).
57. Pittala S & Bailey-Kellogg C Learning context-aware structural representations to predict antigen and antibody binding interfaces. Bioinformatics 36, 3996–4003, doi: 10.1093/bioinformatics/btaa263 (2020).
58. Sivasubramanian A, Sircar A, Chaudhury S & Gray JJ Toward high-resolution homology modeling of antibody Fv regions and application to antibody-antigen docking. Proteins 74, 497–514, doi: 10.1002/prot.22309 (2009).
59. Porter KA et al. Template-based modeling by ClusPro in CASP13 and the potential for using co-evolutionary information in docking. Proteins 87, 1241–1248, doi: 10.1002/prot.25808 (2019).
60. Padhorny D. et al. ClusPro in rounds 38 to 45 of CAPRI: Toward combining template-based methods with free docking. Proteins 88, 1082–1090, doi: 10.1002/prot.25887 (2020).
61. Weitzner BD et al. Modeling and docking of antibody structures with Rosetta. Nat. Prot 12, 401–416, doi: 10.1038/nprot.2016.180 (2017).
62. Lepore R, Olimpieri PP, Messih MA & Tramontano A PIGSPro: prediction of immunoGlobulin structures v2. Nucleic Acids Res. 45, W17–W23, doi: 10.1093/nar/gkx334 (2017).
63. Klausen MS, Anderson MV, Jespersen MC, Nielsen M & Marcatili P LYRA, a webserver for lymphocyte receptor structural modeling. Nucleic Acids Res. 43, W349–355, doi: 10.1093/nar/gkv535 (2015).
64. Schritt D. et al. Repertoire Builder: high-throughput structural modeling of B and T cell receptors. Mol. Syst. Des. Eng 4, 761–768, doi: 10.1039/c9me00020h (2019).
65. Karami Y. et al. DaReUS-Loop: a web server to model multiple loops in homology models. Nucleic Acids Res. 47, W423–W428, doi: 10.1093/nar/gkz403 (2019).
66. Dunbar J. et al. SAbPred: a structure-based antibody prediction server. Nucleic Acids Res. 44, W474–478, doi: 10.1093/nar/gkw361 (2016).
67. Marks C & Deane CM Antibody H3 Structure Prediction. Comput. Struct. Biotechnol. J 15, 222–231, doi: 10.1016/j.csbj.2017.01.010 (2017).
68. Lensink MF et al. Blind prediction of homo- and hetero-protein complexes: The CASP13-CAPRI experiment. Proteins 87, 1200–1221, doi: 10.1002/prot.25838 (2019).
69. Ruffolo JA, Guerra C, Mahajan SP, Sulam J & Gray JJ Geometric potentials from deep learning improve prediction of CDR H3 loop structures. Bioinformatics 36, i268–i275, doi: 10.1093/bioinformatics/btaa457 (2020).
70. Jespersen MC, Mahajan S, Peters B, Nielsen M & Marcatili P Antibody Specific B-Cell Epitope Predictions: Leveraging Information From Antibody-Antigen Protein Complexes. Front. Immunol 10, 298, doi: 10.3389/fimmu.2019.00298 (2019).
71.Antonyuk SV et al. Crystal structure of human prion protein bound to a therapeutic antibody. Proc. Natl. Acad. Sci. U S A 106, 2554–2558, doi: 10.1073/pnas.0809170106 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
72.Maun HR et al. Hedgehog pathway antagonist 5E1 binds hedgehog at the pseudo-active site. J. Biol. Chem 285, 26570–26580, doi: 10.1074/jbc.M110.112284 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Supplement

Supplementary Figure 2 ∣ Impact of the temperature factor $T$ and the number of decoys on the ROC AUC score for the 40 antibody-antigen complexes in the benchmark set BM5.

Supplementary Table 2 ∣ True positives when considering the top 20 ranking residues for the 40 cases in the BM5 set.

Supplementary Table 3 ∣ ROC AUC comparison of AbEMap with SEPPA and BEpro.

NIHMS1965875-supplement-Supplement.pdf (256.2 KB, PDF)

NIHMS1965875-supplement-2.pdf (125 KB, PDF)

Data Availability Statement

[R73] Desta IT et al. Proteins (2022). DOI: 10.1002/prot.26420

[R74] Porter KA et al. Proteins 87, 1241–1248 (2019). DOI: 10.1002/prot.25808

[R75] Kozakov D. et al. Nat. Protoc 12, 255–278 (2017). DOI: 10.1038/nprot.2016.169

[R76] Padhorny D. et al. PNAS 113, E4286–4293 (2016). DOI: 10.1073/pnas.1603929113

[R77] Ngan CH et al. Bioinformatics 28, 286–287 (2012). DOI: 10.1093/bioinformatics/btr651
