PLOS ONE. 2022 Jun 15;17(6):e0258173. doi: 10.1371/journal.pone.0258173

Simultaneous prediction of antibody backbone and side-chain conformations with deep learning

Deniz Akpinaroglu 1, Jeffrey A Ruffolo 2, Sai Pooja Mahajan 3, Jeffrey J Gray 2,3,*
Editor: Alexey Porollo
PMCID: PMC9200299  PMID: 35704640

Abstract

Antibody engineering is becoming increasingly popular in medicine for the development of diagnostics and immunotherapies. Antibody function relies largely on the recognition and binding of antigenic epitopes via the loops in the complementarity determining regions. Hence, accurate high-resolution modeling of these loops is essential for effective antibody engineering and design. Deep learning methods have previously been shown to effectively predict antibody backbone structures described as a set of inter-residue distances and orientations. However, antigen binding is also dependent on the specific conformations of surface side-chains. To address this shortcoming, we created DeepSCAb: a deep learning method that predicts inter-residue geometries as well as side-chain dihedrals of the antibody variable fragment. The network requires only sequence as input, rendering it particularly useful for antibodies without any known backbone conformations. Rotamer predictions use an interpretable self-attention layer, which learns to identify structurally conserved anchor positions across several species. We evaluate the performance of the model for discriminating near-native structures from sets of decoys and find that DeepSCAb outperforms similar methods lacking side-chain context. When compared to alternative rotamer repacking methods, which require an input backbone structure, DeepSCAb predicts side-chain conformations competitively. Our findings suggest that DeepSCAb improves antibody structure prediction with accurate side-chain modeling and is adaptable to applications in docking of antibody-antigen complexes and design of new therapeutic antibody sequences.

Introduction

Antibodies are specialized proteins that play a crucial role in the detection and destruction of pathogens. The binding and specificity of antibodies are largely determined by the complementarity determining regions (CDRs), which consist of three loops in the light chain and three loops in the heavy chain [1]. Structural diversity is largely achieved by the third loop of the heavy chain, which determines many antigen binding properties. Additionally, CDR H3 does not adopt a canonical fold like the other loops [2], making it challenging to model [3–5]. Currently, engineering of new antibodies is hindered by the difficulty of accurately predicting the CDR H3 loop, including the corresponding side-chains needed for docking applications. Prediction of side-chains is a critical component of structure prediction and protein design [6], as the surface of the antibody CDR loops, including the side-chains, plays an important role in antigen recognition [7].

There has been growing interest in the effective design of new antibodies because they are widely used as biotherapeutics [8]. Antibody structure determination via techniques like X-ray crystallography and NMR is challenging and time-consuming. Machine learning methods improve overall structure prediction and docking [9]. Recently, highly accurate structure prediction models have been proposed for proteins in general [10–12] and for antibodies [13–16]. AlphaFold2 was impressively accurate in the recent CASP14 experiment, surpassing most other protein structure prediction methods proposed to date [11]. Unlike the other deep learning-based methods, AlphaFold2 predicts all side-chain rotamers in addition to the protein backbone. Current deep learning methods for antibody structure prediction do not directly predict side-chains, although they all predict backbones with high accuracy. Hence, a next step towards the advancement of antibody modeling and engineering is the accurate prediction of side-chains to improve overall structure prediction and docking.

Presently, there are successful methods for rotamer predictions that rely on calculating the probability of a χ angle as a function of backbone torsion angles. For instance, SCWRL4 uses backbone-dependent libraries to calculate rotamer frequencies based on kernel density estimates and kernel regressions [17]. The Rosetta suite employs a similar strategy to repack rotamers [18]. Antibody-specific methods like PEARS capture rotameric preferences based on the immunogenetics numbering scheme to restrict possible side-chain conformations in the sample space based on positional information [19]. Both SCWRL4 and PEARS require the antibody sequence and backbone structure to generate side-chain predictions. They repack the side-chains onto the provided backbone, and their performance generally declines when the input is not the crystal backbone. To address these limitations, we propose DeepSCAb (deep side-chain antibody), a deep neural network that predicts full FV structures, including side-chain conformations from only the amino acid sequence.

Methods

Antibody structure datasets

Training dataset

We used the Structural Antibody Database [20], SAbDab, to curate the training dataset for DeepSCAb. To ensure only high-quality examples were used for training, we limited the dataset to structures with 3 Å resolution or better. To assess the impacts of the sequence redundancy threshold on antibody sequence diversity, we collected structures filtered at a range of sequence identity cutoffs (60%, 70%, 80%, 90%, 95%, 99%), as well as an unfiltered set of structures. For each set of structures, we calculated the entropy of the amino acid distribution for each position according to the Chothia numbering (S1 and S2 Figs in S1 File). As expected, we observed a general loss of positional diversity (lower entropy) with increasing sequence redundancy. However, we observed the opposite trend for the residues belonging to the CDR H3 loop, with less stringent cutoffs allowing for greater sequence diversity. With this in mind, we selected the 99% sequence identity dataset for model training. For PDBs with multiple structures (such as crystals with multiple instances of the FV in the unit cell), we always select the first. We additionally removed targets belonging to the RosettaAntibody benchmark set [21] to evaluate model performance, resulting in a total of 1,433 antibody structures (64% bound and 36% unbound) for training and validation of our network. Of these structures, a random 95% were used for training and the remaining 5% were used for validation.
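
For reference, the per-position sequence entropy used in this analysis can be computed as in the brief Python sketch below. This is our illustration rather than the authors' released code; alignment to Chothia-numbered positions and handling of gaps are assumed to be done beforehand.

```python
import math
from collections import Counter

def positional_entropy(column):
    """Shannon entropy (in bits) of the amino acid distribution observed at one
    Chothia-numbered position; `column` is a list of one-letter residue codes."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Toy comparison: a nearly invariant position vs. a highly diverse one.
print(positional_entropy(["A"] * 9 + ["G"]))   # low entropy (conserved)
print(positional_entropy(list("ACDEFGHIKL")))  # high entropy (diverse)
```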

Predicting antibody structure from sequence

DeepSCAb consists of two main components: an inter-residue module for predicting backbone geometries and a rotamer module for predicting side-chain dihedrals. The inter-residue module is initially trained separately and then in parallel with the rotamer module.

Simultaneous prediction of side-chain and backbone geometries

The initial layers of the model for predicting pairwise distances and orientations are based on a network architecture similar to that of DeepH3 [13]. The inter-residue module consists of a 3-block 1D ResNet and a 25-block 2D ResNet. As input to the model, we provide the concatenated heavy and light chain FV sequences, with a total length L. The input amino acid sequence is one-hot encoded, resulting in a dimension L × 20. We append an additional binary chain-break delimiter, of dimension L × 1, to the input encoding to mark the last residue of the heavy chain. Taken together, the full model input has dimension L × 21. The 1D ResNet begins with a 1D convolution that projects the input features up to L × 32, followed by three 1D ResNet blocks (each with two 1D convolutions with a kernel size of 17) that maintain dimensionality. The output of the 1D ResNet is then transformed to pairwise features by redundantly expanding the L × 32 tensor to an L × L × 64 tensor. Next, this tensor passes through 25 blocks of the 2D ResNet, each of which maintains dimensionality with two 2D convolutions with a kernel size of 5 × 5. The resulting tensors are converted to pairwise probability distributions over the Cβ distance, d, the orientation dihedrals ω and θ, and the planar angle ϕ. The inter-residue module is trained as described for DeepH3 [13].
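
The "redundant expansion" from sequential to pairwise features can be sketched as follows. This is an illustrative PyTorch snippet based on the dimensions described above, not the released DeepSCAb implementation; the tensor names and the example length are placeholders.

```python
import torch

def seq_to_pairwise(seq_feats: torch.Tensor) -> torch.Tensor:
    """Expand sequential features (L x C) into pairwise features (L x L x 2C) by
    tiling each residue's features along rows and columns and concatenating, so
    that position (i, j) carries the features of residues i and j."""
    L, C = seq_feats.shape
    rows = seq_feats.unsqueeze(1).expand(L, L, C)  # residue i's features at (i, j)
    cols = seq_feats.unsqueeze(0).expand(L, L, C)  # residue j's features at (i, j)
    return torch.cat([rows, cols], dim=-1)

x = torch.randn(230, 32)       # e.g., an FV of length 230 with 32 channels
pairwise = seq_to_pairwise(x)  # (230, 230, 64), the input to the 2D ResNet
```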

The rotamer module takes the inter-residue features as input. The L × L × 64 tensors resulting from the 2D ResNet are transformed back to sequential features by stacking rows and columns, giving a final dimension of L × 128. The rotamer module contains a multi-head attention layer of one block with 8 parallel attention heads and a feedforward dimension of 512. The self-attention layer outputs L × 128 tensors, which then pass through a 1D convolution with a kernel size of 5. The tensors are converted to rotamer probability distributions using a softmax, predicted conditionally for each χ dihedral: χ1 is an input to χ2, χ1 and χ2 are inputs to χ3, χ1 through χ3 are inputs to χ4, and χ1 through χ4 are inputs to χ5. The predicted rotamers are fed back into the inter-residue module: the rotamer tensors are stacked onto the pairwise tensors before the final 2D convolution to update the d, ω, θ, and ϕ outputs.
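
The conditional prediction of successive χ dihedrals can be sketched as below: the head for each χ angle consumes the shared per-residue features together with the probability distributions predicted for the preceding χ angles. This is a schematic reading of the description above with illustrative dimensions, not the published architecture verbatim.

```python
import torch
import torch.nn as nn

class ConditionalChiHeads(nn.Module):
    """Each chi_k head sees the shared features plus the predicted distributions
    for chi_1 .. chi_{k-1}; dimensions here are illustrative placeholders."""

    def __init__(self, feat_dim: int = 128, n_bins: int = 36, n_chi: int = 5):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim + k * n_bins, n_bins) for k in range(n_chi)
        )

    def forward(self, feats: torch.Tensor):          # feats: (L, feat_dim)
        outputs, context = [], feats
        for head in self.heads:
            probs = head(context).softmax(dim=-1)    # (L, n_bins)
            outputs.append(probs)
            context = torch.cat([context, probs], dim=-1)  # condition the next chi
        return outputs                               # [chi1, ..., chi5] distributions

chi_probs = ConditionalChiHeads()(torch.randn(230, 128))
```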

Distances are discretized into 36 equal-sized bins in the range of 0 to 18 Å. All dihedral outputs of the network are discretized into 36 equal-sized bins in the range of -180° to 180°, with the exception of χ1. The χ1 dihedral is discretized into 36 non-uniform bins, with 6 bins of 30° and 30 bins of 6°. The small bins are centered around -60°, 60°, and 180°, consistent with observed conformational isomers. The planar angle ϕ is discretized into 36 equal-sized bins with range 0 to 180°. Pairwise dihedrals are not calculated for glycine residues due to the absence of a Cβ atom. Side-chain dihedrals are not calculated for glycine and alanine residues due to the absence of a Cγ atom, or for proline residues due to their non-rotameric nature.
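
The binning scheme can be made concrete with the short sketch below. The uniform distance and dihedral bins follow directly from the description; the exact edge placement of the non-uniform χ1 bins is our reading of the text (ten fine bins around each of the three rotameric wells, 30° bins elsewhere) and may differ in detail from the released code.

```python
import numpy as np

def distance_bin_edges():
    """36 equal-width bins spanning 0-18 A (37 edges)."""
    return np.linspace(0.0, 18.0, 37)

def dihedral_bin_edges():
    """36 equal-width bins spanning -180 to 180 degrees."""
    return np.linspace(-180.0, 180.0, 37)

def chi1_bin_edges():
    """Non-uniform chi1 binning: 6-degree bins within 30 degrees of the wells
    near -60, 60, and 180 degrees, and 30-degree bins elsewhere (30 fine + 6
    coarse = 36 bins). Edge placement is illustrative."""
    edges = {-180.0, 180.0}
    edges.update(np.arange(-150.0, 151.0, 30.0).round(3))      # coarse edges
    for center in (-60.0, 60.0, 180.0):                         # fine edges
        for e in np.arange(center - 30.0, center + 30.0 + 1e-9, 6.0):
            edges.add(round(((e + 180.0) % 360.0) - 180.0, 3))  # wrap into (-180, 180]
    return np.array(sorted(edges))

assert len(distance_bin_edges()) - 1 == 36
assert len(dihedral_bin_edges()) - 1 == 36
assert len(chi1_bin_edges()) - 1 == 36
```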

Categorical cross-entropy loss is calculated for each output, where the pairwise losses are summed with equal weight and the rotamer losses are scaled based on each dihedral’s frequency of observation: i.e., χ5 rotamers are much less frequent than χ1. We do not calculate losses for residue pairs and rotamers missing any of their constitutive atoms, as can occur for poorly resolved flexible regions. The Adam optimizer is used with a learning rate of 0.001. We trained five models with a batch size of 1 on random 95/5 training/validation splits and averaged over model predictions to generate potentials for downstream applications. DeepSCAb models were trained on one NVIDIA K80 GPU, which required approximately 100 hours for 120 epochs of training.
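
A schematic of this loss computation is given below: per-output categorical cross-entropy, masking of residues (or residue pairs) with unresolved atoms, and per-dihedral scaling of the rotamer terms. The weights shown are placeholders, not the values used in training.

```python
import torch
import torch.nn.functional as F

def masked_ce(logits, labels, mask):
    """Cross-entropy computed only over entries whose constitutive atoms are
    resolved; `mask` is a boolean tensor selecting the valid entries."""
    if not mask.any():
        return logits.sum() * 0.0           # no valid entries: contribute zero
    return F.cross_entropy(logits[mask], labels[mask])

def total_loss(pair_logits, pair_labels, pair_mask,
               chi_logits, chi_labels, chi_masks, chi_weights):
    """Pairwise losses (d, omega, theta, phi) summed with equal weight; rotamer
    losses scaled per dihedral to account for how often each chi is observed."""
    loss = sum(masked_ce(o, y, pair_mask) for o, y in zip(pair_logits, pair_labels))
    loss = loss + sum(w * masked_ce(o, y, m)
                      for o, y, m, w in zip(chi_logits, chi_labels, chi_masks, chi_weights))
    return loss

# Toy usage: four pairwise outputs and two chi outputs, 36 bins each.
logits = [torch.randn(4, 36) for _ in range(4)]
labels = [torch.randint(0, 36, (4,)) for _ in range(4)]
mask = torch.tensor([True, True, False, True])
print(total_loss(logits, labels, mask, logits[:2], labels[:2], [mask, mask], [1.0, 2.0]))
```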

Side-chain only predictions

To investigate the effect of inter-residue predictions on rotameric predictions, we designed a side-chain-only network as a control. The control network takes as input the one-hot encoded antibody sequence, which passes through a 3-block 1D ResNet. The remaining architecture of the control network, as well as its training process, is similar to the rotamer module of DeepSCAb (S3 Fig in S1 File). However, there are differences in dimension because the 1D ResNet returns an L × 32 tensor. The control network models were trained on one NVIDIA K80 GPU, which required 10 hours for 20 epochs of training. We adopted a shorter training process for the control network because the models tended to overfit after 20 epochs.

Self-attention implementation and interpretation

Transformer encoder attention layer

The rotamer module contains a transformer encoder layer that adds the capacity to aggregate information over the entire sequence (S4 Fig in S1 File). We tuned the number of parallel attention heads, the feedforward dimension, and the number of blocks according to validation loss during training. We found that 8 attention heads outperformed 16, feedforward with a dimension of 512 outperformed 1,024 and 2,048, and one block of attention performed identically to two. We further experimented with adding a sinusoidal positional embedding prior to the self-attention layer and obtained identical results, implying that the convolutions in our network contain sufficient information on the order of input elements, rendering positional encoding unnecessary [22].
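
In PyTorch terms, the selected configuration corresponds roughly to the snippet below. This is our sketch: `batch_first` and the default dropout are our choices, and no positional encoding is added, per the observation above.

```python
import torch
import torch.nn as nn

# One transformer encoder block with 8 heads and a 512-dim feedforward layer,
# operating on L x 128 per-residue features.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=128, nhead=8, dim_feedforward=512, batch_first=True
)
attention = nn.TransformerEncoder(encoder_layer, num_layers=1)

feats = torch.randn(1, 230, 128)  # (batch, L, d_model)
out = attention(feats)            # (1, 230, 128)
```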

Interpreting the attention layer

In our interpretation of the rotamer attention, we take into consideration only one model out of the five that were trained on random training/validation splits. We do not report an average over the attention matrices from multiple models since they vary amongst themselves (S5 Fig in S1 File). Nevertheless, the properties of attention are conserved across individual models.

We utilize a selected subset of the independent test set to display the most variation across the highly-attended positions as well as the corresponding residue types. This subset consists of the following human PDBs: 1JFQ, 1MFA, 2VXV, 3E8U, 3GIZ, 3HC4, 3LIZ, 3MXW, 3OZ9, and 4NZU.

Modeling side-chains with DeepSCAb in Rosetta

DeepSCAb generates constraints that are utilized for the prediction of an antibody structure. Discrete potentials are converted to continuous functions via the built-in Rosetta spline function. The constraints include all nine geometries, namely d, ω, θ, ϕ, χ1, χ2, χ3, χ4, and χ5. The ConstraintSetMover in Rosetta applies these constraints onto the native pose, and the PackRotamersMover then models side-chain structures. We use the default Dunbrack rotamer library [18] and allow the PackRotamersMover to sample extended ranges for χ1 and χ2 (using the "-ex1" and "-ex2" flags), as this has been shown to improve side-chain packing performance. We chose the standard ref2015 full-atom score function with a weight of 1.0 for all constraints. This protocol can repack side-chains on any backbone structure with DeepSCAb predictions.
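
A rough PyRosetta sketch of this repacking protocol is shown below. It is not the authors' script: the constraint file ("deepscab.cst") and input PDB are placeholders, generation of the spline constraints from DeepSCAb potentials is assumed to have been done separately, and mover and score-term names follow standard PyRosetta usage.

```python
import pyrosetta
from pyrosetta.rosetta.core.scoring import ScoreType
from pyrosetta.rosetta.protocols.constraint_movers import ConstraintSetMover
from pyrosetta.rosetta.protocols.minimization_packing import PackRotamersMover

pyrosetta.init("-ex1 -ex2")  # sample extended chi1/chi2 rotamer ranges

pose = pyrosetta.pose_from_pdb("input_backbone.pdb")  # placeholder input structure

# ref2015 with the constraint terms turned on at weight 1.0.
scorefxn = pyrosetta.create_score_function("ref2015")
for term in (ScoreType.atom_pair_constraint, ScoreType.angle_constraint,
             ScoreType.dihedral_constraint):
    scorefxn.set_weight(term, 1.0)

# Apply the DeepSCAb-derived constraints, then repack side-chains under them.
cst_mover = ConstraintSetMover()
cst_mover.constraint_file("deepscab.cst")  # placeholder constraint file
cst_mover.apply(pose)

packer = PackRotamersMover(scorefxn)
packer.apply(pose)
pose.dump_pdb("repacked.pdb")
```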

Side-chain predictions using alternative methods

To assess the side-chain prediction accuracy against relative solvent accessible surface area, we compare DeepSCAb to three alternative methods: PEARS, SCWRL4, and Rosetta. For each alternative method, we provide the backbones of the benchmark targets and their sequences. PEARS utilizes antibody-specific rotamer libraries and assigns rotamers based on the IMGT numbering scheme. We generated predictions from PEARS using the publicly available server [19]. SCWRL4 generates χ kernel density estimates based on backbone-dependent rotamer libraries by minimizing the conformational energies for each residue. We generated SCWRL4 predictions using the SCWRL4.0 algorithm [17]. Rosetta predictions were generated using the same protocol as DeepSCAb, but with only the ref2015 energy function and no learned constraints.

Data and code availability

The structures used to train the models presented in this work were collected from SAbDab [20] (http://opig.stats.ox.ac.uk/webapps/newsabdab/sabdab), which curates antibody structures from the Protein Data Bank [23] (https://www.rcsb.org). The source code to train and run DeepSCAb, as well as pretrained models, are available at https://github.com/Graylab/DeepSCAb. The structures predicted by DeepSCAb and alternative methods for benchmarking have been deposited at Zenodo: 10.5281/zenodo.6371490.

Results

Overview of the method

Our deep learning method for antibody structure prediction consists of inter-residue and rotamer modules. We trained DeepSCAb to predict antibody backbones as inter-residue distances and orientations. Then, we simultaneously trained the model to predict side-chain conformations using an attention layer. The pairwise and rotamer probability distributions predicted by DeepSCAb were used for structure realization and packing of the side-chains using Rosetta.

DeepSCAb predicts inter-residue and side-chain orientations from sequence

DeepSCAb is a neural network that requires only an antibody sequence to predict full FV structures, including side-chain geometries (Fig 1A). The combined sequences of the antibody heavy and light chains are input as a one-hot encoding and initially pass through the inter-residue module. The architecture of this module is similar to our previous method for CDR H3 loop structure prediction, DeepH3 (Fig 1B) [13]. The network is pretrained to predict the pairwise geometries d, ω, θ, and ϕ. We then feed the penultimate outputs into the rotamer module for the prediction of side-chain conformations, represented as torsion angles χ1, χ2, χ3, χ4, and χ5. We explored two strategies for side-chain torsion angle prediction: simultaneous prediction of all angles and conditional prediction of successive torsion angles. For conditional prediction, every χ angle after χ1 is predicted given the preceding χ angle(s) (Fig 1A). After training both model variants, we selected the conditional model because it resulted in lower cross-entropy losses for all nine outputs (inter-residue and rotameric). Within the rotamer module, we included an interpretable self-attention layer before predicting the torsion angles. The predicted side-chain distributions are used to update the inter-residue module to obtain the final inter-residue outputs. We then used Rosetta for full FV structure realization as well as repacking of side-chains guided by DeepSCAb rotamer constraints (Fig 1C).

Fig 1. Overview of the DeepSCAb network architecture.

(A) Conditional side-chain dihedral prediction in DeepSCAb rotamer module with each dihedral after χ1 depending on previous prediction(s). (B) DeepSCAb architecture for predicting inter-residue geometries and side-chain dihedrals. (C) Applications of DeepSCAb for full FV realizations and side-chain repacking using Rosetta.

Rotamer module attends to structurally-conserved anchor positions

The rotamer module includes a self-attention layer that allows us to identify the positions that most significantly influence the side-chain predictions. Rather than attending broadly across the entire antibody sequence, we observed that the model restricted attention to structurally-conserved residues, which we refer to as anchors.

We tested the conservation of anchor positions in various species and settings including ten human antibody targets, a bovine antibody, and mouse and rat sequences with unknown structures. We collected the human antibodies from the independent test set and selected a random bovine antibody (6E9G). Lastly, the mouse and rat antibody structures shown are predictions from DeepSCAb using the protocol described for DeepAb [14], for random paired sequences from OAS [24]. Across the aforementioned range of systems, we found that the anchor positions, as well as anchor residue types, are frequently conserved (Fig 2A).

Fig 2. Identification of anchor residue positions from rotamer module attention.

Rotamer module attention is interpreted to indicate positional significance in side-chain predictions. (A) An attention spectrum (left) ranging from white to magenta represents 0% to 100% attention, respectively. Human, bovine, mouse, and rat antibodies are shown. (B) The variation in attention level is shown with increasing training progress. The epochs represented are 5, 20, 40, 60, 80, 100, and 120.

When we analyzed how attention patterns changed over the course of training, we observed a process resembling a search for anchor residues. In the initial epochs (before epoch 25), the model scans for key positions in the sequence, and a few anchors are switched out entirely during these early stages. The highly attended residues begin to settle into their positions around epoch 40; however, the levels of attention assigned to them remain dynamic until late epochs. By epoch 100, the model settles on eight anchor positions that commonly receive the highest levels of attention (Fig 2B).

Side-chain predictions improve CDR H3 loop structure accuracy

Training on backbone geometries improves side-chain predictions

To assess the side-chain prediction accuracy of the model without any knowledge of backbone preferences, we designed a control network that consists primarily of the DeepSCAb rotamer module. Although the design of our control network would seem to violate the hierarchy of protein structure, in which the local tertiary environment is a critical determinant of side-chain conformation, we sought to investigate the capacity of a ResNet model to infer side-chain conformations from sequence position alone. This control is similar in principle to the PEARS method [19], which uses positional statistics collected for IMGT-numbered positions to predict side-chain conformations.

We evaluated the control network and the full DeepSCAb on a decoy discrimination task using a set of structures generated by Jeliazkov et al. [25], for the RosettaAntibody benchmark with 2,800 decoys per target. In the decoy discrimination task, we evaluate the ability of an energy function, such as the rotameric distributions predicted by DeepSCAb, to distinguish near-native conformations from a large set of alternative conformations (decoys). For each target in the benchmark, we score each of the decoys using the control network and DeepSCAb, and compare the decoy ranking capacity of the models by measuring the RMSD from the native for the top-1 and top-5 scoring structures. For the top-1 scoring structures, DeepSCAb (RMSD = 3.2 Å) outperformed the control network (RMSD = 5.0 Å) by 1.8 Å (32 better, 7 same, 10 worse). Among the top-5 scoring structures, DeepSCAb (RMSD = 2.5 Å) outperformed the control network (RMSD = 3.3 Å) by 0.8 Å (23 better, 11 same, 15 worse) (Table 1). Due to the considerable improvement observed in DeepSCAb over the control network, we conclude that direct injection of structural priors, through prediction of the inter-residue geometries, is beneficial for antibody side-chain predictions.
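
The top-1/top-5 discrimination metric itself is simple; a minimal sketch is given below, where `score_fn` stands in for scoring a decoy with the DeepSCAb (or control) potentials and the reported value is taken as the best RMSD among the top-scoring decoys (one plausible reading of the metric).

```python
def decoy_discrimination(decoys, score_fn, top_k=5):
    """Rank decoys by a model-derived score (lower is better) and report the RMSD
    of the top-1 decoy and the best RMSD among the top-k scoring decoys.
    `decoys` is a list of (scoring_input, rmsd_to_native) pairs."""
    ranked = sorted(decoys, key=lambda d: score_fn(d[0]))
    top1_rmsd = ranked[0][1]
    topk_rmsd = min(rmsd for _, rmsd in ranked[:top_k])
    return top1_rmsd, topk_rmsd

# Toy usage: each decoy carries a precomputed score and its CDR H3 RMSD (in A).
decoys = [(1.2, 4.0), (0.7, 2.1), (0.9, 3.0), (2.5, 8.2)]
print(decoy_discrimination(decoys, score_fn=lambda s: s))  # -> (2.1, 2.1)
```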

Table 1. Decoy discrimination compared to DeepSCAb.
Energy   | Top 1-Scoring Decoys              | Top 5-Scoring Decoys
         | Better  Same  Worse  <RMSD>       | Better  Same  Worse  <RMSD>
DeepSCAb |   -      -     -     3.2 Å ± 1.3  |   -      -     -     2.5 Å ± 1.2
Control  |  32      7    10     5.0 Å ± 3.8  |  23     11    15     3.3 Å ± 2.4
DeepH3   |  10     33     7     3.2 Å ± 1.4  |  10     35     4     2.6 Å ± 1.3

Training on side-chain geometries improves inter-residue predictions in return

Since DeepSCAb outperformed the side-chain-only control network, we next evaluated the impacts of learning side-chain conformations on pairwise residue-residue geometry predictions. First, we compared the cross-entropy loss achieved by DeepSCAb to that of DeepH3 for the trained ensembles (S6 Fig in S1 File). For every pairwise geometry prediction, DeepSCAb achieved lower loss than DeepH3 for both the training and validation datasets, suggesting that side-chain prediction can improve prediction of inter-residue geometries. Given this improvement, we next compared the performance of DeepSCAb to DeepH3 on the decoy discrimination task. For the Top 1-scoring decoys, DeepSCAb modestly outperformed DeepH3 (10 better, 33 same, 7 worse; <ΔRMSD> = 0 Å). For the Top 5-scoring decoys, DeepSCAb outperformed DeepH3 (10 better, 35 same, 4 worse; <ΔRMSD> = −0.1 Å) (Table 1).

Using the independent test set, we plotted the structures chosen by DeepH3 against the ones chosen by DeepSCAb based on their RMSD (Å) (Fig 3). DeepSCAb was better at distinguishing near-native structures in both Top 1 and Top 5 plots, though improvements over DeepH3 were most notable in the Top 5 comparison (Fig 3A). We then analyzed two targets chosen from the Top 5 decoys for the three methods and ref2015 (Rosetta energy). We show the 2,500 structure scores against RMSD in the CDR H3 loop for the target 2FB4 (loop length of 19) (Fig 3B). DeepSCAb outperformed the control network (ΔRMSD = −10.8 Å), ref2015 (ΔRMSD = −10.2 Å), and DeepH3 (ΔRMSD = −1.2 Å). Comparison of the structures identified by DeepSCAb and DeepH3 revealed that both models are able to place the CDR H3 loop in the correct orientation, however, the addition of side-chain information in DeepSCAb results in a more accurate structure (Fig 3C). We further plotted the funnel energies in the CDR H3 loop for the target 3MLR (loop length of 17) (Fig 3D). DeepSCAb outperformed the control network (ΔRMSD = −4.4 Å), ref2015 (ΔRMSD = −5.8 Å), and DeepH3 (ΔRMSD = −1.4 Å). We superimposed predicted structures from each method and the native for the target 3MLR, one of the longer and more difficult of the CDR H3 loops (Fig 3E). DeepSCAb predicts the loop structure with the highest accuracy. Hence, the addition of side-chain orientations is beneficial for accurately predicting pairwise geometries.

Fig 3. Comparison of CDR H3 structure prediction accuracy.

The accuracy with which DeepSCAb, DeepH3, and the control network predict the CDR H3 loop structure is measured via decoy structure scoring tasks. (A) In Top 1-scoring decoy structures (top) and Top 5-scoring structures (bottom), the performance of DeepSCAb is compared to DeepH3 using the test set. (B) The CDR H3 energies for the three methods and Rosetta (ref2015) that correspond to 2FB4 are plotted against their RMSD. The five best scoring structures for each plot are indicated in red. (C) The best prediction from Top 5-scoring decoys for target 2FB4 are shown for DeepSCAb (green, 2.34 Å RMSD), DeepH3 (blue, 3.535 Å RMSD), and the control network (purple, 13.091 Å RMSD) all compared to the native (orange). (D) The CDR H3 energies for the three methods and Rosetta (ref2015) that correspond to 3MLR are plotted against their RMSD. The five best scoring structures for each plot are indicated in red. (E) The best prediction from Top 5-scoring decoys for target 3MLR are shown for DeepSCAb (green, 2.998 Å RMSD), DeepH3 (blue, 4.432 Å RMSD), and the control network (purple, 7.441 Å RMSD) all compared to the native (orange).

DeepSCAb is competitive with alternative rotamer packing methods

The context of the predicted side-chains is crucial in determining the accuracy and usefulness of a method. Side-chains that are exposed to solvent play an active role in antigen binding, yet are also inherently the most flexible. We evaluated the performance of our method and three alternative methods as a function of relative side-chain solvent accessible surface area (SC SASA), computed with the Rosetta rel_per_res_sc_sasa method and normalized using reference SASAs from Tien et al. [26]. We compared the success of DeepSCAb in predicting side-chain conformations to PEARS, SCWRL4, and Rosetta, using the native structure as a reference for all measurements. We omitted the target 3MLR from side-chain packing and relative SC SASA comparisons because PEARS was unable to model this structure due to its long L3 loop.
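
Relative SC SASA is simply the observed side-chain SASA normalized by a residue-specific reference value; a small sketch follows. The two reference values shown are illustrative approximations; the full table is given by Tien et al. [26], and relative values above 100% can occur when the observed area exceeds the reference.

```python
def relative_sc_sasa(sc_sasa: float, residue: str, max_sasa: dict) -> float:
    """Normalize an absolute side-chain SASA (A^2) by a residue-specific
    reference value, giving a relative solvent exposure."""
    return sc_sasa / max_sasa[residue]

# Illustrative reference values only; see Tien et al. [26] for the full table.
reference = {"ALA": 129.0, "GLY": 104.0}
print(relative_sc_sasa(80.0, "ALA", reference))  # ~0.62 (about 62% exposed)
```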

The exposure of side-chains to solvent (SC SASA) is a key determinant of whether a computational method can be expected to accurately recover the native side-chain conformation. In Fig 4A, we compared the repacking performance of DeepSCAb to alternatives and found that DeepSCAb produced competitive side-chain packing results for buried residues, or a relative SC SASA of 0, and across a range of increasing solvent exposures (S1 Table in S1 File). With increasing solvent exposure, we see a consistent degradation of performance for all methods. This is expected, as the side chains gain additional conformational freedom with increasing solvent exposure, making accurate predictions increasingly challenging.

Fig 4. Impacts of solvent exposure and learned backbone error on side-chain prediction accuracy.

(A) Comparison of repacked side-chain RMSD for PEARS, SCWRL, Rosetta, and DeepSCAb with increasing relative side-chain solvent accessible surface area (SC SASA). (B) Comparison of error in DeepSCAb-predicted backbones versus error in side-chain dihedral prediction. Backbone error is measured as the Cβ deviation between the predicted structure and the native when the framework residues are aligned. Side-chain dihedral error is measured as a cosine distance between the predicted and native dihedrals.

A key distinction between DeepSCAb and alternative methods is that its learned side-chain potentials depend only on the antibody sequence. As a result, the predicted rotameric distributions are based on an implicit backbone learned by the inter-residue module. When this implicit model is incorrect, we expect the side-chain predictions to be less accurate as well. To test this hypothesis, we generated backbones from the DeepSCAb pairwise predictions using the structure realization procedure proposed for DeepAb [14]. We then quantified the error in each DeepSCAb backbone as the deviation of the Cβ atoms from the native when the framework residues are aligned. After packing side-chains onto these predicted backbones, we measured the cosine distance from the native dihedral for χ1-χ4 (χ5 is omitted due to limited data). Comparing the backbone error to the side-chain dihedral error, we find that as the DeepSCAb-predicted backbone becomes less accurate (higher Cβ deviation), the side-chain dihedral errors increase (Fig 4B).
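
The cosine-distance error between predicted and native dihedrals can be computed as in the sketch below, which treats each dihedral as a direction on the unit circle; this is one plausible reading of the measure described above.

```python
import math

def dihedral_cosine_distance(pred_deg: float, native_deg: float) -> float:
    """Cosine distance between two dihedrals viewed as unit-circle directions:
    0 when they coincide, 2 when they differ by 180 degrees."""
    return 1.0 - math.cos(math.radians(pred_deg - native_deg))

print(dihedral_cosine_distance(-60.0, -65.0))   # small error (~0.004)
print(dihedral_cosine_distance(60.0, -180.0))   # large error (1.5)
```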

Discussion

The results show that our method is a step towards accurate antibody structure prediction via inclusion of side-chain conformations. We demonstrated that DeepSCAb predictions remain competitively accurate at varying side-chain surface exposure. In investigating the causes of failed side-chain predictions, we found that DeepSCAb rotamer module performance is dependent on the quality of its inter-residue geometry predictions. Thus, as methods for protein backbone prediction (and simultaneous side-chain prediction, as with AlphaFold2 [11]) continue to improve, it will be less important to predict side-chains separately. In the meantime, our method complements existing methods for antibody structure prediction.

Using the rotamer module attention, we are able to identify the residue types and positions that are the most influential in the context of side-chain predictions. While this analysis provides insight into rotamer prediction, it does not reveal local biophysical interactions that could be tied into the fundamentals of side-chain conformation in complex energy landscapes. The anchor sites are consistently scattered throughout the sequence, in stark contrast with the local chemical environment typically considered by most side-chain placement algorithms. Perhaps DeepSCAb is learning an internal, structurally conserved numbering scheme as a reference for side-chain prediction similar to the well-performing PEARS algorithm [19], which was designed to use antibody-specific residue positioning to condition rotamer predictions. Alternatively, the model could be identifying global antibody features such as germline class or species, which have some side-chain conformations conserved.

It is well-established that access to backbone context improves side-chain predictions [18], and our results support this. Additionally, we show that inclusion of side-chains enables structure prediction models to more effectively predict pairwise geometries (i.e., achieve lower loss). We found that informing the model of rotameric outputs improved its ability to discriminate near-native CDR H3 loop structures. Because this improvement is limited rather than a significant overall gain in prediction accuracy, we believe the model is reducing its loss by more confidently predicting pairwise geometries that were already correct. Going forward, we should therefore consider implementations of side-chain learning tailored to pairwise geometries that the model cannot yet predict correctly on its own. Concurrent with this work, improved methods for antibody structure prediction have been developed: DeepAb [14] uses a similar architecture to predict inter-residue geometries, and ABlooper [15] predicts CDR loop coordinates directly. Our work suggests that both of these methods might be improved by incorporating side-chain context into predictions.

Most side-chain repacking methods sample conformations based on a backbone-dependent rotamer library [17], and the accurate PEARS method for antibody side-chain repacking estimates χ angle densities based on a position-dependent rotamer library [19]. Since our method does not require structure as an input, DeepSCAb should be more robust to changes in backbone structure for cases where the model’s implicit backbone (e.g., the inter-residue predictions) is close to correct. This feature is useful when there are multiple potential backbone conformations of interest [27], e.g., for the design of new therapeutic antibodies. Another deep learning method that conditionally samples rotamers has been proposed for protein sequence design by Anand et al. [28], which predicts rotamers given the native residue type for the fixed backbone. While rotamer prediction accuracy is higher with the availability of the native backbone structure, the ability to predict in its absence renders DeepSCAb uniquely useful. With minimal modification, our network can aid antibody design. For instance, DeepSCAb can be used in parallel with RosettaAntibodyDesign [29] for rapid placement of side-chains or to hallucinate new antibody sequences using the trRosetta architecture [10].

Conclusion

In this study, we investigated the effect of inter-residue predictions on the accuracy of side-chain dihedrals, as well as the effect of rotamer predictions on overall antibody structure prediction accuracy. We found that DeepSCAb predicts rotamers competitively when compared to alternative methods that require true backbone coordinates. The performance of our method is robust when the backbone is perturbed or deviates from the crystal structure. Since DeepSCAb predicts a probability distribution over the backbone and side-chain geometries, we expect it will be adaptable to and useful for designing new antibodies.

Supporting information

S1 File. Supporting information for manuscript.

File containing supporting tables and figures for the DeepSCAb method for prediction of antibody side-chain conformations.

(PDF)

Acknowledgments

We thank the Gray Lab for helpful discussions and advice. Dr. Gray is an unpaid board member of the Rosetta Commons. Under institutional participation agreements between the University of Washington, acting on behalf of the Rosetta Commons, Johns Hopkins University may be entitled to a portion of revenue received on licensing Rosetta software including methods discussed/developed in this study. As a member of the Scientific Advisory Board, J.J.G. has a financial interest in Cyrus Biotechnology. Cyrus Biotechnology distributes the Rosetta software, which may include methods developed in this study. These arrangements have been reviewed and approved by the Johns Hopkins University in accordance with its conflict-of-interest policies.

Data Availability

The source code to train and run DeepSCAb, as well as pretrained models, are available at https://github.com/Graylab/DeepSCAb. The structures predicted by DeepSCAb and alternative methods for benchmarking have been deposited at Zenodo: 10.5281/zenodo.6371490.

Funding Statement

This work was supported by National Science Foundation Research Experience for Undergraduates grant DBI-1659649 (D.A.), AstraZeneca (J.A.R.), National Institutes of Health grants T32-GM008403 (J.A.R.), R35-GM141881 (J.A.R.), and R01-GM078221 (S.P.M., J.J.G.). Computational resources were provided by the Maryland Advanced Research Computing Cluster (MARCC). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Sela-Culang I, Kunik V, Ofran Y. The structural basis of antibody-antigen recognition. Frontiers in Immunology. 2013;4:1–13. doi: 10.3389/fimmu.2013.00302 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Tsuchiya Y, Mizuguchi K. The diversity of H3 loops determines the antigen-binding tendencies of antibody CDR loops. Protein Science. 2016;4(25):815–825. doi: 10.1002/pro.2874 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Leem J, Dunbar J, Georges G, Shi J, Deane CM. ABodyBuilder: Automated antibody structure prediction with data-driven accuracy estimation. mAbs. 2016;7(8):1259–1268. doi: 10.1080/19420862.2016.1205773 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Schritt D, Li S, Rozewicki J, Katoh K, Yamashita K, Volkmuth W, et al. Repertoire Builder: High-throughput structural modeling of B and T cell receptors. Molecular Systems Design and Engineering. 2019;4(4):761–768. doi: 10.1039/C9ME00020H [DOI] [Google Scholar]
  • 5. Weitzner BD, Jeliazkov JR, Lyskov S, Marze N, Kuroda D, Frick R, et al. Modeling and docking of antibody structures with Rosetta. Nature Protocols. 2017;2(12):401–416. doi: 10.1038/nprot.2016.180 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Spassov VZ, Yan L, Flook PK. The dominant role of side-chain backbone interactions in structural realization of amino acid code. ChiRotor: A side-chain prediction algorithm based on side-chain backbone interactions. Protein Sci. 2007;16(3):494–506. doi: 10.1110/ps.062447107 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. Chiu ML, Goulet DR, Teplyakov, Gilliland GL. Antibody Structure and Function: The Basis for Engineering Therapeutics. Antibodies. 2019;8(4):55. doi: 10.3390/antib8040055 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Reichert JM. Antibodies to watch in 2017. mAbs. 2017;9(2):167–181. doi: 10.1080/19420862.2016.1269580 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Gao W, Mahajan SP, Sulam J, Gray JJ. Deep Learning in Protein Structural Modeling and Design. Patterns. 2020;1(9):100–142. doi: 10.1016/j.patter.2020.100142 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Yang J, Anishchenko I, Park H, Peng Z, Ovchinnikov S, Baker D. Improved protein structure prediction using predicted interresidue orientations. Proceedings of the National Academy of Sciences of the United States of America. 2020;3(117):1496–1503. doi: 10.1073/pnas.1914677117 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589. doi: 10.1038/s41586-021-03819-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Baek M, DiMaio F, Anishchenko I, Dauparas J, Ovchinnikov S, Lee GR, et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021;373(6557):871–876. doi: 10.1126/science.abj8754 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Ruffolo JA, Guerra C, Mahajan SP, Gray JJ. Geometric potentials from deep learning improve prediction of CDR H3 loop structures. Bioinformatics. 2020;36(1):i268–i275. doi: 10.1093/bioinformatics/btaa457 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Ruffolo JA, Sulam J, Gray JJ. Antibody structure prediction using interpretable deep learning. Patterns. 2022;3(2). doi: 10.1016/j.patter.2021.100406 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Abanades B, Georges G, Bujotzek A, Deane CM. ABlooper: Fast accurate antibody CDR loop structure prediction with accuracy estimation. Bioinformatics. 2022. doi: 10.1093/bioinformatics/btac016 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Cohen T, Halfon M, Schneidman-Duhovny D. NanoNet: Rapid end-to-end nanobody modeling by deep learning at sub angstrom resolution. bioRxiv. 2021. doi: 10.1101/2021.08.03.454917 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Krivov GG, Shapovalov MV, Dunbrack RL. Improved prediction of protein side-chain conformations with SCWRL4. Proteins: Structure, Function and Bioinformatics. 2009;4(77):778–795. doi: 10.1002/prot.22488 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Shapovalov MV, Dunbrack RL Jr. A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions. Structure. 2011;19(6). doi: 10.1016/j.str.2011.03.019 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Leem J, Georges G, Shi J, Deane CM. Antibody side-chain conformations are position-dependent. Proteins: Structure, Function and Bioinformatics. 2018;4(86):383–392. doi: 10.1002/prot.25453 [DOI] [PubMed] [Google Scholar]
  • 20. Dunbar J, Krawczyk K, Leem J, Baker T, Fuchs A, Georges G. SAbDab: The structural antibody database. Nucleic Acids Research. 2014;D1(42):1140–1146. doi: 10.1093/nar/gkt1043 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Adolf-Bryfogle J, Xu Q, North B, Lehmann A, Dunbrack RL. PyIgClassify: a database of antibody CDR structural classifications. Nucleic acids research. 2014;43(D1):D432–D438. doi: 10.1093/nar/gku1106 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention Is All You Need. 31st Conference on Neural Information Processing Systems. 2017.
  • 23. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, et al. The protein data bank. Nucleic acids research. 2000;28(1):235–242. doi: 10.1093/nar/28.1.235 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Kovaltsuk A, Leem J, Kelm S, Snowden J, Deane CM, Krawczyk K. Observed Antibody Space: A Resource for Data Mining Next-Generation Sequencing of Antibody Repertoires. The Journal of Immunology. 2018;8(201):2502–2509. doi: 10.4049/jimmunol.1800708 [DOI] [PubMed] [Google Scholar]
  • 25. Jeliazkov J, Frick R, Zhou J, Gray JJ. RosettaAntibody generated models for a dataset of 49 antibody-Fv structures. Zenodo. 2020. doi: 10.5281/zenodo.3724832 [DOI] [Google Scholar]
  • 26. Tien MZ, Meyer AG, Sydykova DK, Spielman SJ, Wilke CO. Maximum allowed solvent accessibilites of residues in proteins. PLoS ONE. 2013;8(11):e80635. doi: 10.1371/journal.pone.0080635 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Schwarz D, Georges G, Kelm S, Shi J, Vangone A, Deane CM. Co-evolutionary distance predictions contain flexibility information. Bioinformatics. 2021;38(1):65–72. doi: 10.1093/bioinformatics/btab562 [DOI] [PubMed] [Google Scholar]
  • 28. Anand N, Eguchi R, Mathews II, Perez CP, Derry A, Altman RB, et al. Protein sequence design with a learned potential. Nature Communications. 2022;746(13). doi: 10.1038/s41467-022-28313-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Adolf-Bryfogle J, Kalyuzhniy O, Kubitz M, Weitzner BD, Hu X, Adachi Y, et al. RosettaAntibodyDesign (RAbD): A General Framework for Computational Antibody Design. PLoS Computational Biology. 2018;4(14). doi: 10.1371/journal.pcbi.1006112 [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision Letter 0

Alexey Porollo

13 Dec 2021

PONE-D-21-28666
Improved antibody structure prediction by deep learning of side chain conformations
PLOS ONE

Dear Dr. Gray,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

All reviewers appreciated the presented work and its importance to the field. However, both major and minor issues were identified that need to be addressed in the revised manuscript. Please refer to the reviewers' comments at the end of this letter for details.

As an academic editor, I also have a number of comments and requests to update the manuscript in order to improve its clarity and reproducibility as well as to make it compliant with the journal publication policy.

Scientific issues

1. Methods section does not contain any information about validation/control sets.

2. The input vector is not fully defined. What is L: the length of the entire antibody, the variable region of both chains, or the CDR part only? L as an input assumes a constant value, but all antibodies, especially in the Fv part only, are of variable length. When a shorter sequence is considered, how are the empty positions filled: padded with 0s or something else? If the input is in one-hot format, why does the chain delimiter occupy only one position, not 2? What is the input format for the 21st position (light vs heavy chain) used then?

3. The 99% sequence identity threshold appears very permissive, which may result in over-optimistic results. The authors need to compute and provide distributions of the antibodies per bin of sequence identity, using e.g. the USEARCH/UCLUST tool, followed by the distribution of where most of those mismatches fall for each sequence identity bin (e.g., in the CDRs or otherwise).

4. While the authors claim importance of the predictions to the subsequent AB-antigen docking, they provided no information as to how antigens and induced fit were accounted for in their model. Furthermore, the authors need to discuss and describe the following aspects of training and validating of predicted structural data: (1) Report the number of AB structures in bound and unbound forms used in training and test sets. (2) Given that many loop regions (which are important in the context of CDR) are frequently too flexible to be resolved by X-ray or result in multiple occupancies in the ATOM section of the PDB file, the authors have to describe how they used structures with missing atoms or atoms with multiple occupancies. The same pertains to NMR-based models – which model from the ensemble was used for training and validation?

5. Methods should contain information how values of relative solvent accessibility were computed. Figure 4 contains ranges with SASA > 100%, which is confusing.

6. If not demonstrated in Results, the issue of cross-reactivity of Abs should be at least pointed out in Discussion. There are many instances, e.g. in autoimmune disorders, when the same Ab naturally evolved against viral proteins cross-reacts with the human (host in general) proteins that have no sequence similarity to the viral antigens. For example, use published reports in lupus. The authors at least need to offer some hypothesis in Discussion as to why this may happen.

Editorial issues

1. Description of the Training set appears to be copied verbatim from the authors' previous publication, which technically falls into the self-plagiarism category.

2. All references have to adhere to the scientific citation format. For example, references 14, 15, and 16 do not contain the source of publication, such as journal.

3. All associated software should be made publicly available in order to allow reviewers to assess its functionality.

Please submit your revised manuscript by Jan 27 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Alexey Porollo, PhD

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please amend your Methods section to provide a URL for the Protein Data Bank.

3. Thank you for stating the following financial disclosure:

“This work was supported by National Science Foundation (https://nsf.gov/) Research Experience for Undergraduates grant DBI-1659649 (D.A.), AstraZeneca (https://www.astrazeneca.com/) (J.A.R.), National Institutes of Health (https://www.nih.gov/) grants T32-GM008403 (J.A.R.), R01-GM078221(S.P.M., J.J.G.), and R01-GM127578(S.P.M., J.J.G.). Computational resources were provided by the Maryland Advanced Research Computing Cluster (MARCC) (https://www.marcc.jhu.edu/).”

Please state what role the funders took in the study.  If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

If this statement is not correct you must amend it as needed.

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

4. Thank you for stating the following in the Competing Interests section:

“Dr. Gray is an unpaid board member of the Rosetta Commons. Under institutional participation agreements between the University of Washington, acting on behalf of the Rosetta Commons, Johns Hopkins University may be entitled to a portion of revenue received on licensing Rosetta software including methods discussed/developed in this study. As a member of the Scientific Advisory Board, J.J.G. has a financial interest in Cyrus Biotechnology. Cyrus Biotechnology distributes the Rosetta software, which may include methods developed in this study. These arrangements have been reviewed and approved by the Johns Hopkins University in accordance with its conflict-of-interest policies.”

Please confirm that this does not alter your adherence to all PLOS ONE policies on sharing data and materials, by including the following statement: "This does not alter our adherence to PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests).  If there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Please include your updated Competing Interests statement in your cover letter; we will change the online submission form on your behalf.

5. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

6. Thank you for stating the following in the Acknowledgments Section of your manuscript:

“This work was supported by National Science Foundation Research Experience for Undergraduates grant DBI-1659649 (D.A.), AstraZeneca (J.A.R.), National Institutes of Health grants T32-GM008403 (J.A.R.) and R01-GM078221(S.P.M., J.J.G.). Computational resources were provided by the Maryland Advanced Research Computing Cluster (MARCC).”

We note that you have provided additional information within the Acknowledgements Section that is not currently declared in your Funding Statement. Please note that funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“This work was supported by National Science Foundation (https://nsf.gov/) Research Experience for Undergraduates grant DBI-1659649 (D.A.), AstraZeneca (https://www.astrazeneca.com/) (J.A.R.), National Institutes of Health (https://www.nih.gov/) grants T32-GM008403 (J.A.R.), R01-GM078221(S.P.M., J.J.G.), and R01-GM127578(S.P.M., J.J.G.). Computational resources were provided by the Maryland Advanced Research Computing Cluster (MARCC) (https://www.marcc.jhu.edu/).”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This is a very decent piece of work!

The authors develop a ML pipeline that predicts not only antibody structure from its sequence, but also the side chain conformations. Of note, predicting antibody structure is still a very challenging task for which Alphafold2 has shown disappointing results (probably due to the lack of co-evolutionary information between antibodies and antigens). Therefore, there is critical need for such type of tools.

Particularly interesting points:

- The authors relax predicted atomic/residue distances into a realistic 3D structure using Rosetta, which is therefore more useful and makes comparison to experimental structures easier.

- This study extends their previous work (Ref 13) by additionally predicting the side chains of the antibody, which is indeed a lacking point of most prediction methods. As they write, Alphafold2 does predict side residue conformation (but with low accuracy), so a tool specifically benchmarked for antibodies is needed. Abodybuilder does predict it though.

- The authors use the attention layer to interpret the results (which anchor positions are most determinant of the side-chain conformations), a good example of knowledge and interpretability gained from a trained model, which is appreciated.

- We believe the authors provided a convincing set of controls to show the performance of this tool.

Major point:

- We recall that ABodyBuilder also predicts side-chains in two ways: "complete" prediction, where every side chain is predicted (using PEARS, we think), and "partial" prediction, where side chains of residues identical to the template are retained and the remaining side chains are predicted. Wouldn’t it make sense to compare the performance to ABodyBuilder (and not only to PEARS alone), or did we miss something?

Minor weak points whose resolution would improve the manuscript:

- In case there is a revision round, please make the code and data available to the reviewers. We couldn’t assess whether it is easy to use/reproducible.

- The language is quite technical, and some concepts could be better explained to non-specialists, such as the purpose of using “decoy discrimination” and the conditional prediction of side chains in Figure 1A.

- The discussion starts with the importance of the work in the context of docking, but this claim is not really substantiated, although it is likely true. Could the authors discuss more reasons to believe so? For instance, a devil’s advocate could argue that, due to side-chain flexibility, knowing the side-chain conformations might or might not help docking that much.

- We did not understand whether rotamer libraries were provided as input, and from which data. How do the rotamers in the final predicted structures differ from the rotamer library used? Does this advocate for antibody-specific rotamer libraries?

Reviewer #2: In their paper 'Improved antibody structure prediction by deep learning of side chain conformations', the authors present DeepSCAb, a novel deep learning method for predicting the structure of antibody variable fragments from sequence. The authors use a complicated model, highly tailored to the task, that combines 1D and 2D residual convolutional blocks with a multi-head attention module to predict dihedral angles for amino acid side chains.

In my opinion, the paper clearly explains the development of the method, performs several analyses of its output, and compares its performance to other methods. This work is of great importance for the development of new, powerful biotherapeutics. I especially like how the authors predict side-chain dihedral values conditionally.
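For readers less familiar with this kind of architecture, the following is a minimal, illustrative PyTorch sketch of what "conditional" side-chain dihedral prediction can look like: sequence features pass through a small convolutional trunk and a self-attention layer, and the binned logits for each chi angle are predicted given the logits of the previous chi. All class names, dimensions, and bin counts here are hypothetical; this is not the authors' DeepSCAb implementation.

    import torch
    import torch.nn as nn

    class ConditionalChiHead(nn.Module):
        """Toy model: one-hot sequence -> per-residue binned chi-angle logits, chi_i conditioned on chi_{i-1}."""
        def __init__(self, feat_dim=64, n_bins=36, n_chi=4, n_heads=4):
            super().__init__()
            # simplified 1D convolutional trunk standing in for residual conv blocks
            self.encoder = nn.Sequential(
                nn.Conv1d(21, feat_dim, kernel_size=5, padding=2), nn.ReLU(),
                nn.Conv1d(feat_dim, feat_dim, kernel_size=5, padding=2), nn.ReLU(),
            )
            self.attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)
            # chi_1 sees only the sequence features; chi_2..chi_4 also see the previous chi's logits
            self.chi_heads = nn.ModuleList(
                [nn.Linear(feat_dim if i == 0 else feat_dim + n_bins, n_bins) for i in range(n_chi)]
            )

        def forward(self, seq_onehot):                      # (batch, length, 21)
            x = self.encoder(seq_onehot.transpose(1, 2)).transpose(1, 2)
            x, _ = self.attn(x, x, x)                       # self-attention over residues
            logits, prev = [], None
            for i, head in enumerate(self.chi_heads):
                prev = head(x if i == 0 else torch.cat([x, prev], dim=-1))
                logits.append(prev)                         # (batch, length, n_bins) for chi_i
            return logits

In this sketch, calling ConditionalChiHead()(torch.zeros(1, 120, 21)) would return four tensors of shape (1, 120, 36), one set of binned dihedral logits per chi angle.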

However, I have the following concerns regarding this publication:

Major Comments:

1. My major concerns relate to the evaluation of the model's performance and the comparisons to what the authors call the 'control network' and to DeepH3 (the authors' previous work).

First, I don't see how the idea behind the control network makes sense. The authors train it to predict side-chain conformations from sequence without any information about the backbone, which in my opinion is meaningless: it violates the hierarchy of protein structure and clearly calls for much more advanced modeling techniques such as AlphaFold, which can first internally infer the backbone conformation and then predict side chains, making the task essentially equivalent to full protein structure prediction. Therefore, I don't see how this network is useful as a baseline.

Second, the performance improvement compared to DeepH3 is very modest, if present at all (the authors give dRMSDs of 0 and -0.1 angstroms). I would like to see uncertainties for all of the RMSD and dRMSD values presented in Table 1 and in the corresponding parts of the text. This would show whether the performance improvements are statistically significant.

In the introduction, the authors cite a number of works that present other methods for antibody structure prediction (e.g. ABLooper) and for general protein structure prediction (e.g. AlphaFold2 and RoseTTAFold). I think the authors should run those methods and compare their performance to DeepSCAb's.

2. The authors used PEARS, SCWRL4, and Rosetta for side-chain prediction. I think there is not enough detail given about the settings used for Rosetta, which allows, in addition to the choice of force field, tuning of the number of rotamers used as the initial seed and of other parameters that significantly change performance. For example, there are settings that allow Rosetta to use more rotamers during packing. Did the authors explore that? Also, there are newer side-chain packing methods that claim to be better than the ones used by the authors.
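For context on the reviewer's point about packing settings: Rosetta's extra-rotamer flags (-ex1, -ex2) expand the rotamer set with extra sub-rotamers around chi1 and chi2, which typically improves repacking at extra compute cost. Below is a minimal PyRosetta-style sketch of repacking with these flags; it is illustrative only, the file names are placeholders, the exact options used by the authors are not documented here, and the PackRotamersMover module path differs in older PyRosetta versions (protocols.simple_moves).

    import pyrosetta
    from pyrosetta.rosetta.core.pack.task import TaskFactory
    from pyrosetta.rosetta.protocols.minimization_packing import PackRotamersMover

    # -ex1/-ex2 add extra sub-rotamers around chi1 and chi2; -use_input_sc includes the input side chains
    pyrosetta.init("-ex1 -ex2 -use_input_sc")

    pose = pyrosetta.pose_from_pdb("antibody_model.pdb")   # placeholder input file
    scorefxn = pyrosetta.get_fa_scorefxn()

    # repack every residue without changing amino-acid identities
    task = TaskFactory.create_packer_task(pose)
    task.restrict_to_repacking()

    PackRotamersMover(scorefxn, task).apply(pose)
    pose.dump_pdb("antibody_repacked.pdb")                 # placeholder output file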

Minor comments:

1. It would also be interesting to discuss cases where DeepSCAb is worse than other methods. Why do you think this happens?

2. In the method description it would be helpful to state explicitly what L is: is it the length of the full protein or just the length of the loop? I presume you model the full protein, but do you think it could be interesting (and computationally less expensive) to model only part of the antibody, keeping most of it fixed?

3. Error bars in Figure 4 should not reach negative RMSD values; they should be cut off at 0. I am also not sure about the usefulness of that figure, given that the differences between all the methods are insignificant.

Reviewer #3: The manuscript presents a method that explicitly addresses antibody side-chain modeling. Previous approaches focused on backbone modeling only; this is the first antibody modeling approach that predicts side chains. This is done elegantly by adding a new module to the antibody network that predicts backbone distances and angles for further optimization by Rosetta. The new module predicts the side-chain rotamer angles conditioned on the predicted backbone distances and angles.

Comments:

1. A test-set cutoff of 99% sequence identity can lead to very similar H3 loops in the training and test sets.

2. Generation of 2,800 decoy models: how much time does this take for one antibody sequence? Can the same results be achieved with fewer decoys?

3. It is stated that "the addition of side chain orientations is beneficial for accurately predicting pairwise geometries," but there is no significant improvement over DeepH3 in backbone RMSD (Top-1 and Top-5).

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Decision Letter 1

Alexey Porollo

25 May 2022

Simultaneous prediction of antibody backbone and side-chain conformations with deep learning

PONE-D-21-28666R1

Dear Dr. Gray,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Alexey Porollo, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

Reviewer #3: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have made the text clearer and answered all my points, including the points of the other reviewers.

Reviewer #2: I believe that the authors have addressed all major concerns raised in the previous round of the review.

Reviewer #3: The authors revised and improved the manuscript based on the editor's and reviewers' comments.

Minor comments

Using a 99% sequence identity cut-off: I don’t think the entropy analysis is the best approach in this case. The more straightforward solution is to show that comparable results can be achieved with other cut-offs, for example 90% and 95%.

Is it possible to add the runtimes for antibody modeling, including training of the network, inference, and RosettaAntibody modeling?

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Acceptance letter

Alexey Porollo

6 Jun 2022

PONE-D-21-28666R1

Simultaneous prediction of antibody backbone and side-chain conformations with deep learning

Dear Dr. Gray:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Alexey Porollo

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. Supporting information for manuscript.

    File containing supporting tables and figures for the DeepSCAb method for prediction of antibody side-chain conformations.

    (PDF)

    Attachment

    Submitted filename: Response to PLOS reviewers.pdf

    Data Availability Statement

    The structures used to train the models presented in this work were collected from SAbDab [20] (http://opig.stats.ox.ac.uk/webapps/newsabdab/sabdab), which curates antibody structures from the Protein Data Bank [23] (https://www.rcsb.org). The source code to train and run DeepSCAb, as well as pretrained models, are available at https://github.com/Graylab/DeepSCAb. The structures predicted by DeepSCAb and alternative methods for benchmarking have been deposited at Zenodo: 10.5281/zenodo.6371490.

