Skip to main content
Molecular Therapy. Methods & Clinical Development logoLink to Molecular Therapy. Methods & Clinical Development
. 2019 Oct 1;15:264–274. doi: 10.1016/j.omtm.2019.09.008

Machine Learning Enables Accurate Prediction of Asparagine Deamidation Probability and Rate

Jared A Delmar 1,, Jihong Wang 1, Seo Woo Choi 2, Jason A Martins 2, John P Mikhail 2
PMCID: PMC6923510  PMID: 31890727

Abstract

The spontaneous conversion of asparagine residues to aspartic acid or iso-aspartic acid, via deamidation, is a major pathway of protein degradation and is often seriously disruptive to biological systems. Deamidation has been shown to negatively affect both in vitro stability and in vivo biological function of diverse classes of proteins. During protein therapeutics development, deamidation liabilities that are overlooked necessitate expensive and time-consuming remediation strategies, sometimes leading to termination of the project. In this paper, we apply machine learning to a large (n = 776) liquid chromatography-tandem mass spectrometry (LC-MS/MS) dataset of monoclonal antibody peptides to create computational models for the post-translational modification asparagine deamidation, using the random decision forest method. We show that our categorical model predicts antibody deamidation with nearly 5% increased accuracy and 0.2 MCC over the best currently available models. Surprisingly, our model also paces or outperforms advanced and conventional models on an independent non-antibody dataset. In addition to deamidation probability, we are able to accurately predict deamidation rate (R2 = 0.963 and Q2 = 0.822), a capability with no peer in current models. This method should enable significant improvement in protein candidate selection, especially in biopharmaceutical development, and can be applied with similar accuracy to enzymes, monoclonal antibodies, next-generation formats, vaccine component antigens, and gene therapy vectors such as adeno-associated virus.

Keywords: deamidation, machine learning, prediction, developability, stability, drug development, therapeutic protein, antibody, mab, IgG

Introduction

Therapeutic proteins are an important and growing class of drugs that includes peptides, such as insulin; cytokines, like erythropoietin; monoclonal antibodies (mAbs), which are among the most successful cancer therapies; next-generation formats, such as antibody-drug conjugates, bispecific antibodies, and fusion proteins; as well as vaccine components and gene therapy vectors. While small molecules comprise the largest class of new drug approvals, nearly 30% of US Food and Drug Administration (FDA) approvals in 2018 were protein based, up from 26% in 2017. As of the writing of this paper, half of new drugs approved by the FDA in 2019 represent biologics.1

Therapeutic proteins offer new mechanisms of action, higher target specificity, lower toxicity, and longer-acting pharmacokinetics, compared to small molecule drugs.2, 3, 4 However, the development of therapeutic proteins poses additional challenges. Not only must the drug be effective, but it must also be “developable,” a concept that encompasses many characteristics including high yield and homogeneity from cell culture, high purity drug substance after purification processing, low viscosity and high stability at the high concentrations necessary for drug product, high stability at in vitro long-term storage conditions and in vivo after administration, high target specificity, and, for antibodies, unimpaired neonatal Fc receptor (FcRn) binding.2,5 Nearly all of the factors that make a protein drug developable are derived from the amino acid sequence, including site-specific post-translational modifications (PTMs).6

In particular, the spontaneous non-enzymatic conversion of asparagine to aspartic acid or iso-aspartic acid via deamidation is a major pathway of protein degradation and is often seriously disruptive to biological systems.7, 8, 9 Deamidation has been shown to negatively affect both in vitro stability and in vivo biological function of diverse classes of proteins. Deamidation has been reported as a critical quality attribute in many monoclonal antibodies due to its impact on biological activity.10, 11, 12, 13 In one humanized monoclonal immunoglobulin G1 (IgG1) antibody drug, an asparagine in the heavy-chain complementarity determining region 2 (CDR2) loop was found to deamidate in vivo, which greatly decreased the drug’s efficacy.14 In another case, heavy-chain CDR deamidation resulted in an almost complete loss of potency and binding activity of a therapeutic monoclonal antibody.15 In adeno-associated virus, an emerging new vector for gene therapy, extensive capsid deamidation has been observed that impacts transduction and correlates to a loss of vector activity.16 Deamidation of asparagine residues can also significantly affect immunogenicity and efficacy of protein-based vaccines. Specifically, progress to develop next-generation anthrax vaccines has been halted by vaccine instability resulting from asparagine deamidation in anthrax protective antigen.17, 18, 19, 20, 21 Even in the nontherapeutic enzyme glucoamylase, used commercially to produce sweeteners and ethanol, asparagine deamidation causes a decrease in enzyme activity and change in thermodynamic stability.22, 23, 24

Prediction of deamidation liability as early as possible in protein drug development is important because many more candidate drugs are proposed than can be tested. For example, typical antibody generation results in hundreds of candidates, which far exceeds the capacity of a drug development organization.2,25 Development of a therapeutic protein is so costly in both money and time that, after an initial assessment for screening, only a single candidate is moved forward in most cases.16,26,27 Sequence liabilities that are not dealt with as early as possible necessitate more expensive and time-consuming remediation strategies later in development26 and could lead to termination of the project.

Computational tools already exist to facilitate drug candidate screening by the identification of sequence liabilities.6,28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45 In the case of asparagine deamidation,36,40, 41, 42, 43, 44, 45 currently available tools suffer from several limitations: they provide only a binary (yes, high risk to deamidate; or no, low risk to deamidate) prediction,36,40,42,43 require an experimental crystal structure,42, 43, 44, 45 or are applicable only to antibody asparagines.36,40 All offer no36,40,42 or low accuracy41,43, 44, 45 predictions of deamidation rate. Oversimplified models tend to overestimate the number of deamidation sites greatly, which leads to over-engineering and rejecting good drug candidates too early in development. On the other hand, these models may also overlook asparagines for which deamidation is rarely observed (such as NK or NW sites), which can lead to costly and ineffective drug development.

In this paper, we apply machine learning to a large (n = 776) liquid chromatography-tandem mass spectrometry (LC-MS/MS) dataset of monoclonal antibody peptides to create accurate random decision forest models for the PTM asparagine deamidation.46 We show that our categorical model predicts antibody deamidation likelihood with nearly 5% increased accuracy and 0.2 MCC over the best currently available models. Surprisingly, our model also paces or outperforms advanced and conventional models on an independent non-antibody dataset, including enzyme, antigen, and viral capsid deamidation sites. In addition to deamidation probability, we are able to accurately predict deamidation rate (R2 = 0.963 and Q2 = 0.822), a capability with no peer in current models. We provide evidence that our method can be applied with equal accuracy to predict the likelihood and rate of site-specific asparagine deamidation in any protein of interest.

Results and Discussion

Feature Selection

Spontaneous deamidation of asparagine to aspartic acid or iso-aspartic acid proceeds by one of two reaction mechanisms (Figure 1). At neutral to basic pH, the most common route is by a nucleophilic attack of the asparagine side chain by the backbone nitrogen of the following (N+1) residue, forming the cyclic succinimide intermediate. Hydrolysis at one of two carbonyls of the succinimide intermediate results in either aspartic acid or iso-aspartic acid. Below pH 5, direct hydrolysis of the asparagine side chain amide to aspartic acid is the dominant reaction.8,9

Figure 1.

Figure 1

Asparagine Deamidation Reaction

(A and B) Spontaneous degradation of asparagine can occur by (A) direct hydrolysis of the side chain to aspartic acid or (B) via a succinimide intermediate, produced by a nucleophilic attack of the side chain carbonyl by the following (N+1) residue backbone nitrogen, producing either iso-aspartic acid or aspartic acid. Residues are rendered as sticks with Asn, Asp, and iso-Asp, and succinimide carbons colored gray, green, and cyan, respectively (O, red; N, blue).

Both mechanisms have been shown to rely on both the primary and three-dimensional (3D) structure, with the residue immediately following the asparagine residue (N+1) having the largest effect.8,9,47, 48, 49, 50, 51 The amino side residue preceding the asparagine (N−1) was shown to have little to no effect on deamidation rate.50,51 Steric hindrance, conformational space, and electrostatic effects introduced by the N+1 residue may all affect the ability of the side chain and/or backbone to align and form the cyclic intermediate.9 As both reaction mechanisms require hydrolysis to form the final aspartic acid or iso-aspartic acid product, availability of water molecules, or a proton donor, and solvent exposure may directly influence the rate of deamidation.9 Finally, hydrogen bonding to the side chain or backbone may stabilize asparagine and prevent degradation to aspartic acid.

Taken together, these observations compiled from literature informed 12 total parameters for asparagine deamidation likelihood (Table 1), which our machine-learning models would use to predict deamidation. The N+1 residue was taken into account as both a categorical variable and as the experimental half-life of a synthetic pentapeptide (pentapeptide half-life, pphl) containing the same N−1 and N+1 sequence, measured by Robinson et al.51 Half-lives were not reported by Robinson et al. for pentapeptides with asparagine in the N+1 position, likely because it is difficult to distinguish between deamidation in the N and N+1 position in this case. Thus, when N+1 = N, we used an average pphl of 5.7 days.

Table 1.

Predictors for Asparagine Deamidation Machine Learning Model

Structural / Chemical Category Parameter
Primary sequence pentapeptide deamidation half-life (pphl, days)
categorical N+1 residue
Backbone orientation backbone dihedral Phi (φ,°)
backbone dihedral Psi (Ψ,°)
nucleophilic C-N attack distance (Å)
Side-chain orientation side-chain dihedral chi1 (χ1,°)
side-chain torsion chi2 (χ2,°)
Solvent accessibility percent solvent accessibility (PSA, %)
solvent accessible surface area (SASA, Å2)
Hydrogen bonding hydrogen bonds to side chain (#)
Asn local secondary structure (Sheet)
Asn local secondary structure (Loop)
Machine-learning parameter (for regression model only) categorical model probability output (%)

12 total parameters were used to inform the categorical machine-learning model to predict deamidation likelihood, comprising 6 general categories. For the regression model to prediction deamidation rate, the output of the categorical model was included as an additional predictor.

The ability of the side chain and backbone to align to form the cyclic succinimide intermediate was taken into account by the backbone dihedral angles (phi and psi), asparagine side-chain dihedrals (chi1 and chi2), and distance between the side chain carbonyl and backbone nitrogen (nucleophilic attack distance).42 Solvent accessibility was expressed as a percent of the total residue area (percent solvent accessibility [PSA]), as well as area in Å2 (solvent accessible surface area [SASA]). Hydrogen bonding to the asparagine side chain was predicted if a potential donor-acceptor pair was found within 3 Å and counted as the total number of predicted bonds (up to 4). Hydrogen bonding to the backbone was accounted for by secondary structure (either sheet or loop). We do not need a third variable for helical secondary structure, because if the local secondary structure is neither a sheet nor loop, an α helix can be assumed.

Because the predictive tool we developed is most valuable during early development or candidate selection of a deamidation-liable protein, when little to no experimental data is available, we relied only on the primary sequence of the proteins to train our models. Two of the deamidation predictors we chose (N+1 residue and pphl) could be gleaned from the primary sequence directly. For the 9 parameters that could not, the structure of each protein was generated by homology modeling and predictors were extracted from the predicted 3D structure. To predict deamidate rate, we used the same 12 predictors for the regression model, with an additional parameter for the output of the first classification model, representing the predicted likelihood of deamidation.

Training and Validation Dataset Construction

It is expected that at least 80% of the effort involved in training a machine-learning model is dedicated to data processing and cleaning.52 Machine learning requires a large amount of data with a high degree of standardization. Thus, we performed a large side-by-side forced degradation study for 34 in-house IgG molecules, comprising a total of 608 asparagine sites that include 304 unique sites. We accelerated deamidation by incubating each molecule at 40°C and pH 8.0 for up to 4 weeks and measured the deamidation abundance at 0-, 2-, and 4-week time points for each site by LC-MS/MS tryptic peptide mapping. The deamidation half-life (t1/2) of each asparagine was calculated by least-squares fit of the data to an exponential decay function (Figure S1).

This experimental t1/2. was used to train the regression model for half-life prediction. For training a classification machine-learning model to predict the probability of observing deamidation, we set the threshold at 1.0% deamidation per month or t1/2 ≈276 weeks. For example, if we observed only a 0.9% increase in deamidation after 4 weeks under stress conditions, we would train the classification model that this site did not deamidate (class “no”). If we measured a 1.1% increase in deamidation after 4 weeks at stress conditions for an asparagine site, this site was reported as class “yes,” or deamidated.

Each asparagine located in the variable region of the antibodies in our forced degradation study was included in the training set. Because our training set included six antibody heavy-chain formats53, 54, 55, 56 and two light-chain formats (Figure S2), we also included one copy of each unique constant region asparagine in the training data.

Initial models trained on our in-house dataset made accurate predictions overall (data not shown). However, they performed relatively poorly on asparagines located in IgG complementarity determining regions (CDRs)—probably because, out of the 304 unique asparagines in our initial dataset, only 25 unique CDR asparagines were found to deamidate. To give our models more examples of CDR deamidation to learn from, we expanded the training dataset to include 33 additional clinical-stage IgG1s (and 39 additional CDR deamidation sites) with deamidation data published by Lu et al.57 (Table S3). The stress condition used by Lu et al.57 (40°C and pH 8.5) was similar to our own, and their data was incorporated into the training sets for both our categorical and regression models. Only the final models that including training data from Lu et al. are evaluated here.

To validate our models, we constructed an independent validation dataset with data from 12 additional in-house IgG molecules that were not contained in the training dataset, stressed at identical conditions to the training set molecules. The observed deamidation frequency in the validation set was consistent with that of the training set (Figure S3). Predictors and LC-MS/MS data for these molecules were added to an independent validation set. This independent validation dataset was supplemented with non-antibody data from both the validation set published by Jia et al.,42 containing Aspergillus awamori glucoamylase,22, 23, 24 anthrax antigen,17, 18, 19,21 and human angionenin RNase,45 and recent capsid viral protein 3 (VP3) deamidation data published by Giles et al.16 from adeno-associated virus 8 (AAV8), an emerging vector for gene therapy (Table S3).

Machine-Learning Models for Predicting Deamidation Likelihood and Rate

Both the classification model and regression model were random forest models built in RStudio using the randomForest46 and caret58 libraries. The number of trees and number of parameters tried at each split were optimized by hand to minimize the out-of-bag error estimate. Because the output of the classification model is a probability that an asparagine belongs to class “yes,” or will deamidate, the probability threshold at which we interpret the prediction as “yes” or “no” was also optimized after model building to maximize the accuracy.

Statistics for the fit to the training set were calculated for both the classification and regression models. Notably, the classification model was able to achieve 100% accuracy on the training set, using 12 parameters to determine whether each of 776 asparagines would deamidate with no mistakes made. The regression model was able to predict t1/2 for the 137 deamidated asparagines, 88 of which are unique, in the training set with an R2 of 0.963. The regression model used the same 12 predictors as the classification model, as well as the prediction output from the classification model, for a total of 13 parameters (Table 1).

The top two predictors of deamidation liability, measured by the mean decrease in out-of-bag accuracy when that parameter is excluded from the categorical model, were the N+1 categorical variable and the pphl (Figure 2A). This is consistent with the literature and it is well accepted that the N+1 residue has the greatest effect on the deamidation liability of all studied parameters.8,9,47, 48, 49, 50, 51 Even a conventional one-parameter method using only the N+1 residue is competitive with advanced techniques (Tables 11 and 16). The next three most important parameters were related to the backbone alignment (psi and phi dihedral angles and nucleophilic attack distance), followed by solvent accessibility (SASA and PSA), side-chain alignment (chi1 and chi2 dihedral angles), and hydrogen bonding (side-chain hydrogen bonds and secondary structure). Similarly, Jia et al.42 found that tracking hydrogen bonding, secondary structure in particular, did not improve their asparagine deamidation prediction.

Figure 2.

Figure 2

Categorical and Regression Models Predictor Ranking

(A) Importance of each parameter in the categorical model for predicting deamidation probability was measured by the mean decrease in out-of-bag accuracy when that parameter was excluded from the model. (B) Importance of each parameter in the regression model for predicting deamidation half-life was measured by the mean increase in the out-of-bag percent mean squared error (MSE) when that parameter was excluded from the model.

Table 11.

Statistical Comparison of Predictions Made by Our Categorical Model and Other Models on the Independent Non-mAb Validation Subset

Statistic Categorical Model NG/NN/NS Lorenzo et al.41 Robinson et al.43 Jia et al.42
Accuracy 93.8% 88.8% 93.8% 77.5% 95.0%
MCC 0.686 0.619 0.686 0.459 0.687
Precision 60.0% 43.8% 60.0% 28.0% 71.4%
Recall 85.7% 100.0% 85.7% 100.0% 71.4%
Specificity 94.5% 87.7% 94.5% 75.3% 97.3%
Negative predictive value 98.6% 100.0% 98.6% 100.0% 97.3%
Miss rate 14.3% 0.0% 14.3% 0.0% 28.6%
Fallout 40.0% 56.3% 40.0% 72.0% 28.6%
False discovery rate 5.5% 12.3% 5.5% 24.7% 2.7%
False omission rate 1.4% 0.0% 1.4% 0.0% 2.7%

Statistics were calculated for predictions made by all models on the non-mAb validation subset, which was nearly identical to the validation set used by Jia et al.42

Table 16.

Statistical Comparison of Predictions Made by Our Categorical Model and Other Models on the Independent mAb-Only Validation Subset

Statistic Categorical Model NG/NN/NS Yan et al.40 Lorenzo et al.41
Accuracy 95.6% 86.8% 83.8% 91.2%
MCC 0.796 0.651 0.388 0.616
Precision 100.0% 50.0% 41.7% 66.7%
Recall 66.7% 100.0% 55.6% 66.7%
Specificity 100.0% 84.7% 88.1% 94.9%
Negative predictive value 95.2% 100.0% 92.9% 94.9%
Miss rate 33.3% 0.0% 44.4% 33.3%
Fallout 0.0% 50.0% 58.3% 33.3%
False discovery rate 0.0% 15.3% 11.9% 5.1%
False omission rate 4.8% 0.0% 7.1% 5.1%

To predict t1/2, the side-chain orientation was among the most important variables, measured by the increase in the percentage of mean squared error when that parameter is excluded from the regression model (chi2 and chi1 were 1st and 5th most important, respectively). Chi2 is the closest angle to the carbonyl involved in succinimide formation and Jia et al.42 also observed that chi2 was more important than chi1 in their model.42 Similar to the categorical model, the N+1 variable was among the best predictors, followed by the backbone dihedral angle phi (Figure 2B). Interestingly, pphl, the second-most important predictor for the categorical model, was only the 9th most important predictor for the regression model. It is possible that deamidation rate is strongly influenced by structural effects absent from the Robinson et al.47 pentapeptides. Indeed, we have observed in both our training and validation sets that t1/2 defies the hierarchy set forth by pphl. As in the categorical model, parameters that captured hydrogen bonding to the side chain and secondary structure were the least useful in predicting deamidation rate.

To test whether the models were capable of accurately predicting deamidation likelihood and rate, both classification and regression models were applied to the validation set comprised of 12 unseen antibodies held from the training set and 4 non-mAb proteins (Tables 2, 3, 4, and 5; Figure 3). Surprisingly, the categorical model performed with high accuracy for both mAb and non-mAb proteins, predicting the validation set with more than 93% accuracy and a Matthews correlation coefficient (MCC) of 0.691 even though it had never seen a non-mAb deamidation site before (Table 5). Because deamidation data for the non-mAbs came from literature and were either not quantitative or not collected under the same conditions as our in-house data, we did not validate the regression model on these molecules. On the validation set molecules with in-house LC-MS/MS data, the regression model was successful, predicting t1/2 with Q2 = 0.822 (Figure 3B).

Table 2.

Confusion Matrix for Predictions Made by the Categorical Machine Learning Model for Predicting Deamidation Liability on the Training Dataset

Prediction →
Positive Negative
Experiment ↓
Positive 137 0
Negative 0 639

Table 3.

Statistics for Predictions Made by the Categorical Machine Learning Model for Predicting Deamidation Liability on the Training Dataset

Statistic Categorical Model
Accuracy 100.0%
MCC 1.000
Precision 100.0%
Recall 100.0%
Specificity 0.0%
Negative predictive value 0.0%
Miss rate 0.0%
Fallout 0.0%
False discovery rate 100.0%
False omission rate 100.0%

Table 4.

Confusion Matrix for Predictions Made by the Categorical Machine Learning Model for Predicting Deamidation Liability on the Independent Validation Dataset

Prediction →
Positive Negative
Experiment ↓
Positive 17 9
Negative 4 165

Table 5.

Statistics for Predictions Made by the Categorical Machine Learning Model for Predicting Deamidation Liability on the Independent Validation Set

Statistic Categorical Model
Accuracy 93.3%
MCC 0.691
Precision 81.0%
Recall 65.4%
Specificity 97.6%
Negative predictive value 94.8%
Miss rate 34.6%
Fallout 19.0%
False discovery rate 2.4%
False omission rate 5.2%

Notably, on the independent validation set containing non-antibody proteins our model was able to achieve 93.3% accuracy and a Matthews correlation coefficient (MCC) of 0.691.

Figure 3.

Figure 3

Regression Machine Learning Model for Predicting Deamidation Rate

(A) Predicted half-life (t1/2, weeks) was plotted versus the experimental measured t1/2 for the training dataset. Individual asparagines are plotted as black circles for and the solid black line indicates where predicted t1/2 = experimental t1/2. Our regression model predicted the training set with R2 = 0.963. (B) Predicted half-life (t1/2, weeks) by our regression model (blue circles), the Robinson et al. model (green squares), and the Lorenzo et al. model (orange triangles) were plotted versus experimentally measured t1/2 (weeks) for the validation set. The solid black line indicates where predicted t1/2 = experimental t1/2. While our regression model predicted the independent validation set with Q2 = 0.822, both the predictions by Robinson et al. and Lorenzo et al. resulted in Q2 < 0.

Out of the 195 total asparagines in our validation set, the categorical model made 13 mistakes. It tended toward the conservative side, underpredicting deamidation in 9 cases and only overpredicting 4 asparagines that were not experimentally observed to deamidate. Interestingly, 5 of the 9 underpredicted sites (almost 40% of our total error) came from one molecule: the AAV8 capsid protein VP3.16 Further, the 5 sites that our model mispredicted were significantly less deamidated than the other 5 deamidation sites observed in VP3, all of which our model correctly predicted (Tables 17 and 20). It was shown by Giles et al.16 that the 5 low abundance deamidation sites did not respond significantly to incubation at 70°C or pH 10 or changes to the purification process. Thus, it is possible that deamidation would remain unchanged in our milder forced degradation conditions (40°C and pH 8), and we should instead consider these sites as non-liable. Nevertheless, our model outperformed both conventional (Table 18) and advanced (Table 19) predictions of deamidation for AAV8 capsid protein VP3 asparagines.

Table 17.

Confusion Matrix for Predictions Made by our Categorical Model on the AAV8 Capsid Protein VP3

Prediction →
Positive Negative
Experiment ↓
Positive 5 5
Negative 0 37

Table 20.

Comparison of Predictions Made by our Categorical Model and the NG/NN/NS Model on Selected Residues of the AAV8 Capsid Protein

Residue N+1 Avg % Deamidation Giles et al.16 Categorical Model NG/NN/NS Lorenzo et al.41
N254 N 9% no yes yes
N255 H ND no no yes
N263 G 99% yes yes yes
N304 N ND no yes yes
N305 N 8% no yes yes
N337 N ND No yes yes
N384 N ND no yes yes
N385 G 88% yes yes yes
N410 N 3% no yes yes
N459 T 7% no no no
N498 N ND no yes yes
N499 N 17% yes yes yes
N500 S ND no yes yes
N514 G 84% yes yes yes
N517 S 4% no yes yes
N540 G 79% yes yes yes
N599 S ND no yes yes
N611 R ND no no no
N670 S ND no yes yes
N693 S ND no yes yes

Individual residue level predictions are shown for each model on a subset of residues from the AAV8 protein. The 5 mispredicted sites by our model were significantly less deamidated than the other 5 deamidation sites observed in the AAV8 capsid, measured by Giles et al.16

Table 18.

Confusion Matrix for Predictions Made by the Conventional NG/NN/NS Model on the AAV8 Capsid Protein VP3

Prediction →
Positive Negative
Experiment ↓
Positive 9 1
Negative 8 29

Table 19.

Confusion Matrix for Predictions Made by the Lorenzo et al. Model on the AAV8 Capsid Protein VP3

Prediction →
Positive Negative
Experiment ↓
Positive 9 1
Negative 9 28

Taken together, the high accuracy at which our models were able to predict deamidation in the diverse proteins in our validation set indicates that these models may be generally applied to predict deamidation in any protein of interest.

Comparison with Advanced and Conventional Models

To evaluate the relative performance of our models, we have applied to our validation set as many currently available predictions of deamidation from the literature as possible. These advanced tools include another machine learning model by Jia et al.,42 a tree-based approach by Yan et al.,40 and empirical calculations by Robinson et al.43 and Lorenzo et al.41. In addition, we compared all of these approaches to a conventional one-parameter method based on the primary sequence alone. For the conventional method (named NG/NN/NS here), if an asparagine is followed by glycine, asparagine, or serine (N+1 = G, N, or S), then it is considered as liable to deamidate. All deamidation sites in our validation set were NG, NN, or NS motifs, so the data was particularly amenable to and not biased against this conventional method. Statistical comparison of the conventional model and our categorical model are shown in Table S1. Conversely, our training set contained many non-NG, -NN, and -NS sites (Figure S3) and was poorly predicted by the conventional method (Table S2).

Because the methods by Robinson et al.43 and Jia et al.42 require crystal structures, these models could be applied to only a subset of our validation set, which is nearly identical to the validation set used by Jia et al. for their method and comprised of only non-mAbs.42 We have removed N395 of A. awamori glucoamylase (PDB: 3GLY) from this validation subset as Chen et al.22 showed that this asparagine is N-glycosylated. Of note, all sites with N+1 = N and N+1 = Q are also missing from the non-mAb validation subset and the model by Jia et al.42 does not provide predictions for them. Finally, two deamidation sites in anthrax protective antigen (N466 and N537) were corrected to match the observations of Verma et al.19 The individual confusion matrices are shown in Tables 6, 7, 8, 9, and 10, and a statistical comparison of our method performance on this validation subset to the conventional method and those of Lorenzo et al.,41 Robinson et al.,43 and Jia et al.42 is shown in Table 11. Jia et al.42 had the highest accuracy on this non-mAb validation set, with one less mistake than our model or that of Lorenzo et al.41 However, their model also performed last in several categories. Our model and that of Lorenzo et al.41 had the second-best overall accuracy of 93.8%, were not the worst performers in any category, and had nearly identical MCC to that of the Jia et al.42 model (Table 11).

Table 6.

Confusion Matrix for Predictions Made by our Categorical Model on the Non-mAb Independent Validation Subset

Prediction →
Positive Negative
Experiment ↓
Positive 6 1
Negative 4 69

Table 7.

Confusion Matrix for Predictions Made by the Conventional NG/NN/NS Model on the Non-mAb Independent Validation Subset

Prediction →
Positive Negative
Experiment ↓
Positive 7 0
Negative 9 64

Table 8.

Confusion Matrix for Predictions Made by the Lorenzo et al. Model on the Non-mAb Independent Validation Subset

Prediction →
Positive Negative
Experiment ↓
Positive 6 1
Negative 4 69

Table 9.

Confusion Matrix for Predictions Made by the Robinson et al. Model on the Non-mAb Independent Validation Subset

Prediction →
Positive Negative
Experiment ↓
Positive 7 0
Negative 18 55

Table 10.

Confusion Matrix for Predictions Made by the Jia et al. Model on the Non-mAb Independent Validation Subset

Prediction →
Positive Negative
Experiment ↓
Positive 5 2
Negative 2 71

The tree-based model proposed by Yan et al.40 is only applicable to IgG mAbs. Thus, in order to compare our methods, we created another subset of our validation set including only IgG mAbs. On this mAb-only independent validation subset, predictions made by our model (Table 12) were compared against those made by the models of Yan et al.40 (Table 13), Lorenzo et al.41 (Table 14), and the conventional one-parameter NG/NN/NS model (Table 15). Again, our categorical model was not the worst performer in any statistic and had the best MCC (0.796) and accuracy (95.6%) at predicting the mAb-only dataset (Table 16).

Table 12.

Confusion Matrix for Predictions Made by Our Categorical Model on the mAb-Only Independent Validation Set

Prediction →
Positive Negative
Experiment ↓
Positive 6 3
Negative 0 59

Table 13.

Confusion Matrix for Predictions Made by the Conventional NG/NN/NS Model on the mAb-Only Independent Validation Set

Prediction →
Positive Negative
Experiment ↓
Positive 9 0
Negative 9 50

Table 14.

Confusion Matrix for Predictions Made by the Yan et al. Model on the mAb-Only Independent Validation Set

Prediction →
Positive Negative
Experiment ↓
Positive 5 4
Negative 7 52

Table 15.

Confusion Matrix for Predictions Made by the Lorenzo et al. Model on the mAb-Only Independent Validation Set

Prediction →
Positive Negative
Experiment ↓
Positive 6 3
Negative 3 56

Unfortunately, we were not able to make a significant comparison to the tree-based method proposed by Sydow et al.36 Their method is restricted to only the CDR of antibodies.

As of the writing of this paper, we were only able to find two methods in the literature for the prediction of deamidation half-life: by Robinson et al.43 and Lorenzo et al.41 Deamidation half-life is both temperature and pH dependent and each model is specific to one condition. Specifically, the Robinson et al.43 model predicts t1/2 for proteins at 37°C and pH 7.4 and Lorenzo et al.41 at slightly basic pH and up to 40°C. In our experience, these conditions are similar enough to our own (40°C and pH 8.0) to make a direct comparison.

Out of the 26 unique deamidation sites in our validation set, only 7 had available t1/2 measurements, all of which were calculated from LC-MS/MS data collected in-house. Thus, we applied each model to these 7 sites for comparison of predictive accuracy. Both the Robinson et al.43 and Lorenzo et al.41 models predicted values of t1/2 that disagreed enough with the experimental values to produce a Q2 < 0. Our model achieved a Q2 of 0.822 (Figure 3B). Both the Robinson et al.43 and Lorenzo et al.41 methods rely heavily on the pphl as the basis for their half-life prediction, while our model ranked pphl as one of the least useful parameters for prediction t1/2 (Figure 2B), which might help to explain the discrepancy in results. While the Lorenzo et al. model tended to underpredict deamidation rate in this independent validation set, both the Lorenzo et al. and NG/NN/NS models overpredicted the number of liable sites in the AAV8 capsid protein VP3 (Tables 17, 18, 19, 20).

Conclusions

We have constructed both a categorical model for predicting whether or not an asparagine is liable for deamidation, and a regression model for determining the rate at which a predicted site deamidates. Both outperform or pace currently available models based on predictions made on independent validation sets.

Although both models were trained on only mAb deamidation data, we found that they applied with similar accuracy to non-mAb molecules in our validation set, including enzyme, antigen, and viral capsid deamidation sites. In contrast to other methods, ours do not require crystallographic 3D coordinates and are not protein class specific. Rather, the structural information used by our models to predict deamidation is drawn from homology models. Thus, they are applicable to any protein for which a similar protein’s structure is available in the PDB.

It is our hope that with more data and increasingly accurate and interpretable models, a fundamental understanding of protein degradation, including deamidation, will be attained, leading to more and better protein-based therapies.

Materials and Methods

3D Model Building and Parameter Extraction

For AstraZeneca in-house molecules, full-length homology models were built using Schrödinger BioLuminate.59 Briefly, the most similar crystal structure from the PDB, by sequence, was first identified by basic local alignment search tool (BLAST).60 This structure and an in-house constant region template were used as scaffolds for the full-length structure. The Protein Preparation Wizard tool was used to add hydrogens, assign bond orders, remove solvent molecules, optimize H-bond assignments, and perform restrained energy minimization. Molecules from the study by Lu et al.57 were modeled similarly; however, only the Fv structure was generated. Predictors of asparagine deamidation were extracted from the 3D homology models within Schrödinger via python script.

Generation of Deamidated IgGs

For IgG deamidation data generated in-house, samples at 10 mg/mL in 50 mM Tris pH 8.0 were incubated at 40°C for 2-week and 4-week time points. Reactants were stored at −80°C prior to analysis by LC-MS/MS.

LC-MS/MS Tryptic Peptide Mapping

20 μL samples at 5 μg/μL were denatured by adding 200 μL of 8 M guanidine, 130 mM Tris, 1 mM EDTA, pH 7.6 denaturing buffer. The samples were then reduced by the addition of 2 μL of 500 mM dithiothreitol. After incubation at 37°C for 30 min, samples were alkylated by the addition of 5 μL of 500 mM iodoacetamide and incubated at ambient temperature for 30 min in the dark. The reduced and alkylated samples were buffer exchanged into a solution containing 2 M urea and 100 mM Tris at pH 8.0 using an Amicon spin filter (EMD Millipore, Billerica, MA, USA; molecular weight cut-off of 10 kDa); 5 μg of trypsin was then added to the spin filter and incubated at 37°C for 4 h. The digested samples were collected from the spin filters, and the digestion was quenched with trifluoroacetic acid.

Peptides produced by enzymatic digestion were eluted on an Acquity Ultra Performance liquid chromatography system (Waters, Milford, MA, USA) equipped with an ethylene bridged hybrid C18 reversed-phase column (1.7 μm, 2.1 mm, 150 mm) using a gradient of 0%–60% acetonitrile at a flow rate of 0.2 mL/min (total elution time of 76 min). Peptides separated on the column were identified by a UV detector and analyzed using an Orbitrap Velos Pro mass spectrometer (Thermo Fisher Scientific). Peak identification was based on both the exact monoisotopic mass and the tandem mass spectrum of the target ion. Deamidation quantitation was based on peak areas from the extracted ion chromatography of corresponding ions.

In most cases in our collected deamidation data for the training and validation sets, sequencing information by MS/MS could distinguish between deamidation on neighboring asparagines in the same tryptic peptide. However, for two NN sites in the validation set, MS/MS data could not distinguish between the N and N+1 residues. Thus, in these cases, the t1/2 was a combined measurement for both sites in the peptide. Half-life predictions made by the regression model for these two sites were also combined prior to analysis.

Random Forest Machine Learning Model Construction

Both the classification model and regression model were random forest models built in RStudio using the randomForest46 and caret58 libraries. The number of trees and number of parameters tried at each split were optimized by manually tuned to minimize the out-of-bag error estimate.

For the classification model, 500 trees were generated with 3 variables tried at each split, producing an out-of-bag error estimate of 4.25% on the training set. The probability threshold at which we interpret the prediction as “yes” or “no” was also optimized to 53% after model building. Confusion matrices and variable importance plots were generated using caret and random Forest libraries, respectively.

The regression model was trained only on the subset of training data containing deamidation sites quantified by LC-MS/MS, including our in-house data and that of Lu et al.57 In this case, 500 trees were generated with 4 variables tried at each split. The out-of-bag predictions explained 63.5% of the variance of the training set. R2 and Q2 were calculated and variable importance plots were generated using caret and randomForest libraries, respectively.

Author Contributions

J.A.D. and J.W. conceptualized and designed the experiments and curated and interpreted the data; J.A.D., S.W.C., J.A.M., and J.P.M. performed the experiments and analyzed the data; J.A.D. wrote the manuscript.

Conflicts of Interest

This work was supported by the global biologics R&D arm of AstraZeneca. Authors J.A.D. and J.W. are employees of AstraZeneca and have stock and/or stock interests or options in AstraZeneca.

Acknowledgments

The authors would like to thank Sydney Clark, Arun Parupudi, Chen Qian, and Weichen Xu for their help with sample preparation; Ben Niu, Chen Qian, and Anthony Shannon for help with data collection; David Arancibia, Trina Do, Lauren Johnson, Nicholas Knoepfle, and Mohamed Ndiaye for help compiling samples for the training set; Hege Beard, Christopher Negron, and Kevin Halligan for their help with homology model building and parameter extraction; Xiaoyu Chen, Bojana Popovic, and Bruck Taddese for helpful discussion and manuscript review; Fiona Cusdin and Christopher Lloyd for help with sample generation for the validation set; and Gail Wasserman, Xiangyang Wang, Xiaoyu Chen, Milton Axley, Thomas Blacklock, Ray Field, T. Alan Hatton, and the 2018 MIT Practice School Cohort for their continual support of the project.

Footnotes

Supplemental Information can be found online at https://doi.org/10.1016/j.omtm.2019.09.008.

Supplemental Information

Document S1. Figures S1–S3 and Tables S1–S3
mmc1.pdf (3.9MB, pdf)
Data S1. Training and Validation Datasets
mmc2.xls (256.5KB, xls)
Document S3. Article plus Supplemental Information
mmc3.pdf (4.6MB, pdf)

References

  • 1.US Food and Drug Administration . 2019. Drugs@fda: FDA approved drug products.https://www.accessdata.fda.gov/scripts/cder/daf [Google Scholar]
  • 2.Jarasch A., Koll H., Regula J.T., Bader M., Papadimitriou A., Kettenberger H. Developability assessment during the selection of novel therapeutic antibodies. J. Pharm. Sci. 2015;104:1885–1898. doi: 10.1002/jps.24430. [DOI] [PubMed] [Google Scholar]
  • 3.Elvin J.G., Couston R.G., van der Walle C.F. Therapeutic antibodies: market considerations, disease targets and bioprocessing. Int. J. Pharm. 2013;440:83–98. doi: 10.1016/j.ijpharm.2011.12.039. [DOI] [PubMed] [Google Scholar]
  • 4.Carter P.J. Potent antibody therapeutics by design. Nat. Rev. Immunol. 2006;6:343–357. doi: 10.1038/nri1837. [DOI] [PubMed] [Google Scholar]
  • 5.Kohli N., Jain N., Geddie M.L., Razlog M., Xu L., Lugovskoy A.A. A novel screening method to assess developability of antibody-like molecules. MAbs. 2015;7:752–758. doi: 10.1080/19420862.2015.1048410. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Raybould M.I.J., Marks C., Krawczyk K., Taddese B., Nowak J., Lewis A.P., Bujotzek A., Shi J., Deane C.M. Five computational developability guidelines for therapeutic antibody profiling. Proc. Natl. Acad. Sci. USA. 2019;116:4025–4030. doi: 10.1073/pnas.1810576116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Robinson N.E., Robinson A.B. Molecular clocks. Proc. Natl. Acad. Sci. USA. 2001;98:944–949. doi: 10.1073/pnas.98.3.944. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Pace A.L., Wong R.L., Zhang Y.T., Kao Y.H., Wang Y.J. Asparagine deamidation dependence on buffer type, pH, and temperature. J. Pharm. Sci. 2013;102:1712–1723. doi: 10.1002/jps.23529. [DOI] [PubMed] [Google Scholar]
  • 9.Catak S., Monard G., Aviyente V., Ruiz-López M.F. Deamidation of asparagine residues: direct hydrolysis versus succinimide-mediated deamidation mechanisms. J. Phys. Chem. A. 2009;113:1111–1120. doi: 10.1021/jp808597v. [DOI] [PubMed] [Google Scholar]
  • 10.Haberger M., Bomans K., Diepold K., Hook M., Gassner J., Schlothauer T., Zwick A., Spick C., Kepert J.F., Hienz B. Assessment of chemical modifications of sites in the CDRs of recombinant antibodies: Susceptibility vs. functionality of critical quality attributes. MAbs. 2014;6:327–339. doi: 10.4161/mabs.27876. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Harris R.J., Kabakoff B., Macchi F.D., Shen F.J., Kwong M., Andya J.D., Shire S.J., Bjork N., Totpal K., Chen A.B. Identification of multiple sources of charge heterogeneity in a recombinant antibody. J. Chromatogr. B Biomed. Sci. Appl. 2001;752:233–245. doi: 10.1016/s0378-4347(00)00548-x. [DOI] [PubMed] [Google Scholar]
  • 12.Diepold K., Bomans K., Wiedmann M., Zimmermann B., Petzold A., Schlothauer T., Mueller R., Moritz B., Stracke J.O., Mølhøj M. Simultaneous assessment of Asp isomerization and Asn deamidation in recombinant antibodies by LC-MS following incubation at elevated temperatures. PLoS ONE. 2012;7:e30295. doi: 10.1371/journal.pone.0030295. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Vlasak J., Bussat M.C., Wang S., Wagner-Rousset E., Schaefer M., Klinguer-Hamour C., Kirchmeier M., Corvaïa N., Ionescu R., Beck A. Identification and characterization of asparagine deamidation in the light chain CDR1 of a humanized IgG1 antibody. Anal. Biochem. 2009;392:145–154. doi: 10.1016/j.ab.2009.05.043. [DOI] [PubMed] [Google Scholar]
  • 14.Huang L., Lu J., Wroblewski V.J., Beals J.M., Riggin R.M. In vivo deamidation characterization of monoclonal antibody by LC/MS/MS. Anal. Chem. 2005;77:1432–1439. doi: 10.1021/ac0494174. [DOI] [PubMed] [Google Scholar]
  • 15.Yan B., Steen S., Hambly D., Valliere-Douglass J., Vanden Bos T., Smallwood S., Yates Z., Arroll T., Han Y., Gadgil H. Succinimide formation at Asn 55 in the complementarity determining region of a recombinant monoclonal antibody IgG1 heavy chain. J. Pharm. Sci. 2009;98:3509–3521. doi: 10.1002/jps.21655. [DOI] [PubMed] [Google Scholar]
  • 16.Giles A.R., Sims J.J., Turner K.B., Govindasamy L., Alvira M.R., Lock M., Wilson J.M. Deamidation of amino acids on the surface of adeno-associated virus capsids leads to charge heterogeneity and altered vector function. Mol. Ther. 2018;26:2848–2862. doi: 10.1016/j.ymthe.2018.09.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Verma A., Ngundi M.M., Burns D.L. Mechanistic analysis of the effect of deamidation on the immunogenicity of anthrax protective antigen. Clin. Vaccine Immunol. 2016;23:396–402. doi: 10.1128/CVI.00701-15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Verma A., Burns D.L. Improving the stability of recombinant anthrax protective antigen vaccine. Vaccine. 2018;36:6379–6382. doi: 10.1016/j.vaccine.2018.09.012. [DOI] [PubMed] [Google Scholar]
  • 19.Verma A., McNichol B., Domínguez-Castillo R.I., Amador-Molina J.C., Arciniega J.L., Reiter K., Meade B.D., Ngundi M.M., Stibitz S., Burns D.L. Use of site-directed mutagenesis to model the effects of spontaneous deamidation on the immunogenicity of Bacillus anthracis protective antigen. Infect. Immun. 2013;81:278–284. doi: 10.1128/IAI.00863-12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Baillie L.W. Is new always better than old?: The development of human vaccines for anthrax. Hum. Vaccin. 2009;5:806–816. doi: 10.4161/hv.9777. [DOI] [PubMed] [Google Scholar]
  • 21.D’Souza A.J., Mar K.D., Huang J., Majumdar S., Ford B.M., Dyas B., Ulrich R.G., Sullivan V.J. Rapid deamidation of recombinant protective antigen when adsorbed on aluminum hydroxide gel correlates with reduced potency of vaccine. J. Pharm. Sci. 2013;102:454–461. doi: 10.1002/jps.23422. [DOI] [PubMed] [Google Scholar]
  • 22.Chen H.M., Ford C., Reilly P.J. Substitution of asparagine residues in Aspergillus awamori glucoamylase by site-directed mutagenesis to eliminate N-glycosylation and inactivation by deamidation. Biochem. J. 1994;301:275–281. doi: 10.1042/bj3010275. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Sierks M.R., Ford C., Reilly P.J., Svensson B. Functional roles and subsite locations of Leu177, Trp178 and Asn182 of Aspergillus awamori glucoamylase determined by site-directed mutagenesis. Protein Eng. 1993;6:75–79. doi: 10.1093/protein/6.1.75. [DOI] [PubMed] [Google Scholar]
  • 24.Bakir U., Coutinho P.M., Sullivan P.A., Ford C., Reilly P.J. Cassette mutagenesis of Aspergillus awamori glucoamylase near its general acid residue to probe its catalytic and pH properties. Protein Eng. 1993;6:939–946. doi: 10.1093/protein/6.8.939. [DOI] [PubMed] [Google Scholar]
  • 25.Lavoisier A., Schlaeppi J.M. Early developability screen of therapeutic antibody candidates using Taylor dispersion analysis and UV area imaging detection. MAbs. 2015;7:77–83. doi: 10.4161/19420862.2014.985544. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Yang X., Xu W., Dukleska S., Benchaar S., Mengisen S., Antochshuk V., Cheung J., Mann L., Babadjanova Z., Rowand J. Developability studies before initiation of process development: improving manufacturability of monoclonal antibodies. MAbs. 2013;5:787–794. doi: 10.4161/mabs.25269. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Xu Y., Wang D., Mason B., Rossomando T., Li N., Liu D., Cheung J.K., Xu W., Raghava S., Katiyar A. Structure, heterogeneity and developability assessment of therapeutic antibodies. MAbs. 2019;11:239–264. doi: 10.1080/19420862.2018.1553476. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Yu J., Shi S., Zhang F., Chen G., Cao M. Predgly: Predicting lysine glycation sites for homo sapiens based on xgboost feature optimization. Bioinformatics. 2019;35:2749–2756. doi: 10.1093/bioinformatics/bty1043. [DOI] [PubMed] [Google Scholar]
  • 29.Islam M.M., Saha S., Rahman M.M., Shatabda S., Farid D.M., Dehzangi A. iProtGly-SS: Identifying protein glycation sites using sequence and structure based features. Proteins. 2018;86:777–789. doi: 10.1002/prot.25511. [DOI] [PubMed] [Google Scholar]
  • 30.Ju Z., Sun J., Li Y., Wang L. Predicting lysine glycation sites using bi-profile bayes feature extraction. Comput. Biol. Chem. 2017;71:98–103. doi: 10.1016/j.compbiolchem.2017.10.004. [DOI] [PubMed] [Google Scholar]
  • 31.Xu Y., Li L., Ding J., Wu L.Y., Mai G., Zhou F. Gly-PseAAC: Identifying protein lysine glycation through sequences. Gene. 2017;602:1–7. doi: 10.1016/j.gene.2016.11.021. [DOI] [PubMed] [Google Scholar]
  • 32.Reddy H.M., Sharma A., Dehzangi A., Shigemizu D., Chandra A.A., Tsunoda T. GlyStruct: glycation prediction using structural properties of amino acid residues. BMC Bioinformatics. 2019;19(Suppl 13):547. doi: 10.1186/s12859-018-2547-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Akmal M.A., Rasool N., Khan Y.D. Prediction of N-linked glycosylation sites using position relative features and statistical moments. PLoS ONE. 2017;12:e0181966. doi: 10.1371/journal.pone.0181966. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Li F., Li C., Revote J., Zhang Y., Webb G.I., Li J., Song J., Lithgow T. GlycoMinestruct: a new bioinformatics tool for highly accurate mapping of the human N-linked and O-linked glycoproteomes by incorporating structural features. Sci. Rep. 2016;6:34595. doi: 10.1038/srep34595. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Chuang G.Y., Boyington J.C., Joyce M.G., Zhu J., Nabel G.J., Kwong P.D., Georgiev I. Computational prediction of N-linked glycosylation incorporating structural properties and patterns. Bioinformatics. 2012;28:2249–2255. doi: 10.1093/bioinformatics/bts426. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Sydow J.F., Lipsmeier F., Larraillet V., Hilger M., Mautz B., Mølhøj M., Kuentzer J., Klostermann S., Schoch J., Voelger H.R. Structure-based prediction of asparagine and aspartate degradation sites in antibody variable regions. PLoS ONE. 2014;9:e100736. doi: 10.1371/journal.pone.0100736. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Aledo J.C., Cantón F.R., Veredas F.J. A machine learning approach for predicting methionine oxidation sites. BMC Bioinformatics. 2017;18:430. doi: 10.1186/s12859-017-1848-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Sankar K., Hoi K.H., Yin Y., Ramachandran P., Andersen N., Hilderbrand A., McDonald P., Spiess C., Zhang Q. Prediction of methionine oxidation risk in monoclonal antibodies using a machine learning method. MAbs. 2018;10:1281–1290. doi: 10.1080/19420862.2018.1518887. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Agrawal N.J., Dykstra A., Yang J., Yue H., Nguyen X., Kolvenbach C., Angell N. Prediction of the hydrogen peroxide-induced methionine oxidation propensity in monoclonal antibodies. J. Pharm. Sci. 2018;107:1282–1289. doi: 10.1016/j.xphs.2018.01.002. [DOI] [PubMed] [Google Scholar]
  • 40.Yan Q., Huang M., Lewis M.J., Hu P. Structure based prediction of asparagine deamidation propensity in monoclonal antibodies. MAbs. 2018;10:901–912. doi: 10.1080/19420862.2018.1478646. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Lorenzo J.R., Alonso L.G., Sánchez I.E. Prediction of spontaneous protein deamidation from sequence-derived secondary structure and intrinsic disorder. PLoS ONE. 2015;10:e0145186. doi: 10.1371/journal.pone.0145186. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Jia L., Sun Y. Protein asparagine deamidation prediction based on structures with machine learning methods. PLoS ONE. 2017;12:e0181347. doi: 10.1371/journal.pone.0181347. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Robinson N.E., Robinson A.B. Prediction of protein deamidation rates from primary and three-dimensional structure. Proc. Natl. Acad. Sci. USA. 2001;98:4367–4372. doi: 10.1073/pnas.071066498. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Robinson N.E. Protein deamidation. Proc. Natl. Acad. Sci. USA. 2002;99:5283–5288. doi: 10.1073/pnas.082102799. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Robinson N.E., Robinson A.B. Deamidation of human proteins. Proc. Natl. Acad. Sci. USA. 2001;98:12409–12413. doi: 10.1073/pnas.221463198. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Breiman L. Random forests. Mach. Learn. 2001;45:5–32. [Google Scholar]
  • 47.Robinson N.E., Robinson A.B., Merrifield R.B. Mass spectrometric evaluation of synthetic peptides as primary structure models for peptide and protein deamidation. J. Pept. Res. 2001;57:483–493. doi: 10.1034/j.1399-3011.2001.00863.x. [DOI] [PubMed] [Google Scholar]
  • 48.Capasso S. Estimation of the deamidation rate of asparagine side chains. J. Pept. Res. 2000;55:224–229. doi: 10.1034/j.1399-3011.2000.00172.x. [DOI] [PubMed] [Google Scholar]
  • 49.Capasso S., Mazzarella L., Sica F., Zagari A., Salvadori S. Kinetics and mechanism of succinimide ring formation in the deamidation process of asparagine residues. J. Chem. Soc., Perkin Trans. 1993;2:679–682. [Google Scholar]
  • 50.Tyler-Cross R., Schirch V. Effects of amino acid sequence, buffers, and ionic strength on the rate and mechanism of deamidation of asparagine residues in small peptides. J. Biol. Chem. 1991;266:22549–22556. [PubMed] [Google Scholar]
  • 51.Robinson N.E., Robinson Z.W., Robinson B.R., Robinson A.L., Robinson J.A., Robinson M.L., Robinson A.B. Structure-dependent nonenzymatic deamidation of glutaminyl and asparaginyl pentapeptides. J. Pept. Res. 2004;63:426–436. doi: 10.1111/j.1399-3011.2004.00151.x. [DOI] [PubMed] [Google Scholar]
  • 52.Vamathevan J., Clark D., Czodrowski P., Dunham I., Ferran E., Lee G., Li B., Madabhushi A., Shah P., Spitzer M., Zhao S. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discov. 2019;18:463–477. doi: 10.1038/s41573-019-0024-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Dall’Acqua W.F., Kiener P.A., Wu H. Properties of human IgG1s engineered for enhanced binding to the neonatal Fc receptor (FcRn) J. Biol. Chem. 2006;281:23514–23524. doi: 10.1074/jbc.M604292200. [DOI] [PubMed] [Google Scholar]
  • 54.Silva J.P., Vetterlein O., Jose J., Peters S., Kirby H. The S228P mutation prevents in vivo and in vitro IgG4 Fab-arm exchange as demonstrated using a combination of novel quantitative immunoassays and physiological matrix preparation. J. Biol. Chem. 2015;290:5462–5469. doi: 10.1074/jbc.M114.600973. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Oganesyan V., Gao C., Shirinian L., Wu H., Dall’Acqua W.F. Structural characterization of a human Fc fragment engineered for lack of effector functions. Acta Crystallogr. D Biol. Crystallogr. 2008;64:700–704. doi: 10.1107/S0907444908007877. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Dimasi N., Fleming R., Zhong H., Bezabeh B., Kinneer K., Christie R.J., Fazenbaker C., Wu H., Gao C. Efficient preparation of site-specific antibody-drug conjugates using cysteine insertion. Mol. Pharm. 2017;14:1501–1516. doi: 10.1021/acs.molpharmaceut.6b00995. [DOI] [PubMed] [Google Scholar]
  • 57.Lu X., Nobrega R.P., Lynaugh H., Jain T., Barlow K., Boland T., Sivasubramanian A., Vásquez M., Xu Y. Deamidation and isomerization liability analysis of 131 clinical-stage antibodies. MAbs. 2019;11:45–57. doi: 10.1080/19420862.2018.1548233. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Kuhn M. Building predictive models in r using the caret package. J. Stat. Soft. 2008;28:1–26. [Google Scholar]
  • 59.Zhu K., Day T., Warshaviak D., Murrett C., Friesner R., Pearlman D. Antibody structure determination using a combination of homology modeling, energy-based refinement, and loop prediction. Proteins. 2014;82:1646–1655. doi: 10.1002/prot.24551. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Document S1. Figures S1–S3 and Tables S1–S3
mmc1.pdf (3.9MB, pdf)
Data S1. Training and Validation Datasets
mmc2.xls (256.5KB, xls)
Document S3. Article plus Supplemental Information
mmc3.pdf (4.6MB, pdf)

Articles from Molecular Therapy. Methods & Clinical Development are provided here courtesy of American Society of Gene & Cell Therapy

RESOURCES