Author manuscript; available in PMC: 2013 Jun 15.
Published in final edited form as: ACS Synth Biol. 2012 Mar 30;1(6):221–228. doi: 10.1021/sb300014t

SCHEMA Designed Variants of Human Arginase I & II Reveal Sequence Elements Important to Stability and Catalysis

Philip A Romero 1,, Everett Stone 4,, Candice Lamb 5, Lynne Chantranupong 6, Andreas Krause 2,3, Alex Miklos 7, Randall A Hughes 7, Blake Fechtel 5, Andrew D Ellington 7,8, Frances H Arnold 1,*, George Georgiou 4,5,6,7,8,*
PMCID: PMC3378063  NIHMSID: NIHMS369862  PMID: 22737599

Abstract

Arginases catalyze the divalent cation-dependent hydrolysis of L-arginine to urea and L-ornithine. There is significant interest in using arginase as a therapeutic anti-neogenic agent against L-arginine auxotrophic tumors and in enzyme replacement therapy for treating hyperargininemia. Both therapeutic applications require enzymes with sufficient stability under physiological conditions. To explore sequence elements that contribute to arginase stability we used SCHEMA-guided recombination to design a library of chimeric enzymes composed of sequence fragments from the two human isozymes Arginase I and II. We then developed a novel active learning algorithm that selects sequences from this library that are both highly informative and functional. Using high-throughput gene synthesis and our two-step active learning algorithm, we were able to rapidly create a small but highly informative set of seven enzymatically active chimeras that had an average variant distance of 40 mutations from the closest parent arginase. Within this set of sequences, linear regression was used to identify the sequence elements that contribute to the long-term stability of human arginase under physiological conditions. This approach revealed a striking correlation between the isoelectric point and the long-term stability of the enzyme to deactivation under physiological conditions.

Keywords: Enzyme engineering, arginase, homologous recombination, SCHEMA library design, active learning, protein stability


Humans produce two arginase isozymes (EC 3.5.3.1) that catalyze the hydrolysis of L-arginine (L-Arg) to urea and L-ornithine (L-Orn). The Arginase I (hArgI) gene is located on chromosome 6 (6q.23), is highly expressed in the cytosol of hepatocytes, and functions in nitrogen removal as the final step of the urea cycle. The Arginase II (hArgII) gene is found on chromosome 14 (14q.24.1). Arginase II is localized in the mitochondria in tissues such as kidney, brain, and skeletal muscle, where it is thought to provide a supply of L-Orn for L-proline and polyamine biosynthesis (1). The two enzymes share 61% amino acid sequence identity and adopt a homo-trimeric structure composed of an α/β fold consisting of a parallel eight-stranded β-sheet surrounded by several helices. These enzymes contain a di-nuclear metal cluster that generates a hydroxide for nucleophilic attack on the guanidinium carbon of L-arginine (2, 3). In eukaryotes and the vast majority of prokaryotes, the native metal cofactor in arginase is believed to be Mn2+.

There is significant interest in applying arginases as cancer chemotherapeutic agents. A number of high morbidity tumors such as hepatocellular carcinomas (HCCs), melanomas, renal cell, and prostate carcinomas (4-6) are deficient in the urea cycle enzyme argininosuccinate synthase (ASS) and thus are sensitive to l-arginine (l-Arg) depletion. Non-malignant cells typically enter into quiescence (G0) when deprived of l-Arg and remain viable for several weeks. However, ASS-deficient tumor cells experience cell cycle defects that lead to the re-initiation of DNA synthesis even though protein synthesis is inhibited, in turn resulting in major imbalances that lead to rapid cell death (7, 8). The selective toxicity of l-Arg depletion for HCC, melanoma, and other urea-cycle enzyme deficient cancer cells has been extensively demonstrated in vitro, in xenograft animal models, and in clinical trials (4, 5, 7, 9).

Additionally, rare, autosomal recessive mutations in hArgI can cause hyperargininemia, which results in hyperammonemia, spasticity, seizures, and failure to thrive (10). Dietary management in combination with oral phenylbutyrate is often successful in controlling hyperammonemia, but the underlying hyperargininemia can persist, which can result in L-Arginine-associated neurotoxicity (11). Red blood cell replacement, which provides supplemental hArgI within red blood cells, has shown promise in treating hyperargininemia as evidenced by reduced serum L-Arg levels and improved clinical outcomes (12, 13).

To function as a therapeutic agent, arginase must efficiently degrade L-Arg to very low levels (< 5 μM) under physiological conditions (~100 μM L-Arg, 37 °C, and pH 7.4). Unfortunately, hArgI and hArgII display low enzymatic activity at physiological pH and are rapidly inactivated in serum, with half-lives of only a few hours. Arnold and coworkers have demonstrated the utility of SCHEMA-guided recombination for generating libraries of chimeric proteins between low-homology sequences (14, 15). In an effort to understand the sequence determinants of arginase that are important for long-term stability we designed a SCHEMA-guided recombination library composed of sequence fragments from the human arginases hArgI and hArgII (Figure 1). By coupling this SCHEMA library with a novel active learning algorithm we efficiently identified a diverse set of enzymatically active chimeric arginases. These chimeras highlighted an important correlation between isoelectric point and long-term stability, providing a key insight into how these enzymes might be further optimized for stability.

Figure 1.


Overview of method. (A) Starting with the two parent arginases, we used SCHEMA (structure-guided recombination) to identify optimal recombination sites. (B) Next, a two-step active learning algorithm was used to identify a highly informative subset of this SCHEMA library. The first step of this algorithm efficiently learns which sequence elements contribute to loss-of-function. The second step then uses this information to design a set of chimeras that are highly informative and functional. (C) With experimental data on chimeric arginases, regression analysis can be used to make predictions across the entire library, or to understand how each sequence element contributes to arginase properties.

RESULTS AND DISCUSSION

SCHEMA library design

When homologous proteins are recombined, new interactions between structural fragments are often deleterious to protein function. The presence of these interactions within a chimeric protein can be estimated from the SCHEMA disruption, which counts interactions that are not observed in the parents (16). A chimera’s SCHEMA disruption is calculated from the parent sequences and a residue-residue contact map representation of the protein structure. Large combinatorial libraries of chimeric proteins can be designed using the Recombination as a Shortest-Path Problem (RASPP) algorithm, which identifies the library that minimizes the average SCHEMA disruption with constraints on the number and size of sequence fragments (17).
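The disruption count described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a contact list of residue index pairs and aligned parent sequences of equal length; the function names, the toy sequences, and the toy contacts are hypothetical and are not taken from the arginase library.

```python
def make_chimera(parents, block_starts, pattern):
    """Assemble a chimera; pattern[b] is the 1-based parent used for block b."""
    ends = list(block_starts[1:]) + [len(parents[0])]
    return "".join(
        parents[int(p) - 1][start:end]
        for p, start, end in zip(pattern, block_starts, ends)
    )


def schema_disruption(chimera, parents, contacts):
    """Count contacts whose residue pair is not seen at those positions in any parent."""
    broken = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        if all((parent[i], parent[j]) != pair for parent in parents):
            broken += 1
    return broken


# Toy data (hypothetical): two aligned 4-residue parents and three contacts.
toy_parents = ["ACDE", "AGDF"]
toy_contacts = [(0, 1), (1, 3), (2, 3)]
toy_chimera = make_chimera(toy_parents, block_starts=[0, 2], pattern="12")
print(toy_chimera, schema_disruption(toy_chimera, toy_parents, toy_contacts))
```

In the toy case only the contact between positions 1 and 3 pairs residues that never occur together in a parent, so a single interaction is counted as disrupted. A parent sequence always scores zero.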

Human Arginase I (hArgI) and human Arginase II (hArgII) share 61% amino acid sequence identity (64% nucleotide identity) and were chosen as parents for a SCHEMA recombination library. The trimeric structure of hArgII (PDB ID: 1PQ3) was used to prepare the contact map, which included both intra- and intersubunit contacts. The RASPP algorithm was used to design a library of chimeric sequences having seven recombination sites (eight sequence blocks).

The chimera blocks chosen for the arginase recombination library are illustrated in Figure 2. Within each monomer’s central β-sheet, seven of eight strands came from different blocks, while the trimer interface was formed from blocks 5, 6, and 8. Substrate recognition in arginase is achieved by several loops that flank the active site and numerous water-mediated hydrogen bonds (18). Within the chimera library, each of these “specificity” loops was located in a different block. We believed these design choices should provide multiple opportunities for identifying more functional catalysts, especially since the residues that coordinate the catalytic binuclear manganese cluster were conserved within the library, while the surrounding, second-shell residues came from different parental combinations of blocks 3, 4, 7, and 8.

Figure 2.


Arginase chimera library block boundaries. (A) Arginase three-dimensional structure with blocks represented by different colors. The trimer interface is shown as a transparent surface. (B) Contact map displaying residue-residue contacts that could be broken upon recombination. The colored squares correspond to the block divisions of the library. (C) Secondary structure diagram showing the chimera library block divisions.

The sequences within the designed chimera library were diverse: on average, chimeras differed from one another by 60 mutations (as few as 6 and as many as 120). These chimeras were also novel: the average mutational distance between a chimera and a parent arginase was 40.3 mutations. Nonetheless, based on results from earlier studies (14, 15) and the average SCHEMA disruption score for the designed library (⟨ E ⟩ = 16), it was predicted that approximately half of the chimeras would be functional arginases.

Rational generation of an informative set of chimeras

While the SCHEMA algorithm limits the protein sequence space that must be explored in order to identify functional variants, deciding which proteins to construct and assay remains challenging. For example, the current library contained 256 (2^8) possible chimeric arginases, and synthesis and characterization of all of them would have been daunting. Systematically chosen chimera sets are more effective than randomly chosen ones (19), but the criteria for selection are very much open to discussion. Selecting chimeras that equalize the representation of each parent at each block position will not generate a maximally informative set of proteins (15), primarily because of the significant proportion of nonfunctional sequences that provide no information about functional properties. Such nonfunctional sequences can be avoided by making single block perturbations, which better avoid major, disruptive interactions. However, these one-factor-at-a-time designs closely resemble the wild-type parents, and thus limit the sequence and functional diversity of the data sets produced (20). To balance these considerations, we developed a two-step active learning algorithm that efficiently identifies an informative set of functional chimeras by first training a model that can predict if a chimera will form a functional protein, and then using this functional status classifier to guide an experimental design.

The first step of the algorithm involved finding an informative set of chimeras for a logistic regression classifier that models the probability that a chimera will form a functional protein. Here, we quantify the ‘informativeness’ of a set of chimeras as the mutual information between that set and the remainder of the library (see Methods). Intuitively, this mutual information measures how much observing a given set of chimeras reduces the uncertainty (Shannon entropy) of prediction for the remainder of the library. Based on these criteria, we initially chose to study a set of eight arginase chimeras that maximized this mutual information criterion. The genes encoding these eight chimeras (Table 1, SCHEMA A-H) were synthesized and expressed (see Methods). As expected, approximately half (3/8) of the sequences produced functional arginases. With the functional status of these sequences now defined, it proved possible to train a Bayesian logistic regression model to predict the probability of functioning for all chimeras within the library.

Table 1.

Chimeric arginase data. Nonfunctional sequences are denoted by an asterisk. SCHEMA A-L are the designed set of highly informative sequences, SCHEMA M and N are the sequences used to validate the regression model, and SCHEMA O and P are two additional chimeras that were generated during this study.

Name          Chimera blocks   AUC    AUC_Mn   Tm (°C)   kcat/KM (mM−1 s−1)
hArgI         11111111         2350   5326     81.0      130 ± 20
hArgII        22222222         4042   -        80.6      114 ± 18
SCHEMA A(*)   11112122         -      -        -         -
SCHEMA B      12122211         3664   2927     81.2      19 ± 7
SCHEMA C      11221221         3872   5838     68.1      53 ± 10
SCHEMA D(*)   12211212         -      -        -         -
SCHEMA E(*)   21121212         -      -        -         -
SCHEMA F(*)   21212211         -      -        -         -
SCHEMA G(*)   22111221         -      -        -         -
SCHEMA H      22221111         3089   4109     68.5      31 ± 7
SCHEMA I      21122121         1654   -        67.5      27 ± 11
SCHEMA J      11222112         4710   6188     70.7      42 ± 8
SCHEMA K      22121121         1311   2655     78.9      27 ± 10
SCHEMA L      21222111         2828   -        70.5      19 ± 7
SCHEMA M      12122122         4005   -        71.2      45 ± 11
SCHEMA N      12222112         3901   -        70.6      36 ± 5
SCHEMA O      11122222         1026   3108     82.5      138 ± 19
SCHEMA P      21121111         1417   -        74.3      39 ± 10

(*) No protein expression detected.

The second step of the algorithm then consisted of finding a highly informative set of functional chimeric arginases. We used the predictions from the logistic regression model to select sequences that maximized the expected value of the mutual information between the chosen set and the remainder of the library (see Methods). This criterion should have simultaneously identified sequences that were both informative and had a high probability of being functional. A set of four additional chimeras was chosen that maximized the expected value of the mutual information. Significantly, when these gene sequences were synthesized and expressed they were all found to encode functional enzymes that hydrolyzed L-Arg at significant rates (Table 1, SCHEMA I-L).

Overall, the active learning algorithm efficiently identified a highly informative set of nine functional arginases (two parents and seven chimeras). Within this set of chimeric sequences, each parent at each block was typically observed multiple times, and 103 of the 112 possible sequence block pairs were observed. Some blocks (such as block 4 parent 1) were under-represented, presumably because they contributed to loss of function and were therefore avoided in the second step of the sequence selection algorithm.
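The coverage figure quoted above can be recomputed directly from the block patterns in Table 1. The sketch below is only an illustration: it treats a "sequence block pair" as a choice of two block positions together with the parent identity at each, which gives 28 position pairs times 4 parent combinations, or 112 possibilities. The variable names are hypothetical.

```python
from itertools import combinations

# Block patterns of the nine functional designed sequences (Table 1).
designed_functional = {
    "hArgI": "11111111",
    "hArgII": "22222222",
    "SCHEMA B": "12122211",
    "SCHEMA C": "11221221",
    "SCHEMA H": "22221111",
    "SCHEMA I": "21122121",
    "SCHEMA J": "11222112",
    "SCHEMA K": "22121121",
    "SCHEMA L": "21222111",
}

position_pairs = list(combinations(range(8), 2))

# Every (position i, position j, parent at i, parent at j) combination.
possible_pairs = {
    (i, j, a, b) for i, j in position_pairs for a in "12" for b in "12"
}

# The combinations that actually occur in at least one functional sequence.
observed_pairs = {
    (i, j, seq[i], seq[j])
    for seq in designed_functional.values()
    for i, j in position_pairs
}

print(len(observed_pairs), "of", len(possible_pairs), "block pairs observed")
for i, j, a, b in sorted(possible_pairs - observed_pairs):
    print(f"missing: block {i + 1} parent {a} with block {j + 1} parent {b}")
```

Most of the missing combinations involve block 4 from parent 1, which appears only in hArgI within this set, in line with the under-representation noted in the text.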

Regression model for long-term stability

We used the highly informative set of chimeras to explore sequence-function relationships within the arginase library. In particular, the temporal inactivation of all nine enzymes within the designed set of sequences was measured (see Methods). Because the chimeras displayed either exponential, sigmoidal, or biphasic decay of activity, for ease of comparison we derived each chimera’s normalized area under the inactivation curve (AUC), which provides a measure of a chimera’s overall kinetic stability (Table 1). A Bayesian linear regression model was used to correlate sequence fragments with the experimentally measured AUC values (see Methods). This model resulted in an excellent fit (r = 0.98, Figure 3A), and the block regression parameters are given in Table 2.

Figure 3.


Arginase long-term stability. (A) Bayesian linear regression model for AUC. Green and blue circles correspond to the parents and chimeras (respectively) within the initial data set (r = 0.98 and p = 9e-7). Red stars represent the model’s predictions on the validation set. (B) Correlation between isoelectric point and AUC for all chimeras tested (r = −0.74 and p = 0.004).

Table 2.

Regression model parameters. The parameters specify how substituting parent 2 for parent 1 at a given block changes the logarithm of the AUC. The most significant substitution occurs at block 3, which is highlighted in grey.

Parameter name   log(AUC)
Reference        7.76
B1P2             −0.23
B2P2             0.00
B3P2             0.39
B4P2             0.02
B5P2             0.07
B6P2             0.32
B7P2             −0.26
B8P2             0.20

To validate the linear regression model we designed two additional chimeric arginases (SCHEMA M and N) that were predicted to have enhanced long-term stability. These sequences were synthesized and characterized. The regression model showed good predictive ability (Figure 3A), and both sequences were more stable than 80% of the other chimeric arginases.

From the regression analysis, the most stabilizing sequence element was found to be block 3, where substituting hArgII for hArgI is estimated to increase the AUC by almost 50% (Table 2). Closer inspection of the amino acid sequences for this important chimera block revealed an abundance of charged residues. Consistent with this observation, we found the estimated isoelectric point (21, 22) of the chimeras to show a striking negative correlation (r = −0.74, p = 0.004) with the AUC (Figure 3B). Thus, chimeras with the greatest net negative charge under the assay conditions (pH 7.4 and 37 °C) were the most stable, while those closer to their isoelectric point exhibited faster inactivation.
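As a worked illustration of how the Table 2 parameters translate into predictions, the sketch below sums the block effects for a given chimera and exponentiates the result. It assumes that log(AUC) is a natural logarithm, which is consistent with exp(7.76) ≈ 2345 lying close to the measured hArgI value of 2350 in Table 1. The function name is hypothetical, and the numbers are posterior point estimates copied from Table 2.

```python
import math

REFERENCE_LOG_AUC = 7.76  # all-parent-1 background (hArgI), Table 2
# Change in log(AUC) when parent 2 replaces parent 1 at blocks 1..8, Table 2.
BLOCK_EFFECTS = [-0.23, 0.00, 0.39, 0.02, 0.07, 0.32, -0.26, 0.20]


def predicted_auc(blocks):
    """Predicted AUC for a block pattern such as '11222112' (assumes natural log)."""
    log_auc = REFERENCE_LOG_AUC + sum(
        effect for effect, parent in zip(BLOCK_EFFECTS, blocks) if parent == "2"
    )
    return math.exp(log_auc)


for name, blocks in [("hArgI", "11111111"), ("hArgII", "22222222"), ("SCHEMA J", "11222112")]:
    print(name, round(predicted_auc(blocks)))

# Fold change from swapping only block 3 to parent 2: exp(0.39), a bit under 1.5.
print(round(predicted_auc("11211111") / predicted_auc("11111111"), 2))
```

For SCHEMA J the summed effects give a predicted AUC of roughly 4600, compared with the measured 4710 in Table 1.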

Metal dependence of stability

To test if metal binding affected thermal unfolding, the melting temperature (Tm) for all sequences was measured in the presence and absence of a chelator (see Methods; Table 1). The melting temperatures showed no significant correlation with long-term stability (r = −0.30, p = 0.31). However, as expected the addition of EDTA resulted in lower thermodynamic stability for all enzymes, with an average decrease in Tm values of 15 ± 5 °C, in close agreement with results of similar chelation experiments with beef liver and Saccharomyces cerevisiae arginases (23, 24). This highlights that bound Mn2+ stabilizes the correctly folded state under thermal equilibrium conditions. To determine if metal chelation is also a key factor in long-term kinetic stability at thermodynamically stable conditions, we measured stability in the presence of excess manganese (500 μM MnCl2) (AUC_Mn in Table 1) at 37 °C, far below the average Tm of 74 ± 6 °C. In accord with the denaturation data, excess manganese was shown to increase long-term stability while maintaining the overall trend in long-term stability as a function of isoelectric point. These findings indicate that the enzymes may be stabilized by excess charge as a function of pI, and that metal stabilizes the correctly folded form of the enzyme. As activity is dependent on active-site bound metal, it is clear that excess manganese will drive the equilibrium toward the metal-bound (active), folded state and delay irreversible inactivation. A possible mechanism of inactivation is depicted in Figure 4. Here, arginase is irreversibly inactivated by loss of metal followed by protein unfolding/aggregation.

Figure 4.


Schematic of potential arginase inactivation mechanisms. (A) loss of first equivalent of bound metal and decrease of some activity, (B) loss of second equivalent of bound metal and loss of all activity, (C) equilibrium between folded and unfolded states, (D) probably irreversible precipitation/aggregation.

For all chimeric arginases, we performed Michaelis-Menten kinetic measurements (see Methods) and calculated the resulting catalytic efficiencies (kcat/KM) (Table 1). Intriguingly, the fold stabilization upon addition of 500 μM MnCl2 (AUCMn/AUC) displayed a linear relationship with catalytic efficiency (r = 0.85, p = 0.02), Figure 5. A similar trend has been observed within a set of Cu2+ complexes (25). In that study, the authors found the stability of a Cu2+ complex to be inversely related to its rate of glycine methyl ester hydrolysis, indicating that more stable complexes lower the Lewis acidity of the Cu2+ ion. Likewise, arginases that bind Mn2+ more tightly (i.e., that are not as dependent on an excess of Mn2+ for long term activity) may have reduced Lewis acidity for coordinating substrate or water ligands, and therefore diminished catalytic efficiency.
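The relationship between fold stabilization and catalytic efficiency can be checked against the values in Table 1. The sketch below is an illustration only: it uses the seven enzymes for which both AUC and AUC_Mn are listed, ignores the reported uncertainties on kcat/KM, and uses NumPy's `corrcoef` for the Pearson coefficient. The dictionary name is hypothetical.

```python
import numpy as np

# (AUC, AUC_Mn, kcat/KM in mM^-1 s^-1) for enzymes with all three values in Table 1.
TABLE1_SUBSET = {
    "hArgI": (2350, 5326, 130),
    "SCHEMA B": (3664, 2927, 19),
    "SCHEMA C": (3872, 5838, 53),
    "SCHEMA H": (3089, 4109, 31),
    "SCHEMA J": (4710, 6188, 42),
    "SCHEMA K": (1311, 2655, 27),
    "SCHEMA O": (1026, 3108, 138),
}

# Fold stabilization on adding 500 uM MnCl2, and the matching catalytic efficiencies.
fold_stabilization = np.array([mn / auc for auc, mn, _ in TABLE1_SUBSET.values()])
efficiency = np.array([eff for _, _, eff in TABLE1_SUBSET.values()], dtype=float)

r = np.corrcoef(efficiency, fold_stabilization)[0, 1]
print(f"Pearson r = {r:.2f} over {len(TABLE1_SUBSET)} enzymes")
```

With these seven points the coefficient comes out close to the reported r = 0.85.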

Figure 5.


Correlation between fold change in long-term stability (AUCMn /AUC) and catalytic efficiency (kcat/KM), r = 0.85 and p = 0.02.

Summary and conclusions

The combination of structure-guided SCHEMA recombination and an efficient active learning procedure was used to generate a highly informative set of catalytically active chimeric arginases. Site-directed recombination libraries between low homology parental genes provide unique data sets for probing sequence-function relationships, offering distinct advantages over sets of point mutants or naturally existing proteins. The effects of point mutations are frequently too small to resolve experimentally, while the large numbers of neutral mutations in naturally existing proteins make it difficult to pinpoint the basis of functional differences. In contrast, libraries of chimeric proteins contain an intermediate level of sequence diversity, and mutational changes are observed in multiple sequence backgrounds.

The resulting set of chimeric human arginases displays measurable variation, and the sequence basis of this variation can be efficiently identified using linear regression. The high level of sequence diversity within the hArg chimeras translates into extraordinary functional diversity, as evidenced by the fact that many of the measured properties were outside the range displayed by the two parents (Table 1). For example, recombination of hArgI and hArgII (pI = 6.8 and 5.7, respectively) generated a set of functional chimeras with isoelectric points ranging from 5.5 to 7.5. A linear regression model helped identify a strong negative relationship between a chimeric arginase’s isoelectric point and its long-term stability (r = −0.74, p = 0.004). Since the long-term stability experiments were performed at physiological pH (7.4), chimeras with the greatest net charge (low pI) displayed the greatest stability. Similar relationships between a protein’s net charge and its stability have been observed previously; for example, a large survey across multiple protein families found many proteins to be less stable near their isoelectric point (26). Similarly, engineered ribonuclease variants show decreased solubility and increased aggregation near their isoelectric point (27, 28).

The relationships ferreted out in this study have practical consequences for protein engineering. Arginase inactivation is strongly linked to the loss of the metal center of arginase, as activity as well as structural/thermal stability are metal-dependent. Given the mechanism of inactivation depicted in Figure 4, it might be further hypothesized that stability issues could be resolved by: (a) Engineering proteins with increased numbers of negative surface charges (29); (b) Increasing the concentration of metal (likely therapeutically intractable); or (c) Introducing a different metal that might lead to improved binding and hence stability.

With respect to the latter hypothesis, it is noteworthy that we have recently reported that Co2+-substituted hArgI (Co-hArgI) displays a dramatically reduced KM for L-Arg relative to the native Mn2+-containing enzyme, and has a 12-fold increase in kcat/KM. More importantly, Co-hArgI is significantly more stable in serum, with an inactivation half-life of more than 30 hours (30). The improved pharmacological properties of Co-hArgI have been shown to mediate potent tumor cytotoxicity against numerous cancer cell lines in vitro, and to lead to the inhibition of hepatocellular and pancreatic carcinomas in the mouse xenograft model (30, 31).

In the case of arginase replacement therapy to treat hyperargininemia, it would be preferred to reduce elevated serum l-Arg levels that range from 600-900 μM (32) to normal reference values of 50-150 μM (33), rather than completely eliminate the amino acid from the bloodstream. The ideal enzyme for this application would have exceptional long-term stability, but not necessarily the increased efficiency that the Co-substituted enzyme shows. The SCHEMA J variant (Blocks 11222112) identified in this study has a stable linear decay rate of only 1% per hour and thus may hold promise for therapeutic purposes. A simple kinetic model based on substrate hydrolysis rates, inactivation rates, and L-Arg replenishment estimates suggests that a single dose of SCHEMA J could maintain l-Arg levels in hyperargininemia patients within the normal range for five days longer than a single dose of the more active but less stable Co2+-loaded hArgI (34, 35).

Such a treatment option is especially interesting for a number of reasons. As the SCHEMA J variant is comprised of two human arginases, only the three chimeric junctions represent potential new T-cell epitopes. Using software from the Immune Epitope Database Analysis Resource (36-38), we analyzed each of these sequence junctions for any significant changes in predicted epitope binding relative to the parent sequences for the eight most common HLA alleles (see Methods). Calculations for the HLA-DRB1*15:01 allele for the second junction suggested a 3- and 3.5-fold increase in binding affinity relative to hArgI and hArgII, respectively; all other junctions and alleles did not show a significant change relative to the parental sequences, suggesting that SCHEMA J is not likely to be highly immunogenic. Moreover, since hArgI has been under investigation as an antineoplastic agent, its serum retention time has already been pharmacologically optimized via PEGylation, resulting in dose dependent l-Arg depletion in rats for up to days at a time (39), and thus methods for further extending the lifetime of the chimera may already exist.

Overall, the ability to design enzymes that are customized to specific reaction conditions is of significant interest to biomedical science. SCHEMA recombination coupled with an active learning algorithm provided a diverse and efficient sampling of the protein fitness landscape, revealing features that could not be observed by traditional biochemical methods. These data sets therefore provide a unique opportunity to explore the relationships between protein sequence and protein function, quickly yielding fundamental principles that can be used to engineer highly-optimized protein sequences.

METHODS

Active learning algorithm

The active learning algorithm consists of a two-step experimental design. The first step involves finding an informative set of chimeras for a logistic regression functional status model. Here, we would like to find the set of sequences that maximize the mutual information between the chosen set of chimeras S and the remainder of the library L\S, which is given by

$$I(S; L \setminus S) = H(L \setminus S) - H(L \setminus S \mid S),$$

where H(L\S) is the Shannon entropy of library L excluding the chimeras in subset S and H(L\S|S) is the entropy of the same sequences after the chimeras in S have been observed. We approximate the intractable entropy of the Bayesian logistic regression model by replacing the logistic response with a Gaussian likelihood. With this approximation, the properties of collections of sequences and their relationships can be represented with a multivariate Gaussian distribution, and their Shannon entropy can be calculated from the determinant of the covariance matrix. Gaussian mutual information is a submodular set function (40) and therefore can be efficiently maximized using a greedy approximation algorithm (41). We used a greedy algorithm to find a set of sequences S with maximized mutual information. The functional status of the resulting sequences was then used to train a Bayesian logistic regression model that can predict the probability of functioning for all chimeras in the library.
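A greedy selection under the Gaussian approximation can be sketched as follows. This is a Python illustration, not the Matlab toolbox used in the study. It assumes a covariance of the form prior_var · X Xᵀ + noise_var · I built from 0/1 block indicators with an intercept column; the prior and noise variances are hypothetical placeholders. For a joint Gaussian, the mutual information between a subset S and the rest of the library equals ½(log det K_SS + log det Λ_SS), where Λ is the inverse of the full covariance K, so only small matrices need to be evaluated.

```python
import itertools
import numpy as np


def block_features(n_blocks):
    """Rows = all 2**n_blocks chimeras; columns = intercept plus 0/1 parent-2 indicators."""
    blocks = np.array(list(itertools.product([0.0, 1.0], repeat=n_blocks)))
    return np.hstack([np.ones((len(blocks), 1)), blocks])


def library_covariance(X, prior_var=1.0, noise_var=1.0):
    """Covariance of noisy outputs under a linear model with an isotropic Gaussian prior."""
    return prior_var * X @ X.T + noise_var * np.eye(len(X))


def mutual_information(K, precision, S):
    """I(S; L\\S) for a Gaussian: 0.5 * (logdet K_SS + logdet Lambda_SS)."""
    idx = np.ix_(S, S)
    return 0.5 * (np.linalg.slogdet(K[idx])[1] + np.linalg.slogdet(precision[idx])[1])


def greedy_design(K, k):
    """Add, one at a time, the sequence that gives the largest mutual information."""
    precision = np.linalg.inv(K)
    chosen = []
    for _ in range(k):
        rest = [c for c in range(len(K)) if c not in chosen]
        best = max(rest, key=lambda c: mutual_information(K, precision, chosen + [c]))
        chosen.append(best)
    return chosen


X = block_features(8)        # the 256-member library
K = library_covariance(X)    # hypothetical prior and noise variances of 1.0
design = greedy_design(K, 8)
for i in design:
    print("".join(str(int(b) + 1) for b in X[i, 1:]))
```

The printed patterns depend on the assumed variances and on tie-breaking, so they should not be expected to reproduce SCHEMA A-H.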

The second step of the algorithm consists of finding a highly informative set of functional chimeric arginases. Here, we want to find the set of chimeras S which maximize the expected value of the mutual information

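$$E[I(S; L \setminus S)] = \sum_{A \in \mathcal{P}(S)} \left[ I(A; L \setminus A) \prod_{c \in A} p_c \prod_{c \in (S \setminus A)} (1 - p_c) \right],$$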

where the sum is over all subsets A in the power set of S and pc is the predicted probability of being functional for chimera c from the logistic regression model. This objective is chosen to simultaneously find sequences that are informative and have a high probability of being functional, similar to the most informative positive (MIP) active learning algorithm (42). Since sub-modular functions are closed under positive linear combinations, the expected value of the Gaussian mutual information is also submodular and therefore greedy maximization provides strong performance guarantees. The covariance between sequences was calculated using the chimera-block coding scheme described in the regression analysis section (below). All experimental designs were performed with the Submodular Function Optimization Matlab Toolbox (43).
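The expectation over functional subsets can be written out directly for small sets. The sketch below is an illustration under stated assumptions: `info` stands in for the mutual information I(A; L\A) and is supplied by the caller, the probabilities are hypothetical, and the enumeration of the power set is only practical for a handful of candidate sequences.

```python
from itertools import combinations
from math import prod


def expected_information(S, p_functional, info):
    """Sum info(A) over all subsets A of S, weighted by the probability that
    exactly the chimeras in A are functional."""
    S = list(S)
    total = 0.0
    for size in range(len(S) + 1):
        for A in combinations(S, size):
            p_A = prod(p_functional[c] for c in A)
            p_rest = prod(1.0 - p_functional[c] for c in S if c not in A)
            total += p_A * p_rest * info(A)
    return total


# Hypothetical predicted probabilities of being functional.
p = {"21122121": 0.9, "11222112": 0.5, "22121121": 0.2}

# Stand-in information measure: one unit per functional sequence observed.
value = expected_information(p.keys(), p, info=len)
print(round(value, 6))
```

With the stand-in measure `len`, the expectation reduces to the sum of the individual probabilities, which provides a quick sanity check on the weighting.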

Gene synthesis and cloning

Genes encoding the SCHEMA designed arginase chimeras were synthesized from oligonucleotides as described previously (44). In brief, long DNA oligonucleotides (99 bases) were synthesized in-house and assembled into two 560-base pair fragments using inside-out PCR. These primary fragments were combined without purification in a secondary overlap-extension reaction that formed the final desired 1086-base pair product. Custom software directed the assembly schemes and the efficient re-use of oligonucleotides across multiple related sequences. Overlaps of 32 base pairs were designed between adjacent oligonucleotides, and a 35-base pair overlap was designed between the two primary fragments. Genes were synthesized with an N-terminal 6x His tag followed by a tobacco etch virus protease cleavage site and NcoI and EcoRI restriction sites as described previously (30). These genes were cloned into a pET28a expression vector and the sequences were verified using DNA sequencing.

Two variants (SCHEMA O and SCHEMA P) were not designed by the algorithm, but were chosen from preliminary experiments based upon regions of sequence homology. These chimeras were constructed by overlap extension PCR and are included in this study as they contain SCHEMA identified blocks from hArgI and hArgII.

Expression and Purification

E. coli cells expressing arginase variants were grown at 37 °C in minimal media to an OD600 of 0.8–1. Cells were collected by centrifugation, re-suspended in fresh minimal media containing 0.5 mM IPTG and 100 μM MnSO4, and incubated for an additional 8–12 hours at 37 °C with shaking. After protein expression, cells were collected by centrifugation, lysed using a French pressure cell, and centrifuged at 14,000 × g for 20 min at 4 °C. The clarified cell lysate was applied to a nickel IMAC column and washed with 10–20 column volumes of IMAC buffer, and the purified arginases were eluted with IMAC elution buffer (50 mM NaPO4, 250 mM imidazole, 300 mM NaCl, pH 8). The purified arginases were buffer exchanged several times into PBS, 10% glycerol, pH 7.4 using a 10,000 MWCO centrifugal filter device (Amicon). Aliquots of purified arginase variants were then flash frozen in liquid nitrogen and stored at −80 °C.

Enzyme Kinetics

Michaelis-Menten kinetics for L-Arg hydrolysis were determined in 100 mM HEPES buffer at 37 °C, pH 7.4 as previously described (30).

Long-term stability

The long-term stability of the arginase chimeras was measured in 100 mM HEPES buffer, pH 7.4 at 37 °C, with or without 500 μM MnCl2. Proteins were diluted to 2 μM with 100 mM HEPES, pH 7.4 and placed at 37 °C. Aliquots of 30–50 μL were taken at different time points (typically t = 0, 0.5, 3, 24, 48, and 72 hours). The activity at each time point was immediately measured using 1 mM L-Arg, as described previously (30). The data were plotted as percent activity as a function of time, and the area under this inactivation curve (AUC) was calculated using Kaleidagraph. The data were also fit to various models to calculate the rates of decay of activity over time: (i) for biphasic decay, $\%\mathrm{Act} = \frac{((100\% - \mathrm{amp}\%)\,e^{-kt}) + \mathrm{amp}\%}{1 + e^{(hs\,(T_{0.5} - t))}}$, where t = time, amp = amplitude of the first decay, k = the rate of exponential decay, hs = Hill slope, and T0.5 = the half-life of the sigmoidal decay; (ii) for sigmoidal decay, $\%\mathrm{Act} = \frac{100\%}{1 + e^{(hs\,(T_{0.5} - t))}}$; and (iii) a single exponential decay model, which was used for some enzymes as described in the results section.
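The area under an inactivation curve can be approximated with the trapezoidal rule, as in the sketch below. This is an assumption about the numerical method: the source states only that Kaleidagraph was used. The time points match the typical sampling schedule given above, while the decay rate and the resulting activity values are hypothetical.

```python
import math


def auc_trapezoid(times_h, percent_activity):
    """Area under a percent-activity curve (units: % x hours), trapezoidal rule."""
    if len(times_h) != len(percent_activity):
        raise ValueError("need one activity value per time point")
    area = 0.0
    for k in range(1, len(times_h)):
        dt = times_h[k] - times_h[k - 1]
        area += 0.5 * (percent_activity[k] + percent_activity[k - 1]) * dt
    return area


# Typical sampling times from the text (hours).
times = [0, 0.5, 3, 24, 48, 72]

# Hypothetical single-exponential decay, used only to generate example data.
decay_rate = 0.03  # per hour, illustrative
activity = [100.0 * math.exp(-decay_rate * t) for t in times]

print(round(auc_trapezoid(times, activity)))
```

An enzyme that kept full activity for the whole 72 hours would score 7200 on this scale, which puts the values in Table 1 in context.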

Thermal stability

Arginase variants (20–40 μM) in PBS, pH 7.4, with or without EDTA (10 mM final concentration) were incubated in 96-well low-profile PCR plates (Fisher Scientific, Rockford, IL) on ice for 30 min. SYPRO orange dye (Life Technologies, Grand Island, NY) was added to each well immediately before placing the plate in a real-time PCR machine (LightCycler 480, Roche, Mannheim, Germany). The temperature dependence of protein unfolding between 20 and 95 °C was measured in at least duplicate experiments. Tm values were derived from the monophasic melting curves. To determine the circular dichroic spectra, a 6 μM sample of hArgII in 100 mM phosphate buffer, pH 7.4, was analyzed on a Jasco J-815 CD spectropolarimeter. The change in molar ellipticity at 222 nm (θ222) was monitored from 25 to 90 °C. The fraction of denatured protein at each temperature was calculated as the ratio [θ222]/[θ222]d, where [θ222]d is the molar ellipticity of the completely unfolded protein. The resulting data were fit to a modified logistic equation to determine the thermal transition midpoint.
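The exact form of the modified logistic equation is not given here, so the sketch below does not attempt to reproduce that fit. Instead it shows a simpler stand-in: estimating a transition midpoint by linear interpolation at the temperature where the fraction denatured crosses 0.5. The function name and the example values are hypothetical.

```python
def midpoint_by_interpolation(temps, fractions, level=0.5):
    """Return the temperature at which `fractions` first rises through `level`.

    Uses linear interpolation between the two bracketing points.
    This is a simple substitute for a logistic fit, not the fit itself.
    Returns None if the curve never crosses `level`.
    """
    pairs = list(zip(temps, fractions))
    for (t0, f0), (t1, f1) in zip(pairs, pairs[1:]):
        if f0 < level <= f1:
            return t0 + (level - f0) * (t1 - t0) / (f1 - f0)
    return None

# Made-up unfolding curve: temperature (°C) and fraction denatured.
temps = [40.0, 50.0, 60.0, 70.0]
fractions = [0.0, 0.2, 0.8, 1.0]
print(midpoint_by_interpolation(temps, fractions))
```

Interpolation is sensitive to noise near the midpoint, which is one reason a fitted curve over the full temperature range is usually preferred.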

Regression analysis

For the regression models, the independent variable corresponded to the chimera sequence, represented as a binary vector x, where xi indicates the parent identity at block i. Because our data were limited, we used Bayesian parameter estimation, which outperforms maximum likelihood estimation for small data sets.

A chimera’s binary functional status was modeled with a Bayesian logistic regression model, which contains a Bernoulli likelihood function and a zero-mean, isotropic Gaussian prior on coefficients (45). The resulting posterior distribution was approximated using Laplace’s method and prior variance was estimated from the data by maximizing the marginal likelihood function. Using Newton’s method, we found the maximum a posteriori (MAP) estimates for each chimera block’s contribution to functionality. The probability that a chimera is functional was estimated by applying the MAP parameter estimates to the logistic model.
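The MAP step of this procedure can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the prior variance is fixed by hand rather than estimated by maximizing the marginal likelihood, the Laplace approximation of the full posterior is omitted, an intercept term is added to the block vector, and all names, block strings, and labels are hypothetical.

```python
import math

def encode(blocks):
    """Map a block string such as '1212' to [1.0 (intercept), 0/1 per block].

    A block from parent 2 is coded 1.0 and a block from parent 1 is coded 0.0.
    """
    return [1.0] + [1.0 if b == "2" else 0.0 for b in blocks]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def map_logistic(X, y, prior_var, iters=25):
    """Newton's method for MAP weights under a zero-mean isotropic Gaussian prior."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        p = [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) for x in X]
        # Gradient of the negative log posterior: X^T (p - y) + w / prior_var
        g = [sum((pn - yn) * x[j] for pn, yn, x in zip(p, y, X)) + w[j] / prior_var
             for j in range(d)]
        # Hessian: X^T R X + I / prior_var, with R = diag(p * (1 - p))
        H = [[sum(pn * (1.0 - pn) * x[j] * x[k] for pn, x in zip(p, X))
              + (1.0 / prior_var if j == k else 0.0)
              for k in range(d)] for j in range(d)]
        step = solve(H, g)
        w = [wi - si for wi, si in zip(w, step)]
    return w

# Hypothetical block strings and functional labels, for illustration only.
chimeras = ["11111111", "22222222", "11221122", "12121212", "21212121"]
labels = [1, 1, 1, 0, 0]
X = [encode(c) for c in chimeras]
w_map = map_logistic(X, labels, prior_var=1.0)
# Predicted probability that an untested (hypothetical) chimera is functional.
print(sigmoid(sum(wi * xi for wi, xi in zip(w_map, encode("22112211")))))
```

The Gaussian prior adds a positive term to the diagonal of the Hessian, which keeps the Newton step well defined even when there are fewer chimeras than blocks.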

The logarithm of a chimera’s long-term stability (AUC) was modeled with a Bayesian linear regression model, which consists of a Gaussian likelihood function with a zero-mean, isotropic Gaussian prior on coefficients (45). The measurement noise and prior variance were estimated from the data by maximizing the marginal likelihood function. With these hyperparameters, MAP estimates for each chimera block’s contribution to long-term stability were found in closed form.
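For reference, the closed-form MAP estimate for a linear model of this kind takes the standard ridge-regression form shown below. The notation is an assumption introduced for illustration and is not taken from the text: y collects the log AUC values, X stacks the binary block vectors row by row, σ² is the measurement-noise variance, and τ² is the prior variance.

```latex
% Assumed notation: y_n = \log \mathrm{AUC}_n, x_n = binary block vector,
% \sigma^2 = measurement-noise variance, \tau^2 = prior variance.
\begin{align}
p(y_n \mid x_n, w) &= \mathcal{N}\!\left(y_n \mid w^{\top} x_n,\ \sigma^2\right), \qquad
p(w) = \mathcal{N}\!\left(w \mid 0,\ \tau^2 I\right) \\
\hat{w}_{\mathrm{MAP}} &= \arg\min_{w}\ \frac{1}{2\sigma^2}\lVert y - Xw\rVert^2
  + \frac{1}{2\tau^2}\lVert w\rVert^2
  = \left(X^{\top}X + \frac{\sigma^2}{\tau^2}\, I\right)^{-1} X^{\top} y
\end{align}
```

The ratio σ²/τ² acts as a shrinkage strength: a small prior variance pulls block contributions toward zero, while a large one approaches the ordinary least-squares solution.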

Immunogenicity Calculations

We used software from the Immune Epitope Database (IEDB) (consensus method for MHC(II) binding) (46) to evaluate peptides spanning 15 residues on either side of the hArgI and hArgII junctions of the SCHEMA J variant (Blocks 11222112) to compare with the corresponding sequences from the hArgI and hArgII parents. Using the predicted binding constants for the 8 most common HLA alleles as reported previously (47) we then calculated the ratio of the predicted binding values for each (hArgI/SCHEMA J and hArgII/SCHEMA J) peptide for each HLA allele to assess any significant changes relative to both parents.

ACKNOWLEDGMENTS

This project was supported by grants (#HF0032) and (F-1654) from TI3D/Welch Foundation & National Institutes of Health (CA 139059). In addition, this work was supported by the National Security Science and Engineering Faculty Fellowship (FA9550-10-1-0169), and L.C. was supported by a fellowship from the Arnold & Mabel Beckman Foundation. The authors also acknowledge the National Institutes of Health, ARRA (grant R01-GM068664 to FHA) for funding SCHEMA library design, and the U.S. Army Research Office, Institute for Collaborative Biotechnologies (grant W911NF-09-D-0001 to FHA) for funding the regression analysis work. These contents are solely the responsibility of the authors and do not necessarily represent the official views of the sponsors.

Abbreviations

L-Arg: L-arginine
L-Orn: L-ornithine
hArgI: human Arginase I
hArgII: human Arginase II
Tm: melting temperature
pI: isoelectric point
AUC: area under the curve

References

  • 1. López V, Alarcón R, Orellana MS, Enríquez P, Uribe E, Martínez J, Carvajal N. Insights into the interaction of human arginase II with substrate and manganese ions by site-directed mutagenesis and kinetic studies. Alteration of substrate specificity by replacement of Asn149 with Asp. The FEBS journal. 2005;272:4540–4548. doi: 10.1111/j.1742-4658.2005.04874.x.
  • 2. Cama E, Emig FA, Ash DE, Christianson DW. Structural and functional importance of first-shell metal ligands in the binuclear manganese cluster of arginase I. Biochemistry. 2003;42:7748–7758. doi: 10.1021/bi030074y.
  • 3. Dowling DP, Di Costanzo L, Gennadios HA, Christianson DW. Evolution of the arginase fold and functional diversity. Cellular and Molecular Life Sciences. 2008;65:2039–2055. doi: 10.1007/s00018-008-7554-z.
  • 4. Ensor CM, Holtsberg FW, Bomalaski JS, Clark MA. Pegylated arginine deiminase (ADI-SS PEG20,000 mw) inhibits human melanomas and hepatocellular carcinomas in vitro and in vivo. Cancer Research. 2002;62:5443–5450.
  • 5. Feun LG, Marini A, Landy H, Markoe A, Heros D, Robles C, Herrera C, Savaraj N. Clinical trial of CPT-11 and VM-26/VP-16 for patients with recurrent malignant brain tumors. Journal of Neuro-Oncology. 2007;82:177–181. doi: 10.1007/s11060-006-9261-7.
  • 6. Yoon C-Y, Shim Y-J, Kim E-H, Lee J-H, Won N-H, Kim J-H, Park I-S, Yoon D-K, Min B-H. Renal cell carcinoma does not express argininosuccinate synthetase and is highly sensitive to arginine deprivation via arginine deiminase. International journal of cancer. 2007;120:897–905. doi: 10.1002/ijc.22322.
  • 7. Shen L-J, Beloussow K, Shen W-C. Modulation of arginine metabolic pathways as the potential anti-tumor mechanism of recombinant arginine deiminase. Cancer Letters. 2006;231:30–35. doi: 10.1016/j.canlet.2005.01.007.
  • 8. Scott L, Lamb J, Smith S, Wheatley DN. Single amino acid (arginine) deprivation: rapid and selective death of cultured transformed and malignant cells. British Journal of Cancer. 2000;83:800–810. doi: 10.1054/bjoc.2000.1353.
  • 9. Ascierto PA, Scala S, Castello G, Daponte A, Simeone E, Ottaiano A, Beneduce G, De Rosa V, Izzo F, Melucci MT, Ensor CM, Prestayko AW, Holtsberg FW, Bomalaski JS, Clark MA, Savaraj N, Feun LG, Logan TF. Pegylated arginine deiminase treatment of patients with metastatic melanoma: results from phase I and II studies. Journal of Clinical Oncology. 2005;23:7660–7668. doi: 10.1200/JCO.2005.02.0933.
  • 10. Jain-Ghai S, Nagamani SCS, Blaser S, Siriwardena K, Feigenbaum A. Arginase I deficiency: Severe infantile presentation with hyperammonemia: More common than reported? Molecular Genetics and Metabolism. 2011;104:107–111. doi: 10.1016/j.ymgme.2011.06.025.
  • 11. Segawa Y, Matsufuji M, Itokazu N, Utsunomiya H, Watanabe Y, Yoshino M, Takashima S. A long-term survival case of arginase deficiency with severe multicystic white matter and compound mutations. Brain & Development. 2011;33:45–48. doi: 10.1016/j.braindev.2010.03.001.
  • 12. Sakiyama T, Nakabayashi H, Shimizu H, Kondo W, Kodama S, Kitagawa T. A Successful Trial of Enzyme Replacement Therapy in a Case of Argininemia. The Tohoku Journal of Experimental Medicine. 1984;142:239–248. doi: 10.1620/tjem.142.239.
  • 13. Mizutani N, Hatakawa C, Maehara M, Watanbe K. Enzyme Replacement Therapy in a Patient with Hyperargininemia. The Tohoku Journal of Experimental Medicine. 1987;151:301–307. doi: 10.1620/tjem.151.301.
  • 14. Otey CR, Landwehr M, Endelman JB, Hiraga K, Bloom JD, Arnold FH. Structure-Guided Recombination Creates an Artificial Family of Cytochromes P450. PLoS Biology. 2006;4:e112. doi: 10.1371/journal.pbio.0040112.
  • 15. Heinzelman P, Snow CD, Wu I, Nguyen C, Villalobos A, Govindarajan S, Minshull J, Arnold FH. A family of thermostable fungal cellulases created by structure-guided recombination. Proceedings of the National Academy of Sciences of the United States of America. 2009;106:5610–5615. doi: 10.1073/pnas.0901417106.
  • 16. Voigt CA, Martinez C, Wang Z-G, Mayo SL, Arnold FH. Protein building blocks preserved by recombination. Nature Structural Biology. 2002;9:553–558. doi: 10.1038/nsb805.
  • 17. Endelman JB, Silberg JJ, Wang Z-G, Arnold FH. Site-directed protein recombination as a shortest-path problem. Protein engineering design selection. 2004;17:589–594. doi: 10.1093/protein/gzh067.
  • 18. Shishova EY, Di Costanzo L, Emig FA, Ash DE, Christianson DW. Probing the specificity determinants of amino acid recognition by arginase. Biochemistry. 2009;48:121–131. doi: 10.1021/bi801911v.
  • 19. Li Y, Drummond DA, Sawayama AM, Snow CD, Bloom JD, Arnold FH. A diverse family of thermostable cytochrome P450s created by recombination of stabilizing fragments. Nature Biotechnology. 2007;25:1051–1056. doi: 10.1038/nbt1333.
  • 20. Heinzelman P, Komor R, Kannan A, Romero PA, Yu X, Mohler S, Snow CD, Arnold FH. Efficient screening of fungal cellobiohydrolase class I enzymes for thermostabilizing sequence blocks by SCHEMA structure-guided recombination. Protein engineering design selection. 2010;23:871–880. doi: 10.1093/protein/gzq063.
  • 21. Nelson DL, Cox MM. Lehninger Principles of Biochemistry. 4th ed. Vol. 1. W. H. Freeman; 2005.
  • 22. Subramaniam S. The biology workbench - A seamless database and analysis environment for the biologist. Proteins. 1998;32:1–2.
  • 23. Rossi V, Grandi C, Dalzoppo D, Fontana A. Spectroscopic study on the structure and stability of beef liver arginase. International Journal of Peptide and Protein Research. 1983;22:239–250. doi: 10.1111/j.1399-3011.1983.tb02091.x.
  • 24. Green S, Ginsburg A, Lewis M, Hensley P. Roles of metal ions in the maintenance of the tertiary and quaternary structure of arginase from Saccharomyces cerevisiae. Journal of Biological Chemistry. 1991;266:21474.
  • 25. Nakon R, Rechani PR, Angelici RJ. Copper(II) complex catalysis of amino acid ester hydrolysis. A correlation with complex stability. Journal of the American Chemical Society. 1974;96:2117–2120. doi: 10.1021/ja00814a021.
  • 26. Alexov E. Numerical calculations of the pH of maximal protein stability. European Journal of Biochemistry. 2003;271:173–185. doi: 10.1046/j.1432-1033.2003.03917.x.
  • 27. Shaw KL, Grimsley GR, Yakovlev GI, Makarov AA, Pace CN. The effect of net charge on the solubility, activity, and stability of ribonuclease Sa. Protein Science. 2001;10:1206–1215. doi: 10.1110/ps.440101.
  • 28. Schmittschmitt JP, Scholtz JM. The role of protein stability, solubility, and net charge in amyloid fibril formation. Protein Science. 2003;12:2374–2378. doi: 10.1110/ps.03152903.
  • 29. Lawrence MS, Phillips KJ, Liu DR. Supercharging proteins can impart unusual resilience. Journal of the American Chemical Society. 2007;129:10110–10112. doi: 10.1021/ja071641y.
  • 30. Stone EM, Glazer ES, Chantranupong L, Cherukuri P, Breece RM, Tierney DL, Curley SA, Iverson BL, Georgiou G. Replacing Mn(2+) with Co(2+) in human arginase I enhances cytotoxicity toward l-arginine auxotrophic cancer cell lines. ACS Chemical Biology. 2010;5:333–342. doi: 10.1021/cb900267j.
  • 31. Glazer ES, Stone EM, Zhu C, Massey KL, Hamir AN, Curley SA. Bioengineered human arginase I with enhanced activity and stability controls hepatocellular and pancreatic carcinoma xenografts. Translational oncology. 2011;4:138–146. doi: 10.1593/tlo.10265.
  • 32. Crombez EA, Cederbaum SD. Hyperargininemia due to liver arginase deficiency. Molecular Genetics and Metabolism. 2005;84:243–251. doi: 10.1016/j.ymgme.2004.11.004.
  • 33. Rodriguez PC, Ochoa AC. Arginine regulation by myeloid derived suppressor cells and tolerance in cancer: mechanisms and therapeutic perspectives. Immunological Reviews. 2008;222:180–191. doi: 10.1111/j.1600-065X.2008.00608.x.
  • 34. Stone E, Glazer ES, Chantranupong L, Cherukuri P, Breece RM, Tierney DL, Curley SA, Iverson BL, Georgiou G. Replacing Mn 2+ with Co 2+ in Human Arginase I Enhances Cytotoxicity Towards L-Arginine Auxotrophic Cancer Cell Lines. ACS Chemical Biology. 2010 doi: 10.1021/cb900267j.
  • 35. Stone E, Chantranupong L, Gonzalez C, O’Neal J, Rani M, VanDenBerg C, Georgiou G. Strategies for Optimizing the Serum Persistence of Engineered Human Arginase I for Cancer Therapy. Journal of Controlled Release. 2011 doi: 10.1016/j.jconrel.2011.09.097.
  • 36. Bui HH, Sidney J, Peters B, Sathiamurthy M, Sinichi A, Purton KA, Mothé BR, Chisari FV, Watkins DI, Sette A. Automated generation and evaluation of specific MHC binding predictive tools: ARB matrix applications. Immunogenetics. 2005;57:304–314. doi: 10.1007/s00251-005-0798-y.
  • 37. Nielsen M, Lundegaard C, Lund O. Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method. BMC Bioinformatics. 2007;8:238. doi: 10.1186/1471-2105-8-238.
  • 38. Sturniolo T, Bono E, Ding J, Raddrizzani L, Tuereci O, Sahin U, Braxenthaler M, Gallazzi F, Protti MP, Sinigaglia F. Generation of tissue-specific and promiscuous HLA ligand databases using DNA microarrays and virtual HLA class II matrices. Nature biotechnology. 1999;17:555–561. doi: 10.1038/9858.
  • 39. Cheng PN-M, Lam T-L, Lam W-M, Tsui S-M, Cheng AW-M, Lo W-H, Leung Y-C. Pegylated recombinant human arginase (rhArg-peg5,000mw) inhibits the in vitro and in vivo proliferation of human hepatocellular carcinoma through arginine depletion. Cancer Research. 2007;67:309–317. doi: 10.1158/0008-5472.CAN-06-1945.
  • 40. Krause A, Guestrin C. Near-optimal Observation Selection using Submodular Functions. National Conference on Artificial Intelligence, Nectar track. 2007;22:1650–1654.
  • 41. Nemhauser GL, Wolsey LA, Fisher ML. An analysis of approximations for maximizing submodular set functions—I. Mathematical Programming. 1978;14:265–294.
  • 42. Danziger SA, Baronio R, Ho L, Hall L, Salmon K, Hatfield GW, Kaiser P, Lathrop RH. Predicting positive p53 cancer rescue regions using Most Informative Positive (MIP) active learning. PLoS computational biology. 2009;5:e1000498. doi: 10.1371/journal.pcbi.1000498.
  • 43. Krause A. SFO: A Toolbox for Submodular Function Optimization. Journal of Machine Learning Research. 2010;11:1141–1144.
  • 44. Cox JC, Lape J, Sayed MA, Hellinga HW. Protein fabrication automation. Protein Science. 2007;16:379–390. doi: 10.1110/ps.062591607.
  • 45. Bishop CM. Pattern Recognition and Machine Learning. 1st ed. Springer; New York: 2006.
  • 46. Wang P, Sidney J, Dow C, Mothé B, Sette A, Peters B. A systematic assessment of MHC class II peptide binding predictions and evaluation of a consensus approach. PLoS Comput Biol. 2008;4:e1000048. doi: 10.1371/journal.pcbi.1000048.
  • 47. Cantor JR, Yoo TH, Dixit A, Iverson BL, Forsthuber TG, Georgiou G. Therapeutic enzyme deimmunization by combinatorial T-cell epitope removal using neutral drift. Proc Natl Acad Sci U S A. 2011;108:1272–1277. doi: 10.1073/pnas.1014739108.
