Summary
In protein design, the energy associated with a huge number of sequence-conformer perturbations has to be routinely estimated. Hence, enhancing the throughput and accuracy of these energy calculations can profoundly improve design success rates and enable tackling more complex design problems. In this work, we explore the possibility of tensorizing the energy calculations and apply them in a protein design framework. We use this framework to design enhanced proteins with anti-cancer and radio-tracing functions. Particularly, we designed multispecific binders against ligands of the epidermal growth factor receptor (EGFR), where the tested design could inhibit EGFR activity in vitro and in vivo. We also used this method to design high-affinity Cu2+ binders that were stable in serum and could be readily loaded with copper-64 radionuclide. The resulting molecules show superior functional properties for their respective applications and demonstrate the generalizable potential of the described protein design approach.
Keywords: protein design, energy calculation, discrete rotamer sampling, EGFR inhibitor, copper binder
Highlights
• Tensorized energy calculations accelerate protein design
• A rotamer library can cover residues and ligands not present in structure databases
• The design framework does not rely on any training data
• This framework is applicable to diverse protein design problems
Motivation
Calculating the interaction energies between atoms is a central process in biomolecular simulations. Traditionally, these calculations are performed exhaustively for each atom pair, which constitutes the computational bottleneck. In this study, we introduce a framework that instead represents the dense atomic interaction fields as three-dimensional projections. These projections can condense energy evaluations into a single matrix operation, greatly simplifying the computational load. We apply this framework to the complex protein design problem in order to identify a low-energy amino acid sequence for a target structure.
Maksymenko et al. present an approach for protein design that accelerates interaction energy calculations, generates a rotamer library relying only on force-field parameters, and offers a training-free scoring function. The authors apply this framework to design proteins with EGFR-inhibiting and radio-tracing functions.
Introduction
Protein design processes search for sequences that fit a given target structure while minimizing the free energy of this defined configuration. In fixed-backbone design, amino acid side chains and conformations are sampled at the designable positions and scored for their energy within their local environment. Thus, protein design simulations typically sample a large number of sequence-conformer combinations even for a small number of designable positions. Moreover, the computational load increases steeply with the difficulty of the specific design problem.1 This requires scoring functions to be fast enough to cover large sequence sub-spaces that contain viable solutions.2 The inherent trade-off between scoring speed and accuracy has led to the broad use of fast energy functions and trained or knowledge-based models. Previous efforts include the use of knowledge-based scoring terms,3 coarse-grained representation,4 geometric filters,5 and directly6 or indirectly7 learned sequence-structure relationships.
In this work, we explore the feasibility of tensorizing energy calculations for molecular mechanics applications and, particularly, evaluate its usefulness for protein design simulations. In protein design, the evaluation of the energy associated with non-bonded interactions represents the computational bottleneck. We seek to demonstrate the accessible performance gains from reformulating the non-bonded energy terms (i.e., the Lennard-Jones [LJ], electrostatic, and solvation energy terms) to best suit the computing hardware architecture. Specifically, conducting energy calculations as large-matrix (or tensor) operations enables substantial efficiency gains on both conventional central processing units (CPUs) and massively parallel processing hardware (Figure 1A). Here we use an energy function that is readily derived from established, self-consistent force fields. This obviates the need for empirically optimizing a scoring function against one or more training datasets and thus avoids overfitting and training bias. Our approach also attributes inaccuracies directly to the force-field parameters, allowing improvements to be more systematic. Finally, as tensorization increases the throughput of evaluating sequence:conformer combinations, it raises the probability of finding lower-energy solutions, which can improve the experimental success rate.
Figure 1.
The concept of the tensorized design framework
(A) The non-bonded interactions between an amino acid and its molecular environment (e.g., another proximal amino acid) entail the calculation of all atom-atom pairwise potential energies within a distance cutoff. The intensive execution of a large number of distance- and energy-evaluation instructions, as well as memory handling processes, slows the overall performance. Thus, formulating the energy evaluation problem between two groups of atoms as a single tensor operation not only speeds up the scoring on conventional processors, but also renders the calculation highly compatible with stream processors. Moreover, the constant dimensions of the used tensors enable ideal load balancing on high-performance computers. The performance enhancement figures were calculated for the Lennard-Jones potential function, a cubic tensor representation of 22 Å side and 0.5 × 0.5 × 0.5 Å voxels. The computing operations are performed as in (B) and (C).
(B) Once a mutable or repackable target residue is defined, the input structure is transformed to the frame of reference with respect to the residue’s backbone coordinates. The side-chain atoms of the target residue are then deleted, leaving behind the “environment” atoms.
(C) Proxy values for the atoms’ positions, partial charges (illustrated here as plane projections; yellow, negative; blue, positive) or their surface solvation energies are projected onto the voxels of a constructed tensor. The rotamer library comprises the more expensive, pre-computed smooth interaction fields (i.e., field tensors; example shown for an asparagine side chain), which through a single element-wise multiplication with the environment tensor yield the spatially resolved energy in a “two-body” format.
As an overview, our design workflow starts by pre-computing a discrete rotamer library from molecular dynamics (MD) simulations of isolated amino acids. These simulations are run under the same force field from which the scoring function terms are derived. The resulting conformational pool of each amino acid is then partitioned into a constant number of clusters, each of which contributes one rotamer. Each rotamer in the database is dually represented by its atomic coordinates and by its interaction field projections (field tensors). To evaluate the interaction energy between an inbound rotamer and a host structure, the existing side chain at the designable position is removed (Figure 1B), and the surrounding environment is projected as a three-dimensional histogram of the constituting atoms (environment tensor). The element-wise multiplication of a rotamer’s field tensor and a structure’s environment tensor, summed over all voxels, yields the interaction energy between them (Figure 1C).
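The environment projection described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the choice of axes (origin at Cα, x-axis along Cα→Cβ, with N fixing the plane), and the use of a plain histogram are assumptions made for illustration. The 22 Å cube and 0.5 Å voxels follow the values quoted in Figure 1A.

```python
import numpy as np

def local_frame(n, ca, cb):
    """Orthonormal axes (as rows) built from the N, CA, and CB atoms (assumed convention)."""
    x = (cb - ca) / np.linalg.norm(cb - ca)   # x-axis along CA -> CB
    z = np.cross(x, n - ca)                   # normal to the CB-CA-N plane
    z = z / np.linalg.norm(z)
    y = np.cross(z, x)                        # completes a right-handed frame
    return np.stack([x, y, z])

def environment_tensor(coords, weights, n, ca, cb, side=22.0, voxel=0.5):
    """Project environment atoms (side chain already removed) onto a cubic voxel grid.

    coords  : (n_atoms, 3) Cartesian coordinates of the environment atoms
    weights : per-atom proxy values (e.g., 1.0 for occupancy, or partial charges)
    """
    n, ca, cb = (np.asarray(a, dtype=float) for a in (n, ca, cb))
    rot = local_frame(n, ca, cb)
    # Express every atom in the residue-centered frame of reference.
    local = (np.asarray(coords, dtype=float) - ca) @ rot.T
    nbins = int(round(side / voxel))
    edges = np.linspace(-side / 2, side / 2, nbins + 1)
    # Atoms outside the cube are silently dropped by the histogram.
    tensor, _ = np.histogramdd(local, bins=[edges, edges, edges], weights=weights)
    return tensor
```

Because every residue is mapped to the same frame and the same grid, all environment tensors share one shape (44 × 44 × 44 under these settings), which is what allows pre-computed rotamer field tensors to be reused at any designable position.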
We use this framework for designing two different classes of proteins. The first class aims at inhibiting soluble growth factors, particularly, the epidermal growth factor (EGF)-like ligands. Multispecific quenchers of the several soluble ligands of the EGF receptor (EGFR) are needed for broad inhibition of EGFR signaling and can serve as a new therapeutic modality for several types of cancer.8 In this work, we redesigned a minimal receptor domain to maximally stabilize it in its ligand-bound form. This yielded small, single-domain binders (18 kDa) that are biophysically and functionally superior to the natural template. The designs showed multispecific binding to their target growth factors and potently inhibited EGFR signaling in vitro and in vivo. Our second objective is to develop a new class of protein-based radiotracers for use as genetically encodable molecules for positron-emission tomography (PET) imaging. Using our framework, we could redesign a natural copper-storage protein into a developable form as a diagnostic protein tag. Two designs were monomeric and showed superior stability and high production yield, in contrast to the starting template. These designs were capable of binding Cu2+ with high affinity and low off rates, and, importantly, were sufficiently stable for hours in serum in vitro, highlighting their translational potential.
Results and discussion
Rotamer library construction
Typically, discrete rotamer libraries are constructed from amino acid conformations pooled from known structures. As the structural databases grew in size, more stringent inclusion criteria were imposed, greatly improving the quality of the available libraries.9 Nonetheless, PDB-based rotamer libraries still provide a sparse coverage of the rotameric space and entrain undesirable factors pertinent to protein structure determination, such as cryogenic measurement conditions or ensemble averaging. Thus, we use MD to achieve more extensive sampling of the conformational tendencies of amino acids in folded proteins.10 Furthermore, MD simulations of isolated amino acids can cover a broader conformational distribution that is unbiased by the choice of input protein structures. Such conformational distributions more faithfully reproduce the tendencies of the random coil state and represent a reference energy distribution prior to any folding event.11 We have therefore chosen to build the rotamer library using MD simulations of capped amino acids (i.e., Ac-X-NHMe).12 The internal energies related to backbone conformational preferences were derived from the occupancy of backbone bins of the dihedral space (i.e., the (φ, ψ) angles). The side-chain conformational preferences, in contrast, were mapped in Cartesian space by means of root-mean-square deviation (RMSD) clustering after alignment to a (Cβ-Cα-N) frame of reference. This was done to obtain a constant number of rotamer clusters for every (φ, ψ) bin, regardless of the number of atoms or conformational tendencies of the amino acid. Each cluster is represented by a single rotamer in the final library, where the relative energy of the rotamer relates to its respective cluster size (STAR Methods).
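The exact mapping from cluster size to rotamer energy is given in the STAR Methods and is not reproduced here. One common way to relate populations to relative energies is Boltzmann inversion, sketched below under that assumption; the function name, the 300 K default, and the kcal/mol units are illustrative choices rather than values taken from the paper.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)

def relative_energies(cluster_sizes, temperature=300.0):
    """Boltzmann-invert cluster populations into energies relative to the largest cluster.

    Assumed relation: E_i = -RT * ln(n_i / n_max), so the most populated
    cluster sits at zero and rarer clusters receive a positive penalty.
    """
    if not cluster_sizes or min(cluster_sizes) <= 0:
        raise ValueError("cluster sizes must be positive")
    largest = max(cluster_sizes)
    rt = R_KCAL * temperature
    return [-rt * math.log(n / largest) for n in cluster_sizes]
```

For example, `relative_energies([500, 250, 50])` assigns zero to the first cluster and progressively larger penalties to the two less populated ones.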
This approach brings several advantages. First, the internal energies derived from the conformational preferences of amino acids are consistent with the other non-bonded energy terms used in design calculations, as both rely on the same force field. Second, the generated rotamers can implicitly encode the dynamic influences on bond angles, bond stretching, and improper dihedrals. These subtle deformations are generally dismissed in other rotamer libraries, but have been shown to significantly impact the energy gap between native and non-native states of a protein.13 Third, this approach offers great versatility regarding the energy function of choice. Whereas here we used the CHARMM force field14 for both the rotamer library creation and the design energy function, more complex potentials (e.g., polarizable force fields) can be deployed. The library can be readily extended to cover rotameric distributions in different pH or sequence (e.g., tri- or pentapeptide) contexts and can generate on-demand rotamers for non-standard amino acids or ligands in a consistent way.15 Fourth, the long MD trajectories effectively sample a broader space compared with what is traditionally obtained from the PDB-derived rotamer libraries16 (Figure S1A). Furthermore, by raising the temperature of the isolated amino acid MD simulations, otherwise poorly sampled (φ, ψ) regions can be covered more effectively than in PDB-based rotamer libraries (Figure S1B). This can be particularly useful for accessing rare “linchpin” rotamers reported to constitute sampling bottlenecks.17 Last, the extraction of a constant number of rotamers plays an important role downstream of the algorithm, as it dictates a defined rotameric sampling granularity. It also guarantees a uniform load balancing during the design calculations, particularly in parallelized implementations.
Tensorized molecular interaction fields, energies, and mechanics
Our framework combines two principles to maximize the computational efficiency of energy calculations. The first principle is to pre-compute and store most of the information needed for energy calculation. The second principle is to deploy a tensorized form of the energy functions to better fit a single-instruction, multiple data processing paradigm, a hallmark of modern computing technology (Figure 1A).
In this framework, the scoring problem is simplified into a multiatom, two-body problem, where the 1st body represents the multiatom chemical environment surrounding the side chain at the designable position, wherein the atoms of this side chain are absent. The 2nd body represents the inbound side-chain rotamer, aligned to the same frame of reference. The information on this two-body interaction is encoded in an asymmetric fashion in which the 1st body only encodes the three-dimensional occupancy of its atomic positions and charges, while the 2nd body encodes the net, real-valued energy field around all of its respective atoms (Figure 1C). The computationally expensive step is the projection of energy fields, which in this case is restricted to the 2nd body (i.e., the rotamer) and is hence pre-computed once and stored in a look-up table. This restricts the run-time computing load to simply mapping the 1st-body three-dimensional occupancy, substantially reducing the run time needed at every designable position. Such a representation benefits from further speed-up when implemented in a tensorized fashion. In this manner, the scalar-valued interaction energy between the two bodies is obtained by the sum of the element-wise product of the two tensors (representing environment atoms and the rotamer energy field). This format is ideally suited for evaluating both the LJ and the electrostatic potentials, albeit at the cost of assuming symmetric LJ parameters for the interacting atom pairs, according to the atom’s respective encoding in the 2nd body tensor. Nonetheless, an atom-type rescaling of atoms in the 1st body can be used to correct for this in the future. Applying the tensorization framework would differ, however, in the case of encoding a surface area-based solvation potential. In the latter situation, the 1st body tensor has to fully describe the environment’s surface solvation energy field. 
Hence, the solvation energy per unit surface area will be normalized by the number of voxels representing the atomic surface (STAR Methods). Such a tensor is pre-computed for the 2nd body (i.e., the inbound side-chain rotamer) and is computed on the fly for the 1st body as well as for the combined two bodies (STAR Methods). This renders the solvation term the relatively more expensive energy term to compute.
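The two-body scheme for the LJ term can be sketched as follows. This is a simplified illustration rather than the Damietta code: the function names, the single (epsilon, r_min) pair applied to every atom pair (mirroring the symmetric-parameter assumption noted above), and the distance floor that caps the repulsive wall are all assumptions. The grid dimensions follow Figure 1A.

```python
import numpy as np

SIDE, VOXEL = 22.0, 0.5        # cube edge and voxel size in angstroms (Figure 1A)
N = int(round(SIDE / VOXEL))   # 44 voxels along each axis

def voxel_centers():
    """Coordinates of all voxel centers, shape (N, N, N, 3)."""
    axis = -SIDE / 2 + VOXEL * (np.arange(N) + 0.5)
    return np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)

def lj_field_tensor(rotamer_xyz, epsilon=0.1, r_min=3.5, r_floor=1.0):
    """Pre-compute the LJ field of a rotamer (2nd body) on the voxel grid.

    Each voxel stores the energy a single environment atom would feel
    if it were placed at that voxel's center. This is the expensive step
    and would be done once per rotamer, then stored.
    """
    centers = voxel_centers()
    field = np.zeros((N, N, N))
    for atom in np.asarray(rotamer_xyz, dtype=float):
        r = np.linalg.norm(centers - atom, axis=-1)
        r = np.maximum(r, r_floor)             # hypothetical cap on the repulsive wall
        s6 = (r_min / r) ** 6
        field += epsilon * (s6 ** 2 - 2.0 * s6)
    return field

def interaction_energy(field, environment):
    """Scalar energy: sum of the element-wise product of the two tensors."""
    return float(np.sum(field * environment))
```

At run time only the occupancy tensor of the environment (1st body) has to be built, so scoring a rotamer reduces to one multiply-and-sum over arrays of fixed shape. The price is a discretization error, since each environment atom is effectively moved to the center of its voxel.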
To assess the accuracy of this energy function retrospectively, we predicted the energy change associated with single-point mutations. To avoid the confounding effects of combinatorial repacking and iterative energy minimization, the energy values were evaluated without any combinatorial side-chain optimization, but by finding the lowest-energy rotamer at the designated position only (STAR Methods). We used the dataset of mutants of the β1 domain of streptococcal protein G, constituting the largest thermodynamic stability dataset collected in a single experimental setup to date.18 This dataset covers most of the single-point mutagenesis landscape of the Gβ1 protein and represents a broad range of burial and secondary structure contexts. In this setup, values obtained by the tensorized potential (herein referred to as the Damietta potential) showed a better correlation () compared with the reported Rosetta score18 () (Figure S2), where both methods were used without minimization. By testing our potential against other large datasets generated from proteolytic stability assays,19 we obtained correlation coefficients ranging between 0.26 and 0.41. These datasets comprise diverse folds of the N-terminal domain of the phage 434 repressor (1,046 mutants; ), the SH3 domain of human obscurin (1,097 mutants, ), the N-terminal domain of ribosomal protein L9 (725 mutants, ), and r11_829_TrROS protein designed by trRosetta (833 mutants, ) (Figure S2). We also sought to evaluate the native side-chain conformer recovery for a dataset of proteins with available X-ray and NMR structures.6 The overall recovery rates obtained by the Damietta potential were around 70% for χ1 and 50% for χ1&2 (Figure S3A). As expected, buried residues had a higher prediction accuracy (∼90% for χ1 and 65%–85% for χ1&2; Figure S3B), given the constraining chemical environment around the amino acids at the protein core. 
We further used the same dataset of X-ray structures to evaluate native sequence recovery, where the results showed very low recovery rates compared with other design methods (Figure S3C). This can be attributed to the fact that our potential was not trained to maximize sequence nativeness, which is not necessarily a proxy of sequence optimality.
Combinatorial design by decision tree swarm
The high-dimensional nature of combinatorial design of more than a few amino acid positions severely limits the usefulness of exact sampling algorithms in finding global minimum solutions within reasonable computing time frames. Nonetheless, despite the non-additive effects of correlated mutations, favorable mutations generally tend to cluster closely in the sequence space.20 Thus, a swarm of greedy samplers traversing several sequence optimization paths simultaneously can generally reduce entrapment within local minima and lead to near-optimal solutions. This is clearly demonstrated by the success of stochastic design algorithms.21 In order to enable the exploration of a sizable number of mutations, we developed a combinatorial sampling strategy that searches for successive minima by spanning a few semi-independent search paths. This strategy builds on the power of parallel, loosely communicating conformational samplers that assume a globally smooth, but locally rugged, landscape as was demonstrated with the SARS22 and FLAPS23 algorithms. One way of implementing this approach in a design context is through a “few-to-many, many-to-few” scheme, whereby designable amino acid positions are arranged as depth levels within a decision tree, and nodes at each level represent the mutational decisions. As branching represents the expansion to many mutant combinations, a ranking-and-trimming step that keeps only a few lowest-energy designs is imposed between the layers of the tree (Figure S4). This alternation between the many combinations and the few best intermediary decoys guarantees the traversal of a pre-set number of branches (n_paths) down the tree depth at any given level. This scheme restrains the combinatorial load and enables several parallelization schemes. 
Measuring the overall performance of the algorithm against varying design loads shows the algorithm to greatly simplify the sampling complexity, while enabling an arbitrary level of parallel sampling through the number of paths (n_paths). These paths not only generate diverse output from a single run, but also keep track of several local minima, which improves the overall minimization outcome when the sampling complexity grows. Particularly, under the same sampling complexity, minimizing across a larger number of paths leads to lower energy decoys (Table S1). These results also highlight a rotamer sampling performance in the microsecond range, which, compared with other methods, indicates substantial performance gains (Table S1). We expect further performance optimization to greatly improve these figures in the future.
In most stochastic design algorithms, higher energy mutations can still be accepted at a lower probability in favor of basin hopping and diversity generation. This, however, adds random noise to the already heterogeneous uncertainty of the scoring function (i.e., scoring error). Instead, here we introduce diversity by randomizing the ordering of the designable positions along the decision tree (i.e., across independent design simulation replicas). In this setup, the parallel, deterministic sampling trials are more dispersed across the solution space and their optimization paths can be easily retraced, in comparison with the use of a Metropolis criterion. This “few-to-many-to-few” combinatorial sampler is thus best run through several independent replicas (while randomizing the order of designable positions) with iterated traversals of the same decision tree () in order to improve the search convergence within each simulation replica (Figure S4).
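The sampling scheme described in this section resembles a beam search, and a toy version can be written as below. This is a sketch under simplifying assumptions, not the cs_f2m2f implementation: the function name, its arguments, and the treatment of a design as a position-to-residue dictionary are hypothetical, and rotamer selection is folded into a user-supplied `energy` callable.

```python
import random

def tree_swarm(positions, choices, energy, start, n_paths=3, n_traversals=2, seed=0):
    """Toy 'few-to-many, many-to-few' sampler for one simulation replica.

    positions : designable position identifiers
    choices   : dict mapping each position to its candidate residues
    energy    : callable scoring a {position: residue} dict (lower is better)
    start     : initial {position: residue} assignment (e.g., the template)
    """
    rng = random.Random(seed)
    order = list(positions)
    rng.shuffle(order)                        # randomized position order per replica
    beam = [dict(start)]
    for _ in range(n_traversals):             # iterated traversals of the same tree
        for pos in order:                     # one tree level per designable position
            expanded = {}
            for design in beam:               # few -> many: branch on every candidate
                for residue in choices[pos]:
                    child = dict(design)
                    child[pos] = residue
                    # Key on the full assignment to drop duplicate designs.
                    expanded[tuple(sorted(child.items()))] = child
            ranked = sorted(expanded.values(), key=energy)
            beam = ranked[:n_paths]           # many -> few: keep the lowest-energy paths
    return beam
```

Every step is deterministic once the seed is fixed, so an optimization path can be retraced exactly. Diversity comes from running several replicas with different seeds rather than from accepting higher-energy mutations.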
Design and characterization of EGFR inhibitors
As a proof-of-principle, we applied our framework to create inhibitors of EGFR signaling, a key pathway involved in the survival, proliferation, and dissemination of tumor cells.24 EGFR (HER1) is a receptor tyrosine kinase that represents an important target for modulating signal transduction cascades, as it dimerizes upon ligand-induced conformational change.25 Approved inhibitors of EGFR signaling are either small-molecule inhibitors of the receptor’s intracellular kinase domain or monoclonal antibodies blocking its ectodomain dimerization.26 These are indicated for treating different EGFR-dependent cancers, e.g., colon cancer and epidermoid carcinomas.27 Nonetheless, these two inhibition modalities (i.e., tyrosine kinase inhibitors and dimerization-inhibiting monoclonal antibodies) have been shown to be subject to evasion by cancer cells through numerous evolution and resistance mechanisms.26 Binders targeting the ligand itself, i.e., EGF,28,29 can provide a new class of inhibitors with potential synergistic effects when combined with existing drugs. However, the cross-activity of more than one EGF-family ligand against the EGFR (and its related receptors, particularly the HER4 receptor) complicates this endeavor. For instance, the heparin-binding EGF-like growth factor (HB-EGF), transforming growth factor α (TGF-α), and amphiregulin (AR) also play roles in activating these receptors and promote cancer progression.30
The multiligand nature of EGFR signaling is thus better tackled through the development of polyspecific binders capable of quenching more than one growth factor, ideally, with high affinity. Previous attempts to engineer a recombinant form of the entire extracellular segment of the EGF and HER3 receptors could achieve broad ligand-binding specificity.30,31 These binders were constructed as dimeric IgG1 Fc-fragment fusions with the four extracellular domains of the receptor. While achieving broad inhibition of EGFR and HER3 ligands, these constructs require recombinant expression in mammalian cells and possess a large molecular weight of approximately 190 kDa, which can hamper their bioavailability at the relevant tumor tissue.32
In this study, we aimed to create a miniature binder, using only one of the EGFR ligand-binding domains as a starting template and stabilizing it in its ligand-bound conformation by sequence redesign. Previous work has shown the human EGFR domain 3 (herein referred to as d3-WT) to be the ectodomain encoding most of the binding information to EGF.33 The d3-WT template sequence (Table S2) was restricted to a stretch of 168 amino acids, which contains disulfide bridges at the beginning and the end of the domain. The designs were instead based on a truncated domain boundary to include only 160 amino acids. All cysteine residues were excluded to improve downstream properties of the designs. The designable positions were set to comprise all residues with energy higher than a set threshold, which were identified using the “repack all” (ra) protocol. Running 100 instances of the tree swarm combinatorial sampler (cs_f2m2f) with randomized order of the designable positions yielded about 200 decoys with unique sequences. These were subject to accelerated MD (aMD) simulations and were ranked according to their conformational stability. The conformational stability scores as well as the RMSF plots derived from the latter simulations indicated a better stability of the designs compared with the d3-WT model, where the cysteines were reduced (Figure S5A). The two most mutually distant sequences in the top 10 designs were eventually selected for experimental evaluation, named dd3-1 and dd3-2 (designed domain 3; Figure 2A).
Figure 2.
The design and characterization of EGFR inhibitors
(A) The EGFR extracellular segment consists of four domains (d1, violet; d2, cyan; d3, teal; d4, yellow). In the absence of a ligand, it lies in a closed monomeric configuration. Upon ligand binding (here EGF, gray) the receptor adopts an open, dimeric configuration triggering intracellular signaling. As the third domain of the EGFR is reported to hold most of the binding affinity to the EGF ligand, it was used as a template to design soluble EGF binders. A close-up view of the EGF in complex with the wild-type d3 domain (d3-WT) is shown in gray and teal, respectively (PDB: 1IVO). Disulfide bridges in the d3-WT structure are shown as yellow sticks. Using the described energy function, the highest energy residues were identified. These residues were defined as mutable (shown in red) and designed using the combinatorial sampler. Two design models (dd3-1, purple; dd3-2, yellow) were finally chosen for experimental characterization.
(B) SPR sensorgrams show the dd3-2 design to bind EGF tighter than d3-WT. A similar pattern with improved affinity of the design was observed toward other EGFR ligands (Figures S6 and S7). Kd is represented as the mean ± standard deviation (SD). The χ2 value represents the difference between the experimental data and the fitted curve averaged over the whole sensorgram. Experimental data, black; fit, red.
(C) Proliferation inhibition assays were done using the EGFR signaling-dependent A431 cells. The inhibition of cell proliferation was observed to be much stronger for dd3-2 (IC50 = 0.32 nM) than for d3-WT (IC50 = 476 nM). The positive and negative control values of cell proliferation with (400 pM) and without EGF treatment are indicated by red and blue lines, respectively. Shades and error bars represent SD across three replicates.
(D) Pharyngeal skeleton of zebrafish embryos was stained with Alcian blue. First-arch Meckel’s cartilage (mk) and second-arch derivative ceratobranchials (ch) are observable (arrows). Upon EGFR inhibition, embryos with partial absence of Meckel’s cartilage and ceratobranchials or without any cartilage formation are observed and categorized in the malformed class. Cartilage defects upon injection of PBS, cetuximab, dd3-2, and d3-WT are shown in percentages for each group. Two-tailed p values were analyzed by a 2 × 2 contingency table in GraphPad. n, the number of evaluated embryos. Scale bar, 250 μm.
The starting template (d3-WT) and two designs (dd3-1 and dd3-2) were expressed in E. coli. Double purification from the soluble fraction showed the three proteins to have similar yields of approximately 0.2 mg per liter of culture. The designs also showed thermostability similar to that of d3-WT, as evaluated by nanoscale differential scanning fluorimetry (nanoDSF) (Figure S5B). However, the designs exhibited much stronger EGF inhibition activity in proliferation assays using the EGF-dependent epidermoid carcinoma cell line A431. Particularly, dd3-2 was the most active design, with a proliferation inhibition IC50 more than 1,000-fold lower compared with d3-WT (IC50,dd3-2 = 0.32 nM vs. IC50,d3-WT = 476 nM) (Figure 2C) and only 3-fold higher than that of cetuximab, a therapeutic anti-EGFR antibody34 (Figure S5C). Next, to evaluate the difference in binding affinities toward EGF, we carried out surface plasmon resonance (SPR) titrations of our binders against immobilized EGF. The results showed that dd3-1 and dd3-2 bind EGF around 6-fold tighter than d3-WT, where dissociation constants (Kd) were 10, 9, and 56 nM for dd3-1, dd3-2, and d3-WT, respectively (Table 1; Figures 2B, S5D, S6, and S7). This enhanced binding can be the result of stabilizing the ligand-bound conformation. Compared with previous results that weaponized an Fc chimera of the entire EGFR extracellular segment (∼95 kDa per subunit),31 our dd3-2 design is far smaller (18 kDa), leading to a better protein efficiency (i.e., 35) of the latter (−2.6 kJ/kDa) vs. the former (−0.5 kJ/kDa). 
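The efficiency figure quoted for dd3-2 can be checked with a short calculation. The sketch assumes that efficiency is the binding free energy divided by molecular mass, with ΔG = RT ln Kd at 298.15 K and a 1 M standard state; this definition and the function names are assumptions, although they reproduce the reported −2.6 kJ/kDa from the Kd (9 nM) and mass (18 kDa) given above.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)

def binding_free_energy(kd_molar, temperature=298.15):
    """Binding free energy in kJ/mol from a dissociation constant in mol/L."""
    return R_KJ * temperature * math.log(kd_molar)

def binding_efficiency(kd_molar, mass_kda, temperature=298.15):
    """Binding free energy per unit molecular mass, in kJ/mol per kDa."""
    return binding_free_energy(kd_molar, temperature) / mass_kda

def fold_change(reference, improved):
    """How many times lower the improved value is relative to the reference."""
    return reference / improved
```

With these helpers, `binding_efficiency(9e-9, 18.0)` evaluates to about −2.55, and `fold_change(56, 9)` to about 6.2, in line with the roughly 6-fold tighter EGF binding of dd3-2 relative to d3-WT.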
To further evaluate the ability of our designs, particularly dd3-2, to bind other related EGFR ligands, we performed SPR binding experiments against HB-EGF and TGF-α, which are also important therapeutic targets for treatment of EGFR-dependent cancers.30 The results showed dd3-2 to bind HB-EGF 10-fold tighter than d3-WT, where Kd values of 35 and 370 nM were observed for dd3-2 and d3-WT, respectively, and demonstrated dd3-2 to bind TGF-α 11-fold tighter than d3-WT, with Kd values of 232 and 2,550 nM, respectively (Table 1; Figures S6 and S7). This polyspecificity of d3 proteins could be an explanation of their high inhibitory activity against the A431 cell line, given the complex signaling interplay among autocrine and paracrine EGFR ligands.36 In addition, the stronger inhibition by the designs can be attributed to their improved stability in the monomeric form in solution, as observed during the proteins’ purification and analytical size exclusion (Figure S8).
Table 1.
SPR-derived binding parameters of d3-WT, dd3-1, and dd3-2 to different EGFR ligands
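| | Association rate (1/Ms) | Dissociation rate (1/s) | Kd (nM) | |
|---|---|---|---|---|
| d3-WT | | | | |
| EGF | | | 56 | |
| HB-EGF | | | 370 | |
| TGF-α | | | 2,550 | |
| dd3-2 | | | | |
| EGF | | | 9 | |
| HB-EGF | | | 35 | |
| TGF-α | | | 232 | |
| dd3-1 | | | | |
| EGF | | | 10 | |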
To investigate potential effects of the designed inhibitors in vivo, we injected equal volumes of PBS solution containing cetuximab (positive control), d3-WT, or dd3-2 into zebrafish embryos. As a negative control, pure PBS was injected. Inhibitors were administered starting at 4–6 h post-fertilization over 4 days. As a first step, the survival of the embryos was determined every day from 1 to 4 days post-fertilization (dpf). While almost no effect on survival was observed at any time point following injection of PBS, injections of cetuximab (5 mg/mL), d3-WT (0.3 mg/mL, 1.3 mg/mL), or dd3-2 (0.2 mg/mL, 1.0 mg/mL) were found to be lethal to different extents (Table S3). Next, we evaluated the morphological defects present in the surviving embryos at 4 dpf. Since it has been previously shown that EGFR inhibitors cause developmental defects in head cartilage,37 we analyzed head cartilage formation by Alcian blue staining (Figure 2D). In contrast to wild-type embryos with completely formed cartilaginous elements of the pharyngeal skeleton, embryos with cartilaginous defects or without any cartilage formation were classified as the malformed group. The results indicate that cetuximab, d3-WT, and dd3-2 affect skeletal development in a manner typical of EGFR signaling impairment. In line with the above-described biophysical and cell-based experiments, dd3-2 caused stronger effects in zebrafish embryos compared with d3-WT. Interestingly, both d3-WT and dd3-2 were more active than the anti-EGFR antibody cetuximab (Figure 2D).
Design and characterization of copper binders
To further test the performance of the Damietta potential, we sought to design metallic radionuclide-binding proteins. Metal-binding proteins serve essential functions, including catalysis, sensing, transport, and storage.38 Designed metalloproteins can be tailored to encode one or more of such functions and be useful for a range of biomedical applications.39,40 Particularly, metalloproteins capable of high-affinity metal binding, efficient storage, and transport can serve as electron microscopy contrast agents,41 probes for magnetic resonance imaging,42 or targeted radioactive tracers for radiotherapy and diagnostic imaging purposes.43 We specifically aimed to design proteins to bind the radioactive 64Cu2+ ions. Such genetically encodable radiotracers can be fused with targeting proteins for high-resolution PET imaging or radioligand therapy with the therapeutic radioisotope 67Cu, forming an ideal therapeutic/diagnostic pair.44
Given this intended function, we based our designs on helical bundles to create modules with robust and stable folding and thus minimal interference with any fused homing protein (e.g., tumor cell-targeting antibody fusions).45 We chose a cysteine-rich helical bundle protein (Csp1) as a starting template, which was shown to bind 13 Cu+ ions along its core.46 Although Csp1 has a low molecular weight (13 kDa) and possesses a simple up-down four-helix structure, it suffers several drawbacks. Specifically, Csp1 is unstable, is tetrameric, has low bacterial production yield, and has complex purification requirements.46 We therefore redesigned 22 amino acid positions, which were mostly surface exposed, to disrupt the oligomerization interface of the tetrameric Csp1 template and improve solubility and stability of the helical bundle (STAR Methods). The two most conformationally stable designs as assessed by MD simulations (named plr1 and plr2) were selected for experimental characterization (Table S2). The synthetic genes encoding plr1, plr2, and Csp1 were cloned without purification tags in a vector for expression in E. coli. The soluble expression levels were highest for plr1, followed by plr2, both being higher than Csp1. In contrast to the template, which has a net negative charge of −2, the designs possessed high net-positive charges (plr1, +14; plr2, +17). This particularly facilitated the purification of the supercharged designs using ion-exchange chromatography. We restricted our further characterization to the plr1 design given its high purification yield of >50 mg per liter of culture (>20-fold higher than Csp1). Analytical size exclusion showed plr1 to be monomeric, in contrast to the template Csp1, which was tetrameric and showed significant aggregation (Figure S8). Thermostability analysis indicated Csp1 and plr1 to have melting temperatures (Tm) of 79°C and 69°C, respectively (Figures 3A and 3B).
However, Csp1 displayed a lower aggregation temperature (Tagg) than plr1 (60°C versus >110°C; Figure 3C). Similarly, irreversible thermal denaturation was observed for Csp1, in contrast to the reversible folding of plr1 (Figures 3A and 3B). The colloidal stability of the plr1 design is important for its clinical usefulness, given that aggregation tendency can greatly reduce the efficacy and raise the immunogenicity risk of biopharmaceuticals.47 The difficulty of handling Csp1 protein restricted our further functional analysis to the designed forms only.
Figure 3.
Design of stabilized 64Cu2+ binding proteins
(A) NanoDSF melting curves show Csp1 to unfold at 79°C (red curves), without a refolding transition upon cooling (blue curves).
(B) The plr1 design has a lower melting temperature of 69°C (red curves) but refolds upon cooling (blue curves). Melting temperatures (Tm) are represented as the mean ± SD.
(C) Csp1 scattering signal, however, shows an aggregation mid-point at 60°C, highlighting its colloidal instability, while plr1 does not show a change in scattering and does not precipitate in solution. Shading represents SD across three replicates.
(D) Titrations performed using the chromophoric change of zincon indicate plr1 to bind between 12 and 13 Cu2+ ions per molecule.
(E) Radiographic images of silica TLC plates with 64Cu2+-loaded plr1 (1) and the same sample stripped with DTPA for 3 h (2) show plr1 to bind 64Cu2+. TLC plates were developed with 0.1 M sodium citrate (pH 5). Proteins stay at the starting spot and DTPA migrates near the front line under these conditions. Similar results were observed for neg1 (Figure S9A).
(F) Size-exclusion chromatogram of plr1 at 280 nm (top), radioactive signal of runs with 64Cu2+-loaded plr1 (middle), and a sample stripped with DTPA (bottom). plr1 binds 64Cu2+ and elutes corresponding to the same size as non-loaded plr1.
By titrating copper and using a chromophoric probe, we observed a mid-point indicating approximately 13 Cu2+ binding sites on plr1 (Figure 3D). This high metal-binding capacity of almost 1 metal/kDa of protein is consistent with that originally reported for Csp1.46 This binding ratio could be beneficial for high imaging sensitivity and efficacy in radiotracer imaging and radioimmunotherapy applications, respectively. To test suitable labeling conditions with 64Cu2+, we incubated plr1 with the buffered radioisotope at a specific radioactivity of 2 GBq/mg. After incubation for 60 min at 35°C, the radioactivity was efficiently (>90%) sequestered into plr1, as judged by radio-thin-layer chromatography (radio-TLC) (Figure 3E), and eluted from size-exclusion chromatography with the same profile as unlabeled plr1 (Figure 3F). These results strongly support the capacity of the design to readily and stably chelate copper radioisotopes through a simple incubation procedure.
Exploring the determinants of structural stability of the copper-binding proteins can guide the generation of enhanced variants for clinical applications. Through a second design round, we aimed to create two classes of variants to evaluate their metal-binding stability. The first class involved repacking core residues to eliminate three or six core cysteine residues, yielding the cr3 design and the cr61 and cr62 designs, respectively (Figure 4A; Table S2). These variants have their cysteine-lined lumen plugged at the solvent-accessible end, which could restrict the outward diffusion of coordinated metal ions. The second class comprised negatively supercharged designs (neg1 and neg2), where the positively charged residues of plr1 were forcibly redesigned into neutral or negatively charged residues (Figure 4A; Table S2). These variants would provide a favorable electrostatic environment for the coordinated metal ions, especially given the +2 oxidation state of the target copper ions. While the five new designs were all well expressed, the three repacked core variants (cr3, cr61, and cr62) were predominantly dimeric in solution and were therefore excluded from further analysis. Conversely, the neg1 variant could be purified in a monomeric state. In radio-TLC experiments, neg1 bound 64Cu2+, as was also observed for plr1 (Figure S9A). Competitive binding assays showed neg1 to bind Cu2+ 3-fold tighter than the plr1 design (Figures 4B and 4C), pointing to the possible stabilization of the metal:protein complex by negative charges. However, the thermal stability of the neg1 apoprotein decreased in comparison to plr1, where an earlier melting transition was observed for neg1, despite its reversible unfolding (Figure S9B). This affinity/stability trade-off was also evident when the copper-binding stability was assessed in untreated fetal bovine serum in vitro (Figure 4D). Whereas neg1 initially bound more copper ions than plr1, it released copper faster.
Fitting to a first-order decay model yields dissociation rate constants in 4-fold diluted serum of and for neg1 and plr1, respectively. Notably, plr1 displayed a more complex copper dissociation behavior with a possible cooperative dissociation step, which might be due to accelerated chemical degradation upon copper ion release. These results guide further, more detailed, investigation of protein charge tuning and trans-chelation to maximize the binding stability of the radioisotope.
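As an illustration of the first-order decay fit mentioned above, the following minimal sketch estimates a dissociation rate constant from a time course of the bound copper fraction. The fitting method (least squares on the log-transformed signal), the function name `fit_first_order`, and the synthetic data points are assumptions made for illustration; the text does not state how the fit was performed.

```python
import math

def fit_first_order(times, fractions):
    """Fit f(t) = f0 * exp(-k * t) by linear least squares on ln(f).

    Returns (k, f0). All fractions must be strictly positive.
    """
    logs = [math.log(f) for f in fractions]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    slope = (
        sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
        / sum((t - mean_t) ** 2 for t in times)
    )
    intercept = mean_y - slope * mean_t
    return -slope, math.exp(intercept)

# Synthetic, noise-free example (hypothetical time points, arbitrary units)
times = [0.0, 1.0, 2.0, 4.0]
fractions = [math.exp(-0.5 * t) for t in times]
k_off, f0 = fit_first_order(times, fractions)
```

A log-linear fit is only appropriate for a single exponential phase; a multi-step or cooperative release, as suggested for plr1, would need a different model.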
Figure 4.
Exploring the sequence-stability relationship of copper-binding proteins
(A) A phylogeny of the redesigned copper-binding proteins. The Csp1 template is shown in white in its tetrameric configuration (gray protomers) and the cysteine side chains at the core are depicted in a ball-and-stick representation (PDB: 5FJE). The polar side chains introduced in the first-generation design (plr1 model) are shown, yielding a monomeric, positively supercharged protein. Starting from the plr1 sequence, second-generation designs belong to three classes: core-repacked designs where either three (cr3) or six (cr61 and cr62) core cysteine residues were eliminated or negatively supercharged designs (neg1, neg2). Side chains colored yellow are cysteines, blue are positively charged, red are negatively charged, green are polar, and purple are non-polar.
(B and C) Competitive Cu2+ binding assays using zincon for plr1 and neg1 designs show sub-femtomolar dissociation constants, whereby neg1 binds Cu2+ more than 3-fold tighter than plr1. Error bars represent SD across two replicates.
(D) The Cu2+ release upon incubating protein:Cu2+ complexes in 4-fold diluted serum. plr1 remains associated with Cu2+ despite the lower initial affinity, followed by cooperative dissociation. neg1, on the other hand, displays higher initial affinity, but faster Cu2+ dissociation. This highlights the higher proteolytic resistance of plr1 compared with neg1, which corresponds to their expected thermostabilities (Figures 3B and S9B). Shading represents SD across three replicates.
Radiolabeling of proteins (e.g., tumor-targeting antibodies) for PET imaging is not only important for routine tumor-imaging applications, but also for tracking the biodistribution of protein- and cell-based therapies in vivo. Currently, associating a metallic radionuclide (such as 64Cu2+) to a protein is mostly performed through chemical coupling of chelating agents (e.g., DOTA) to the protein of interest (e.g., NHS coupling).48 This undirected chemical coupling requires additional processing steps and introduces positional and stoichiometric heterogeneity of the labeled proteins, lowering their fidelity and usefulness for clinical applications. Conversely, our designed copper binders can be used as genetically encodable PET labeling tags that can be expressed on a target cell surface or as a single-chain fusion with the protein of interest. Given the high affinity of these proteins to Cu2+ ions, they can be readily loaded with copper radionuclides under mild conditions, greatly simplifying the radiolabeling procedure.
Limitations of the study
Currently, the described framework is restricted to fixed-backbone sequence design. However, we foresee that further developments of the framework will allow it to support backbone motion. Also, while in its current state the framework applies isotropic charges, implementing a more advanced electrostatic potential to describe the rotamer library can greatly enhance the scoring accuracy, at no added run-time cost.
STAR★Methods
Key resources table
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Antibodies | ||
| Cetuximab | MedChemExpress | Cat#HY-P9905 |
| Bacterial and virus strains | ||
| BL21(DE3) Competent Cells - Novagen | Sigma-Aldrich | Cat#69450 |
| Chemicals, peptides, and recombinant proteins | ||
| cOmplete, EDTA-free Protease Inhibitor Cocktail | Roche | Cat#5056489001 |
| DNase I | PanReac AppliChem | Cat#A3778 |
| DMEM, high glucose, pyruvate | Gibco | Cat#41966029 |
| Fetal Bovine Serum, certified, heat inactivated | Gibco | Cat#10082147 |
| DPBS, no calcium, no magnesium | Gibco | Cat#14190144 |
| Alcian Blue 8 GX | Sigma-Aldrich | Cat#A5268 |
| Zincon monosodium salt | Supelco | Cat#96440 |
| Recombinant Human EGF | PeproTech | Cat#AF-100-15 |
| Recombinant Human HB-EGF | R&D Systems | Cat#259-HE-050/CF |
| Recombinant Human TGF-alpha | R&D Systems | Cat#239-A-100 |
| Critical commercial assays | ||
| CellTiter-Blue® Cell Viability Assay | Promega | Cat#G8080 |
| Deposited data | ||
| Molecular dynamics simulation data for amino acids in explicit water performed under the CHARMM27 force field | Vitalini et al.12 | ftp://bdg.chemie.fu-berlin.de/Ac-X-NHMe/ |
| Thermodynamic stability data for protein G mutants | Nisthal et al.18 | ProtaBank ID: gwoS2haU3 |
| Protein folding stability data for protein mutants measured by cDNA display proteolysis | Tsuboyama et al.19 | https://doi.org/10.5281/zenodo.7844779 |
| Experimental models: Cell lines | ||
| A431 Cell Line human | ECACC | Cat#85090402; RRID:CVCL_0037 |
| Experimental models: Organisms/strains | ||
| Zebrafish: wild-type: AB | N/A | RRID:ZIRC_ZL1 |
| Software and algorithms | ||
| Damietta | This paper | https://doi.org/10.5281/zenodo.8152656 |
| NAMD | Phillips et al.49 | RRID:SCR_014894 |
| VMD | Humphrey et al.50 | RRID:SCR_001820 |
| Prism | GraphPad Software, Inc. | RRID:SCR_002798 |
| PyMOL | Schrödinger, Inc. | RRID:SCR_000305 |
| Biacore X100 Evaluation Software | Cytiva | RRID:SCR_015936 |
| Other | ||
| Gene synthesis | Synbio Technologies, Inc., BioCat GmbH | N/A |
| Millex-HV Filter, 0.45 μm, PVDF, 33 mm | Millipore | Cat# SLHV033RS |
| Amicon Ultra-15 centrifugal filter unit, 10 kDa | Millipore | Cat# UFC901024 |
| HisTrap Excel column, 5 ml | Cytiva | Cat#GE17-3712-06 |
| Superdex 75 Increase 10/300 GL column | Cytiva | Cat#29148721 |
| HiTrap Capto Q column, 5 ml | Cytiva | Cat#11001303 |
| HiTrap Capto S column, 5 ml | Cytiva | Cat#17544123 |
| Prometheus Standard Capillaries | Nanotemper | Cat#PR-C002 |
| Sensor Chip CM5 | Cytiva | Cat#BR100012 |
| 96 Well TC-Treated Microplates | Corning | Cat#3596 |
| UV-Star plate, 384 well, F-bottom | Greiner | Cat#781801 |
Resource availability
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Mohammad ElGamacy (mohammad.elgamacy@med.uni-tuebingen.de).
Materials availability
This study did not generate new unique reagents.
Experimental model details
Bacterial protein expression system
BL21(DE3) competent E. coli cells were used for transformation and high-level protein expression using a T7 RNA polymerase-IPTG induction system.
Cell line
A431 cells (ECACC 85090402; RRID: CVCL_0037) were cultured in DMEM medium supplemented with 10% fetal bovine serum (FBS), at 37 °C, 5% CO2. Sub-confluent cultures were split 1:10 twice a week.
Zebrafish
Wild-type zebrafish line (AB, RRID:ZIRC_ZL1) was used for the experiments. Treatment of zebrafish embryos with tested inhibitors started at 4-6 hpf and continued until 4 dpf. The sex of the embryos was not taken into consideration. Zebrafish were maintained according to standard protocols and handled in accordance with European Union animal protection directive 2010/63/EU and approved by the local government (Tierschutzgesetz §11, Abs. 1, Nr. 1, husbandry permit 35/9185.46/Uni TÜ).
Method details
The rotamer library
Large-scale molecular dynamics simulations of capped amino acids (i.e., Ac-X-NHMe, where X represents the amino acid symbol) were used as conformational pools from which representative rotamers were sampled. Such simulations can be run under a predefined force field in explicit water, yielding a pool of freely interchanging conformations. In the current implementation, we have used a set of trajectories performed under the CHARMM27 force field in explicit solvent by Vitalini et al.12 The trajectory of every Ac-X-NHMe amino acid is 1 μs long and was uniformly partitioned into 36 backbone conformational bins (60° × 60° intervals). The propensity of each backbone conformational state (i.e., a bin) was used to represent its relative energy with respect to all of the other conformational bins as:
| $E_{pp} = -k_B T \ln\left(\frac{n_i}{N}\right)$ | (Equation 1) |
Where $k_B$ is the Boltzmann constant in kcal·mol−1K−1, $T$ is the temperature in Kelvin (here given as constant values of 0.001985875 and 298, respectively), $n_i$ is the number of observations in the $i$th conformation bin, and $N$ is the total number of conformations. A similar explicit internal energy term is used to describe the side chain conformational preference, where all conformations within each bin are aligned to a single frame of reference using the three atoms Cβ (or Gly Hα1), Cα, N. In this frame of reference Cα is positioned at the origin, the Cα→Cβ (or Cα→Hα1) vector is aligned along the z-axis, and the Cα→N vector lies along the xy-plane. The aligned conformers undergo a 3D k-means clustering including all of their atomic coordinates, resulting in representative side chain conformers at the center of each cluster. Here, $k$ was set to 50 and 100 conformational clusters to build the 50- and 100-rotamer libraries, respectively. The representative conformer of each cluster was taken to be the one with the lowest RMSD to the average structure of the entire cluster, where the energy of every cluster is defined as:
| $E_{k} = -k_B T \ln\left(\frac{n_{i,j}}{\sum_{j'=1}^{k} n_{i,j'}}\right)$ | (Equation 2) |
Where $n_{i,j}$ is the number of observations in conformation bin $i$ of the $j$th cluster out of $k$ clusters. In cases where the entire molecular dynamics simulation results in a conformational bin that is underpopulated, i.e. , the entire bin is not represented in the library.
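The population-to-energy conversion described above can be sketched as a Boltzmann inversion of bin counts. This is a minimal illustration, assuming the relation E = −k_B·T·ln(n_i/N) implied by the symbol definitions; the function name `boltzmann_energies`, the `min_count` threshold for underpopulated bins, and the example counts are hypothetical and not taken from the Damietta code.

```python
import math

K_B = 0.001985875    # Boltzmann constant in kcal/(mol*K), value quoted in the text
TEMPERATURE = 298.0  # temperature in K, value quoted in the text

def boltzmann_energies(counts, min_count=1):
    """Convert observation counts per bin into relative energies (kcal/mol).

    Bins with fewer than `min_count` observations are treated as
    underpopulated and returned as None (not represented in the library).
    `min_count` is an illustrative placeholder, not the published threshold.
    """
    total = sum(counts)
    energies = []
    for n in counts:
        if n < min_count:
            energies.append(None)
        else:
            energies.append(-K_B * TEMPERATURE * math.log(n / total))
    return energies

# Hypothetical counts for four conformational bins
counts = [500, 250, 250, 0]
energies = boltzmann_energies(counts)
```

More populated bins receive lower energies, and the same function could be applied to cluster counts within a bin to obtain the side chain term.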
The energy function
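The energy function is composed of 5 terms representing the energy difference to a ground state of a solvated, capped amino acid, as follows: backbone internal energy ($E_{pp}$, Equation 1), side chain conformational energy ($E_{k}$, Equation 2), Lennard-Jones interaction energy ($E_{LJ}$), solvation energy ($E_{solv}$), and electrostatic interaction energy ($E_{elec}$). The total energy is a weighted sum of these terms, as:
| $E_{total} = w_{pp}E_{pp} + w_{k}E_{k} + w_{LJ}\left(E_{LJ} - E_{LJ}^{ref}\right) + w_{solv}\left(E_{solv} - E_{solv}^{ref}\right) + w_{elec}\left(E_{elec} - E_{elec}^{ref}\right)$ | (Equation 3) |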
The energy calculation scheme follows a two-body formulation whereby the interactions are only calculated between two sets of atoms belonging to the 1st-body and the 2nd-body. The 1st-body atoms are all the atoms within a bounding box of dimensions excluding the side chain atoms of the mutable residue, where the bounding box is centered at the Cα atom of the mutable residue. The 2nd-body atoms represent the side chains of the rotamer to be placed in the mutable residue position, as sampled from the rotamer library. This scheme applies to the interaction energy terms ($E_{LJ}$ and $E_{elec}$) as well as the solvation free energy term ($E_{solv}$). The non-bonded energy terms between the inbound side chain atoms and the environment are corrected by subtracting a reference energy ($E_{LJ}^{ref}$, $E_{solv}^{ref}$, and $E_{elec}^{ref}$; Equation 3). These reference values describe the interaction energies between the side chain of a rotamer and its backbone atoms in the respective conformation pooled from the MD. These reference energy values are precomputed with the rotamer library and are subtracted from the final interaction energy before it is weighted (Equation 3). To preserve the compatibility between different energy terms, the partial charges and Lennard-Jones parameters are also obtained from the CHARMM27 force field parameters,14 and the surface-area-based solvation energy term relies on the CHARMM-based parameters for the EEF1-SB model.51 Moreover, the $E_{pp}$ (Equation 1) and $E_{k}$ (Equation 2) terms are derived from conformational distributions extracted from simulations that used the CHARMM27 force field.12
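The weighted sum with reference-energy correction can be sketched as follows. This is an illustrative reading of the description above, in which the reference values are subtracted from the three non-bonded terms before weighting; the dictionary keys, the function name `total_energy`, and the example term values are hypothetical, while the weights match those listed for the benchmark runs (-w_pp 1.0, -w_k 1.0, -w_lj 1.0, -w_solv 1.0, -w_elec 0.25).

```python
# Weights as listed for the benchmark runs in the Methods
WEIGHTS = {"pp": 1.0, "k": 1.0, "lj": 1.0, "solv": 1.0, "elec": 0.25}

def total_energy(terms, refs, weights=WEIGHTS):
    """Weighted sum of the five energy terms (kcal/mol).

    `terms` holds the raw values of all five terms; `refs` holds the
    precomputed rotamer reference energies for the non-bonded terms,
    which are subtracted before weighting.
    """
    energy = weights["pp"] * terms["pp"] + weights["k"] * terms["k"]
    for name in ("lj", "solv", "elec"):
        energy += weights[name] * (terms[name] - refs[name])
    return energy

# Hypothetical values for one rotamer at one position
terms = {"pp": 0.4, "k": 0.6, "lj": -3.0, "solv": -1.0, "elec": -2.0}
refs = {"lj": -1.0, "solv": -0.5, "elec": -0.4}
e_total = total_energy(terms, refs)
```

With these numbers the corrected non-bonded contributions are −2.0, −0.5, and 0.25 × (−1.6) = −0.4, giving a total of −1.9 kcal/mol.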
For evaluating electrostatic interaction energies, an approximation of the Generalised Born model was used. The interactions are calculated between the 1st-body atoms (the protein atoms inside the bounding simulation box, with the side chain atoms of the mutable residue deleted) and the 2nd-body atoms (the inbound side chain atoms looked up from the rotamer library). The electrostatics function was composed of three terms as follows:
| (Equation 4) |
Where and represent the partial charges of atoms and , separated by the distance . A distance-dependent dielectric function of was assumed. As the electrostatic interactions cutoff was set to 7.0 Å, the value of ranged as , where of a carbon-carbon interaction, for example, is approximately 4 Å. and represent the dielectric constant of protein core ( and water (), respectively.
Equation 4 was derived in this form to reduce the computing cost, whereby the first term is precomputed for 2nd body as , that is multiplied by the 1st body partial charges tensor (i.e. ), which is computed rapidly on the fly. The second and third terms represent the charge solvation corrections according to the Generalised Born model and are computed as a tensor scalar and an additive term, respectively. In the second term, the average interatomic distance in the simulation cube was used instead of the individual distances, and born radius was taken as the average born radius in the simulation cube; . The average interatomic distance within the simulation cube was set to , while the average born radius was approximated according to the fraction of filled volume of the simulation cube (i.e. ).
The Lennard-Jones function was implemented as a piece-wise function to avoid sensitivity to interatomic clashes, which can be relaxed upon MD-based minimization. The piece-wise function consists of three components; a standard LJ term in the attractive range of inter-atomic distances, a slow-growing repulsive term across a band of the atomic crust, and a flat maximum at a defined atomic core, as follows:
| (Equation 5) |
Where and are the minimum LJ energy value (in kcal·mol−1) and the LJ radius (in Å) of atom as obtained from the CHARMM27 parameters.14 The cutoffs and were set to 0.25 Å and , respectively. The use of and parameters of atom instead of the averaged parameters for atoms and was aimed at lowering the computing cost, since the entire LJ interaction fields are pre-calculated for inbound rotamer atoms (i.e. atoms). Both the LJ and electrostatic interaction fields are calculated for all values of at a resolution of 0.5 Å, where is the long-range cutoff that is set to 7.0 Å. Such fields are stored in the chemical library provided with the software, and are looked up during the design process depending on the mutable position bin.
These interactions are calculated within a cube where a single side of the cube () has the size of , containing voxels, where the voxel resolution is . To evaluate the impact of varying the voxel resolution on the calculation accuracy, we emulated the interaction of carbon atoms at varying resolutions. The results showed most of the energy error to stem from the repulsive part of the function (i.e. ), which was largely positive in value as is floored to the nearest discrete bin (Figure S10A). The results also showed the energy error to substantially decrease above a resolution of (Figure S10A). It is worth highlighting that this approach is dissimilar to other parallel energy computations that are primarily thread-based, where a constant set of instructions can access a large number of shared variables (here, atom attributes) across the main (or a GPU) memory.52 Instead, the presented framework simplifies these calculations by encoding most of the energy function information into smooth fields associated with each discrete rotamer (Figure S10B). This leaves only populating the positions of the environment atoms as the quick on-the-fly step that is performed once for each designable position (Figure S10C). This renders the total energy much faster to compute through only 2 instructions and 2 variables; as (Figure S10D). Our method further stacks all rotamer fields belonging to one side chain into a single variable (i.e. 4D tensors encoding ) in order to reduce the number of memory calls. Unlike the standard way of computing the LJ function, this operation avoids any exponentiation or square roots, and offers substantially higher arithmetic intensity.
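The idea of replacing pairwise evaluation by a field lookup can be sketched in plain Python as below. This is a conceptual toy, not the Damietta implementation: the grid size `GRID`, the voxel size `RES`, the placeholder potential `toy_pair_energy` (which is not Equation 5), and all coordinates are hypothetical. Each rotamer carries a precomputed energy field over the voxels; at design time the environment atoms are binned into an occupancy grid once, and every rotamer energy is then an elementwise multiply-and-sum.

```python
import math

GRID = 8   # voxels per cube side (hypothetical)
RES = 1.0  # voxel edge length in angstrom (hypothetical)

def voxel_index(pos):
    """Flat voxel index of a cube-local position, or None if outside the cube."""
    idx = [int(math.floor(c / RES)) for c in pos]
    if any(i < 0 or i >= GRID for i in idx):
        return None
    x, y, z = idx
    return (x * GRID + y) * GRID + z

def voxel_center(flat):
    """Cube-local coordinates of the center of a voxel."""
    x, rest = divmod(flat, GRID * GRID)
    y, z = divmod(rest, GRID)
    return ((x + 0.5) * RES, (y + 0.5) * RES, (z + 0.5) * RES)

def toy_pair_energy(r):
    """Placeholder capped 12-6 potential; stands in for the real energy terms."""
    r = max(r, 2.0)  # flat maximum at short range
    s = (4.0 / r) ** 6
    return 0.1 * (s * s - 2.0 * s)

def precompute_field(rotamer_atoms):
    """Offline step: energy felt at every voxel center from one rotamer."""
    return [
        sum(toy_pair_energy(math.dist(voxel_center(v), a)) for a in rotamer_atoms)
        for v in range(GRID ** 3)
    ]

def occupancy(env_atoms):
    """On-the-fly step: count environment atoms per voxel (done once per position)."""
    occ = [0] * GRID ** 3
    for atom in env_atoms:
        v = voxel_index(atom)
        if v is not None:
            occ[v] += 1
    return occ

def rotamer_energies(fields, occ):
    """Design-time step: one multiply-and-sum per rotamer, no distances needed."""
    return [sum(f * o for f, o in zip(field, occ)) for field in fields]

rotamers = [[(4.0, 4.0, 4.0)], [(2.0, 4.0, 4.0)]]  # one atom per toy rotamer
env = [(6.5, 4.5, 4.5), (1.5, 1.5, 1.5)]           # atoms placed at voxel centers
fields = [precompute_field(r) for r in rotamers]
occ = occupancy(env)
energies = rotamer_energies(fields, occ)
```

Because the example environment atoms sit exactly on voxel centers, the lookup reproduces the direct pairwise sum; off-center atoms would incur the discretization error discussed above. In a tensor library the last step would become a single contraction over a stacked rotamer-field array.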
A generic solvation free energy term based on a surface area method was also put in place to account for the hydrophobic effect. This term was adapted from the EEF1-SB energy model parameters,51 and follows the form:
| (Equation 6) |
Where is the solvation energy per unit surface area (in kcal·mol−1Å−2) of atom , with solvent-exposed surface area when located at position vector . An approximation of the function is derived from the non-occluded vdW surface area of slightly inflated vdW radii. This radial inflation was performed here by adding 0.5 Å to the atomic vdW radii, while correcting for the larger atomic surface area to keep the solvation energy per atom type constant. The implementation relied on a voxelized representation of atomic crusts encoding the values in tensorial forms as well as core-masking tensors which are used to exclude the occluded atomic surfaces. This energy term is calculated separately for the 1st-body atoms (), the 2nd-body (), and the 1st-body and 2nd-body combined (). These values would represent the solvated free energies of the protein environment with the removed side chain atoms at the mutable residue, the inbound side chain atoms of a rotamer from the rotamer library, and the combined protein environment and rotamer side chain atoms after rotamer placement, respectively. Given these three values, the final solvation free energy term is calculated as follows:
| (Equation 7) |
Finally, while the different terms of the energy function are compatible as they were derived from the same force field, the softening of the repulsive component of the LJ term necessitates the downscaling of the electrostatic term ($w_{elec}$ = 0.25), in order to avoid highly clashing configurations with optimal electrostatic interactions. Additionally, $w_{k}$ is recommended to be set to 0 for mutagenesis tasks, and to 1 for repacking tasks. This is because $E_{k}$ is not directly comparable across different amino acid types, given that different amino acid types have drastically different chemical exchange timeframes in solution, even when their backbone is fixed.53 Otherwise, the other weighting factors were all set to 1.0; $w_{pp}$ = $w_{k}$ = $w_{LJ}$ = $w_{solv}$ = 1.0. Additionally, to deploy tiered scoring that avoids calculating all energy terms for highly clashing rotamers, a maximum LJ energy value was set to 5.0 kcal·mol−1.
Evaluation of scoring accuracy against stability benchmarks
The ability of the framework to evaluate the stability of protein mutants was benchmarked using five independent, previously reported experimental datasets: (1) mutants of the β1 domain of streptococcal protein G (PDB: 1PGA);18 (2) mutants of the N-terminal domain of phage 434 repressor (PDB: 1R69); (3) mutants of the SH3 domain in human obscurin (PDB: 1V1C); (4) mutants of the N-terminal domain of ribosomal protein L9 (PDB: 2HBB); and (5) mutants of the r11_829_TrROS protein designed by trRosetta hallucination (an AlphaFold model of the design was used).19 Mutants were generated and their energies estimated using the single-point (sp) routine with the parameters (-max_lj 10.0, -w_pp 1.0, -w_k 1.0, -w_lj 1.0, -w_solv 1.0, -w_elec 0.25). ΔΔG was calculated as the difference in free energy between the mutant and the wild-type reference. The predictive performance of the software was assessed using the Pearson correlation coefficient (R) between computed energy values and experimental data. For the dataset of Gβ1 mutants, the performance of Damietta was compared with that of the Rosetta framework (ddg_monomer application, NoMin protocol), as described previously by Nisthal et al.18
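The benchmark metric described above can be sketched as follows: ΔΔG is the mutant energy minus the wild-type energy, and agreement with experiment is summarized by the Pearson correlation coefficient. The function names and the numerical values are illustrative assumptions, not data from the study.

```python
import math

def ddg(mutant_energy, wild_type_energy):
    """Free energy difference between a mutant and the wild-type reference."""
    return mutant_energy - wild_type_energy

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical computed energies (kcal/mol) and experimental stability changes
wild_type = -12.0
mutants = [-10.0, -11.5, -8.0, -12.5]
computed = [ddg(m, wild_type) for m in mutants]
experimental = [1.5, 0.2, 3.1, -0.4]
r = pearson_r(computed, experimental)
```

Whether a positive or negative R indicates agreement depends on the sign convention of the experimental values, which should be checked per dataset.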
Evaluation of rotamer and sequence recovery
To test the ability of Damietta to recover native side-chain conformations (rotamer recovery), the dataset described by Zhou et al.6 was used. The dataset consists of 9 pairs of structures, where each pair represents one X-ray and one NMR structure of the same monomeric protein (PDB: 1TVG, 1XPW; 3C4S, 2JZ2; 2O0Q, 2JQN; 2Q00, 2JPU; 3IDU, 2KL6; 3K63, 2KRT; 3FIF, 2JN0; 3H9X, 2KFP; 1TTZ, 1XPV). For NMR entries, the first model was used (out of twenty models in each ensemble). For each structure in the dataset, the repack-all (ra) routine was run for 10 successive rounds, where the side chains of all amino acids were repacked from the N- to the C-terminus in each round. Rotamer recovery was evaluated in terms of the predicted χ1 and χ2 side-chain torsion angles. An angle prediction was considered correct if the torsion error was within ±40° of the native angle.54 χ1 accuracy was defined as the percentage of residues within a protein with a correctly predicted χ1 angle. χ1&2 accuracy was defined as the percentage of residues within a protein with both χ1 and χ2 angles correctly predicted. Results were presented as a boxplot, where the lower and upper hinges represent the first and third quartiles, respectively, and the whiskers extend from the box by 1.5 times the inter-quartile range (Figure S3). Core residues were identified as residues with a solvent-exposed surface area of less than 5 Å2.
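The ±40° criterion has to account for the periodicity of torsion angles (for example, −175° and 170° differ by only 15°). A minimal sketch of the χ1 and χ1&2 accuracy calculation is given below; the data layout (one `(chi1, chi2)` tuple per residue, with `None` where an angle does not exist), the choice to compute χ1&2 accuracy over residues that have both angles, and the example angles are assumptions for illustration.

```python
def angle_error(predicted, native):
    """Smallest absolute difference between two angles, in degrees."""
    diff = abs(predicted - native) % 360.0
    return min(diff, 360.0 - diff)

def chi_accuracies(predicted, native, tol=40.0):
    """Return (chi1 accuracy, chi1&2 accuracy) as percentages.

    Residues without a native chi1 are skipped; chi1&2 accuracy is
    computed over residues that have both angles defined.
    """
    chi1_total = chi1_ok = chi12_total = chi12_ok = 0
    for (p1, p2), (n1, n2) in zip(predicted, native):
        if n1 is None:
            continue
        chi1_total += 1
        ok1 = angle_error(p1, n1) <= tol
        chi1_ok += ok1
        if n2 is not None:
            chi12_total += 1
            chi12_ok += ok1 and angle_error(p2, n2) <= tol
    chi1_pct = 100.0 * chi1_ok / chi1_total if chi1_total else None
    chi12_pct = 100.0 * chi12_ok / chi12_total if chi12_total else None
    return chi1_pct, chi12_pct

# Hypothetical native and predicted torsions for four residues
native = [(60.0, 180.0), (-60.0, None), (180.0, 90.0), (-170.0, 60.0)]
predicted = [(70.0, 170.0), (-110.0, None), (175.0, -90.0), (175.0, 80.0)]
chi1_acc, chi12_acc = chi_accuracies(predicted, native)
```

In this example three of four χ1 angles are within tolerance (75%), and two of the three residues with a χ2 angle have both angles correct.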
To evaluate the recovery of native amino acid identities (sequence recovery), the 9 above-mentioned X-ray structures were used (PDB: 1TVG, 3C4S, 2O0Q, 2Q00, 3IDU, 3K63, 3FIF, 3H9X, 1TTZ). Each amino acid position (except for the N-terminal and C-terminal residues) was mutated individually using the single-point (sp) routine with the following parameters: 20 target amino acids, -max_lj 10.0, -w_pp 1.0, -w_k 1.0, -w_lj 1.0, -w_solv 1.0, -w_elec 0.25. Sequence recovery was calculated as the percentage of amino acid positions within a protein at which the lowest-energy residue selected by the sp sampler is identical to the native amino acid.
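The sequence recovery metric can be sketched as follows, assuming the per-position energies of the candidate amino acids are already available. The function name `sequence_recovery`, the data layout, and the toy sequence with a reduced set of candidate residues are hypothetical.

```python
def sequence_recovery(native_seq, energies):
    """Percentage of interior positions where the lowest-energy residue is native.

    `energies[pos]` maps one-letter amino acid codes to the energy computed
    for that residue at position `pos`; the two terminal positions are skipped.
    """
    hits = total = 0
    for pos in range(1, len(native_seq) - 1):
        best = min(energies[pos], key=energies[pos].get)
        hits += best == native_seq[pos]
        total += 1
    return 100.0 * hits / total

# Toy example: five residues, three interior positions, two candidates each
native_seq = "MKVLA"
energies = [
    None,
    {"K": -2.0, "E": -1.0},
    {"V": -0.5, "I": -1.5},
    {"L": -3.0, "F": -2.0},
    None,
]
recovery = sequence_recovery(native_seq, energies)
```

Here the native residue is recovered at two of the three interior positions.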
Design of EGFR inhibitors
The design was performed using the EGF:d3 structure (PDB: 1IVO) as a template (residue range: 313-480 for d3-wt, and 313-472 for designed proteins). The input structure for Damietta applications has to be a single-chain structure that is CHARMM-typed, with all hydrogens included and no missing atoms. The input coordinates of the d3-wt domain structure were CHARMM-typed using the automatic PSF generation plugin (autopsf, version 1.8) as implemented in VMD (version 1.9.3).50 Using the repack-all application (damietta_ra), we identified residues with energy higher than 20 kcal/mol, as well as all cysteine residues. These residues were subjected to combinatorial design using the few-to-many-to-few sampler (damietta_cs_f2m2f). The combinatorial sampler mutates and moves the designable residues, and moves the repackable residues as specified in the spec file (Methods S1). The average energy per residue is calculated for both the mutable and repackable residues and is deterministically minimized. The order in which residues are optimized is randomizable, and multiple traversals of the mutagenesis decision tree can be conducted. The spec file contained the following parameters: the mutable residues (mut_res) and their target mutations; the repackable residues (rpk_res); a scrambled order of the mutable residues in every instance (scramble_order); the top mutations considered for combinatorial optimization (m_mutations); the number of parallel paths traversed down the decision tree (n_paths); and the number of repeat iterations by which the tree is traversed (n_iters). The target mutations in the spec file were specified according to a sequence profile of d3 homologues obtained from the closest 500 homologous sequences in the nr protein database. The spec file was run in 100 instances, for 100 CPU hours/instance.
The resulting decoys were further filtered according to their stability in accelerated molecular dynamics (aMD) simulations that follow a previously described serial tempering routine.55,56 These simulations were conducted using NAMD49 with a generalized Born implicit solvent model and a timestep of 1 fs. The tempering scheme starts with 500 steps of conjugate gradient minimization, followed by an annealing cycle that was repeated for 160 rounds, with one configuration dump at the end of each cycle. The annealing cycle follows the sequence of: 100 minimization steps, 3000 timesteps (i.e. 3 ps) in a 370 K Langevin bath, 4000 timesteps (i.e. 4 ps) in a 250 K Langevin bath, and 100 minimization steps. The two most conformationally homogeneous designs (dd3-1 and dd3-2) were accordingly selected for experimental characterization. Conformational homogeneity was quantified as the all-vs.-all RMSD averaged across all frames (i.e. 160 frames) output from the aMD simulations, computed using VMD.50
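The homogeneity metric and the total simulated time implied by the tempering scheme can be sketched as below. The RMSD here is computed on frames that are assumed to be already superimposed (no structural alignment is performed in the sketch), the function names are hypothetical, and the three tiny frames are toy data; the study computed the metric with VMD.

```python
import math
from itertools import combinations

def rmsd(frame_a, frame_b):
    """RMSD between two pre-aligned frames given as lists of (x, y, z) tuples."""
    squared = sum(
        (p - q) ** 2 for atom_a, atom_b in zip(frame_a, frame_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(frame_a))

def mean_pairwise_rmsd(frames):
    """Average RMSD over all unique frame pairs (lower means more homogeneous)."""
    pairs = list(combinations(frames, 2))
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)

# Toy trajectory: three frames of a two-atom system
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
]
homogeneity = mean_pairwise_rmsd(frames)

# Dynamics time implied by the scheme: 160 cycles x (3000 + 4000) steps x 1 fs
total_fs = 160 * (3000 + 4000) * 1
total_ps = total_fs / 1000
```

With the parameters quoted above, each design is sampled for 1120 ps (1.12 ns) of Langevin dynamics, excluding the minimization steps.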
Purification of EGFR inhibitors
Sequences of d3-wt and the designed proteins (dd3-1, dd3-2, Table S2) were ordered as synthetic genes in the pET-28a(+) expression vector (Synbio Technologies, Inc.). Plasmids were transformed into chemically competent E. coli BL21(DE3) using the heat shock method. Transformed cells were grown in LB medium supplemented with 40 μg/ml kanamycin at 37 °C. At an OD600 of around 0.6-0.8, cells were induced with 1 mM IPTG and incubated overnight at 25 °C for protein expression. Cells were harvested by centrifugation at 5000 g at 4 °C for 15 min and lysed in 30 ml of lysis buffer (1 M guanidinium chloride, 100 mM NaCl, 50 mM Tris-HCl pH 8.0) supplemented with a tablet of the cOmplete, EDTA-free Protease Inhibitor Cocktail (Roche) and 3 mg of lyophilized DNase I (PanReac AppliChem) using a Branson Sonifier 250 (Fisher Scientific). The lysate was cleared by centrifugation at 28000 g at 4 °C for 50 min and the supernatant was filtered through a 0.45 μm filter (Millipore). The sample was applied to a 5 ml HisTrap Excel column (Cytiva). The running buffer was 200 mM NaCl, 30 mM Tris-HCl pH 8.0. After sequentially washing the column with 20 ml of the running buffer and 20 ml of the running buffer supplemented with 50 mM imidazole, fractions were collected by linear gradient elution using 150 mM NaCl, 30 mM Tris-HCl pH 8.0, 500 mM imidazole buffer. The eluted fractions containing the protein of interest were pooled, concentrated using 10 kDa MWCO centrifugal filters (Millipore), and further purified on a Superdex 75 Increase 10/300 GL gel filtration column (Cytiva) using PBS. Gel filtration fractions containing pure protein in the desired oligomeric state were pooled, concentrated, and stored at -20 °C for subsequent analyses. Both IMAC and gel filtration steps were performed on an Äkta Pure chromatography system (Cytiva).
Thermostability analysis of EGFR inhibitors
NanoDSF measurements using Prometheus NT.48 (Nanotemper) were performed to evaluate thermostability of d3-wt and the designs (dd3-1, dd3-2). Capillaries (Nanotemper) were filled with 0.5 mg/ml protein samples in three replicates. Melting scan was performed across the temperature range from 20 °C to 90 °C with a temperature ramp of 1 °C/min.
Surface plasmon resonance binding assays
Multi-cycle kinetics experiments were performed on a Biacore X100 system (Cytiva). For measuring binding to EGF, EGF (Peprotech) was diluted to 100 μg/mL in 10 mM acetate buffer pH 4.0 and immobilized on the surface of a CM5 sensor chip (Cytiva) using standard amine coupling chemistry. Five different concentrations of the sample solution (nanomolar range) were injected over the functionalized sensor chip surface for 120 s, followed by a 180 s dissociation with the running buffer. At the end of each run, the sensor surface was regenerated with a 30 s injection of 50 mM HCl at a flow rate of 10 μL/min. For measuring binding to HB-EGF, HB-EGF (R&D Systems) was diluted to 20 μg/mL in 10 mM acetate buffer pH 5.0 and immobilized on the surface of a CM5 sensor chip using standard amine coupling chemistry. Five different concentrations of the sample solution (nanomolar range) were injected over the functionalized sensor chip surface for 180 s, followed by a 180 s dissociation with the running buffer. At the end of each run, the sensor surface was regenerated with a 30 s injection of 50 mM NaOH at a flow rate of 10 μL/min. For measuring binding to TGF-α, TGF-α (R&D Systems) was diluted to 100 μg/mL in 10 mM acetate buffer pH 4.5 and immobilized on the surface of a CM5 sensor chip using standard amine coupling chemistry. Five different concentrations of the sample solution (nanomolar – low micromolar range) were injected over the functionalized sensor chip surface for 60 s, followed by a 60 s dissociation with the running buffer. At the end of each run, the sensor surface was regenerated with a 60 s injection of 10 mM glycine-HCl pH 1.5 at a flow rate of 10 μL/min. In all experiments, reference surfaces were treated in the same manner (surface activation and deactivation with amine coupling reagents), except that no ligand was added. Test proteins were diluted in the running buffer (PBS supplemented with 0.05% v/v Tween-20). 
Analyses were conducted at 25 °C at a flow rate of 10 μL/min. The reference responses and zero-concentration sensorgrams were subtracted from each dataset (double-referencing). Association rate (ka), dissociation rate (kd), and equilibrium dissociation (Kd) constants were obtained using the Biacore X100 Evaluation Software. Fitting was performed with global parameter settings, where a single parameter value applies to the whole titration series. To estimate the reliability of the fit, the fitting procedure was repeated using 4 out of 5 analyte concentrations (excluding one concentration at a time), yielding average values and standard deviations for a single titration series. For binding assays of d3-wt and dd3-2 vs. EGF, HB-EGF, and TGF-α, two independent titration series were performed.
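The leave-one-concentration-out reliability estimate described above can be sketched as follows. The kinetic fitting itself was done in the Biacore X100 Evaluation Software, so the `fit` argument here is a hypothetical placeholder for any global fitting routine that returns the two rate constants. The sketch further assumes a 1:1 interaction, for which the equilibrium dissociation constant equals kd/ka, and a sample standard deviation; neither detail is stated in the text.

```python
from statistics import mean, stdev


def leave_one_out(series, fit):
    """Repeat a global fit with each analyte concentration excluded in turn.

    series: dict mapping analyte concentration -> sensorgram data
    fit:    hypothetical callable taking such a dict, returning (ka, kd)
    Returns a dict with (mean, standard deviation) for ka, kd, and KD.
    """
    estimates = []
    for left_out in series:
        subset = {c: d for c, d in series.items() if c != left_out}
        ka, kd = fit(subset)
        estimates.append((ka, kd, kd / ka))    # KD = kd / ka (1:1 assumption)
    summary = {}
    for name, column in zip(("ka", "kd", "KD"), zip(*estimates)):
        summary[name] = (mean(column), stdev(column))
    return summary
```

For a five-point titration series this yields five fits, each based on four concentrations, and the spread across them indicates how strongly the fitted constants depend on any single sensorgram.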
A431 cell proliferation assay
A431 cells were cultured in DMEM medium (Gibco) supplemented with 10 % FBS (Gibco). Cells were pelleted by centrifugation at 300 g for 5 min, washed once with DPBS (Gibco) and once with non-supplemented DMEM medium. After the last washing step, cells were resuspended in DMEM medium supplemented with 1 % FBS and 400 pM EGF (Peprotech). The cell suspension was seeded in a 96-well plate (Corning), 100 μl/well, at a density of 8000 cells/well. Different concentrations of d3-wt (0.032 nM – 500 nM), dd3-2 (0.0064 nM – 100 nM), or cetuximab (0.0064 nM – 100 nM) were added to the wells in triplicate. PBS was added to the wells serving as an untreated control. Several wells contained cells in DMEM medium supplemented with 1 % FBS, but without EGF, as an unstimulated control. After incubation for 72 h at 37 °C, 5 % CO2, 20 μL of CellTiter-Blue® Reagent (Promega) were added to the wells and the plate was incubated for an additional 2 h under the same conditions to allow cells to convert resazurin to resorufin. Cell viability was monitored by measuring fluorescence (560/590 nm) using a Synergy HTX Microplate Reader (BioTek). The data were presented as a percentage of the unstimulated (i.e., without EGF) control fluorescence values.
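The normalization to the unstimulated control can be written out as a short sketch. The function name is hypothetical, and the sketch assumes that raw fluorescence values are used without background subtraction, since no blank correction is described in the text.

```python
from statistics import mean, stdev


def percent_of_control(replicates, unstimulated):
    """Express replicate readings as % of the mean unstimulated control.

    replicates:   fluorescence readings of one treatment (e.g., a triplicate)
    unstimulated: fluorescence readings of the wells without EGF
    Returns (mean, sample standard deviation) in percent.
    """
    reference = mean(unstimulated)
    scaled = [100.0 * value / reference for value in replicates]
    return mean(scaled), stdev(scaled)
```

Values above 100 % then indicate EGF-stimulated proliferation, and an effective inhibitor is expected to bring the signal back towards 100 %.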
Effect of EGFR inhibitors on zebrafish embryos
Eggs were collected and placed at 28 °C in E3 medium (5 mM NaCl, 0.17 mM KCl, 0.4 mM CaCl2, and 0.16 mM MgSO4). The age of the embryos and larvae is indicated as hours post fertilization (hpf) or days post fertilization (dpf). All experiments described in the present study were conducted on embryos younger than 5 dpf. To test the effect of the designed inhibitors on zebrafish, we injected around 4 nl of cetuximab (5.0 mg/ml), dd3-2 (0.2 and 1.0 mg/ml), or d3-wt (0.3 and 1.3 mg/ml) into the yolk of embryos at 4-6 hpf and then continued the treatment by adding the inhibitors to the medium until 4 dpf. Embryos were distributed in pools of 10-15 into 24-well plates in E3 medium. The survival ratio was assessed every day from 1 dpf to 4 dpf, and morphological or developmental defects were analyzed at 4 dpf on fixed embryos stained with Alcian blue, using a Nikon SMZ18 stereomicroscope with a DS-Fi3 camera (5.9 MP). GraphPad Prism software (version 7) was used for graphing and statistical analysis. Cartilage was stained with Alcian Blue 8 GX (Sigma). Zebrafish larvae were fixed in 4 % PFA for 2 h at room temperature, rinsed with PBST, and stained overnight with 10 mM MgCl2/80 % ethanol/0.04 % Alcian Blue solution. Embryos were rinsed with 80 % ethanol/10 mM MgCl2 and washed stepwise with 70 %, 50 %, and 30 % ethanol and PBST. Pigments were bleached in 3 % H2O2/5 % formamide/2.5 % 20X SSC for up to 30 minutes. Embryos were stored in 80 % glycerol for imaging.
Design of copper-binding proteins
Combinatorial design simulations were run to redesign the template structure in its apo form (PDB: 5FJD), because neither the described method nor the classical mechanics force field used can adequately describe coordinated metal ions. Most surface positions were set as designable to break the oligomerization interfaces and stabilize the helical structures. The spec file (Methods S1) was run in 100 instances, for 125 CPU hours/instance. The final sequences were chosen according to the same aMD filtering described above. Through a second round of computational design, we sought to create constructs with a more sealed core. This was done by mutating 3 or 6 cysteine residues (and their surrounding positions) into well-packed hydrophobic residues at the end of the cysteine-lined lumen of the plr1 model. Three such design candidates were also synthesized and tested: cr3, cr61, and cr62. Additionally, the surface positions of plr1 were redesigned to bias them towards negatively supercharged variants, of which two candidates were selected and tested: neg1 and neg2.
Purification of copper-binding proteins
The synthetic genes for all tested designs and the design template (Table S2) were cloned without purification tags in a pET28b(+) vector (BioCat GmbH). The plasmids were transformed into E. coli BL21 (DE3) for protein expression. Expression was induced in 2 litres of LB medium supplemented with 40 μg/ml kanamycin at an optical density (OD600) of 0.8, and was carried out overnight at 25 °C. Cells were harvested by centrifugation at 5000 g at 4 °C for 15 min and lysed in 30 ml of lysis buffer (for positively charged variants: 100 mM NaCl, 50 mM Tris-HCl pH 8.0; for negatively charged variants: 2 mM EDTA, 20 mM Tris-HCl pH 8.0) supplemented with a tablet of the cOmplete, EDTA-free Protease Inhibitor Cocktail (Roche) and 3 mg of lyophilized DNase I (PanReac AppliChem) using a Branson Sonifier 250 (Fisher Scientific). The lysate was cleared by centrifugation at 28000 g at 4 °C for 50 min and the supernatant was filtered through a 0.45 μm filter (Millipore). The sample was diluted 5-fold and applied to a 5 ml HiTrap Capto Q or S column (Cytiva), depending on the isoelectric point of the protein. Positively charged proteins were eluted in 20 mM HEPES, 1 mM DTT buffer pH 7.4, using a gradient of 0 to 1.5 M KCl. Negatively charged variants were eluted in 20 mM HEPES, 1 mM DTT buffer pH 8.0, using a gradient of 0 to 1.5 M NaCl. The relevant fractions were identified by SDS-PAGE analysis, and further purified on a Superdex 75 Increase 10/300 GL gel filtration column (Cytiva) using 20 mM HEPES, 150 mM NaCl buffer pH 7.4. Gel filtration fractions containing pure protein in the desired oligomeric state were pooled, concentrated, and stored at -20 °C for subsequent analyses.
Thermostability analysis of copper binders
NanoDSF measurements using Prometheus NT.48 (Nanotemper) were performed to evaluate thermostability of selected designs, as well as thermostability of Csp1. Capillaries (Nanotemper) were filled with 0.1-1 mg/ml protein samples in 3 replicates. Melting scan was performed across the temperature range from 20 °C to 110 °C with a temperature ramp of 1 °C/min. In addition to measuring the intrinsic fluorescence intensity ratio (350/330 nm), light intensity loss due to scattering (backreflection) was measured to detect protein aggregation.
Cu2+-binding affinity, capacity, and stability
To evaluate how many Cu2+ ions can be bound within the core of plr1, absorption spectra (from 400 nm to 700 nm) were recorded for samples containing 20 μM of Zincon (Supelco), 20 μM of CuSO4, and varying concentrations of plr1 (to provide Cu2+/plr1 ratios from 1 to 19). Once plr1 is saturated with Cu2+ ions, the complex between Cu2+ and Zincon forms and the characteristic Cu2+ZI peak at 599 nm can be observed in the spectrum.
The binding affinities of the designs to Cu2+ were determined using a modified Zincon assay described by Kocyla et al.57 Zincon competition tests with plr1 and neg1 were performed in 20 mM HEPES buffer, pH 7.4, containing 150 mM NaCl. 50 μM of Zincon was mixed with 20 μM of CuSO4 and different concentrations of a protein solution (0 – 160 μM for plr1, 0 – 80 μM for neg1). Samples were incubated for 8 h at 25 °C. The exact concentrations of the Cu2+ZI complex present in each sample were calculated based on the absorbances at 599 nm using the molar absorption coefficient of Cu2+ZI at pH 7.4, 26100 M-1cm-1. Absorbances of the samples were measured on a Synergy HTX Microplate Reader (BioTek) in a 384-well plate (Greiner). The dissociation constant of the designed proteins () was calculated as follows:
| (Equation 8) |
where is a dissociation constant of Cu2+ZI at pH 7.4, 4.68×10-17 M, and is a constant describing the reaction of Cu2+ ion transfer from Cu2+ZI to the tested copper-binding design. was determined by fitting the experimental data to the following equation:
| (Equation 9) |
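The two numerical steps of the competition assay can be sketched as follows; this is an illustration under stated assumptions rather than the exact expressions of Equations 8 and 9. The absorbance-to-concentration step uses the Beer-Lambert law with the molar absorption coefficient given above, and the optical path length must be supplied by the user because it is not stated for the 384-well plate. The second function assumes a simple 1:1 exchange reaction, Cu2+ZI + P ⇌ Cu2+P + ZI, with the exchange constant defined as [Cu2+P][ZI]/([Cu2+ZI][P]); this definition is an assumption of the sketch and may differ from the one used by the authors.

```python
EPSILON_CUZI = 26100.0   # M^-1 cm^-1, Cu2+ZI at 599 nm and pH 7.4 (from text)
KD_CUZI = 4.68e-17       # M, dissociation constant of Cu2+ZI at pH 7.4 (text)


def cuzi_concentration(absorbance, path_cm):
    """Beer-Lambert law: concentration (mol/L) = A / (epsilon * path)."""
    if path_cm <= 0:
        raise ValueError("path length must be positive")
    return absorbance / (EPSILON_CUZI * path_cm)


def protein_kd(k_ex):
    """Protein dissociation constant under the ASSUMED 1:1 exchange model.

    With K_ex = [Cu2+P][ZI] / ([Cu2+ZI][P]), dividing the two dissociation
    constants gives K_ex = Kd(Cu2+ZI) / Kd(protein), hence the relation below.
    """
    if k_ex <= 0:
        raise ValueError("exchange constant must be positive")
    return KD_CUZI / k_ex
```

Under this definition, an exchange constant greater than 1 means that the protein binds Cu2+ more tightly than Zincon does, giving a protein dissociation constant below 4.68×10-17 M.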
To measure off rates for Cu2+ dissociating from plr1 or neg1, we performed the dissociation experiments in 20 mM HEPES buffer, pH 7.4, containing 150 mM NaCl and 25 % v/v of untreated fetal bovine serum. Samples containing 50 μM of Zincon, 50 μM of CuSO4, and 150 μM of either plr1 or neg1 were incubated in a 384-well plate (Greiner) for 48 h at 37 °C. The absorbances at 599 nm were recorded every hour. Samples containing no Cu2+ and no protein were used as a control, and the average absorbance of these samples was subtracted from all tested values. Results are presented as a decrease in the amount of Cu2+ bound () to the protein over time (t). Absorbance values from the wells containing Zincon and Cu2+ but no protein were used for normalization and were referred to as 100 % of unbound Cu2+. Dissociation rates () were determined by fitting the experimental data to the equation:
| (Equation 10) |
where is a percent of Cu2+ bound to the protein at time zero.
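The normalization and rate estimation described above can be sketched as follows. The sketch assumes a single-exponential decay of the bound fraction, which is a common model for dissociation but is an assumption here and may differ from the form of Equation 10. It also estimates the rate by linear regression on log-transformed values, which weights the data differently from a direct nonlinear fit. Function and variable names are hypothetical.

```python
from math import exp, log


def percent_bound(absorbance, blank, unbound_reference):
    """Percent of Cu2+ still bound to the protein at one time point.

    blank:             mean absorbance of wells without Cu2+ and protein
    unbound_reference: absorbance of Zincon + Cu2+ wells without protein,
                       taken as 100 % unbound Cu2+
    """
    unbound = 100.0 * (absorbance - blank) / (unbound_reference - blank)
    return 100.0 - unbound


def fit_single_exponential(times, bound):
    """Fit bound(t) = b0 * exp(-k_off * t) by least squares on ln(bound).

    Returns (b0, k_off); k_off has the inverse unit of the time values.
    All bound values must be positive.
    """
    ys = [log(b) for b in bound]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    slope = sxy / sxx
    return exp(y_mean - slope * t_mean), -slope
```

With hourly readings over 48 h, `times` would be given in hours and the returned rate would be in h-1.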
Loading of copper binders with 64Cu
64Ni (98 % enrichment) was electroplated on a Pt/Ir plate (90/10) and irradiated with 12.5 MeV protons on the Tübingen PETtrace cyclotron (GE Healthcare) to produce 64Cu via the 64Ni(p,n)64Cu route. The target was dissolved using concentrated HCl and 64Cu2+ was purified using ion chromatography as described before.58 The obtained radioisotope solution in 0.1 M HCl was buffered with 1.5 volumes of 0.5 M ammonium acetate pH 4.1 before addition of the protein (2 μg per MBq). After 30 min of incubation at 35 °C, incorporation of the radioactivity was analyzed by thin layer chromatography (stationary phase: Polygram SIL G UV254, Macherey-Nagel; mobile phase: 0.1 M sodium citrate pH 5) with autoradiographic detection using a phosphor imager (Cyclone Plus Storage Phosphor System, Perkin Elmer). Size exclusion chromatography with a radioactivity detector (1260 Infinity II, Agilent; Superdex 75 Increase 10/300, Cytiva) was used to analyze the elution profile of the protein and bound radioactivity. To strip the bound radioactivity, DTPA (final concentration 0.14 mg/ml) was added to the labeled protein, and the sample was re-analyzed after incubation at room temperature.
Quantification and statistical analysis
To evaluate the scoring accuracy of the Damietta potential against different stability benchmarks, the Pearson correlation coefficients and the correlation p-values were calculated using the scipy.stats sub-package of SciPy (version 1.8.0). For SPR measurements, A431 cell proliferation, as well as copper-binding experiments, mean values and standard deviations were calculated and are described in the figure legends. Statistical analysis for the zebrafish experiment was performed using the GraphPad Prism software (version 7).
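For reference, the Pearson correlation coefficient used in the benchmark comparison can be written out explicitly. The analysis itself relied on scipy.stats, whose `pearsonr` function returns both the coefficient and the p-value; the dependency-free sketch below only reproduces the coefficient and does not compute a p-value. The argument names are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```

Here `xs` could hold computed energies and `ys` the corresponding experimental stability values for the same set of variants.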
Acknowledgments
This project has received funding from the IMPRS (K.M.), the European Research Council under the European Union’s Horizon 2020 research and innovation program (grant agreement 863952 [ACE-OF-SPACE]) (P.M.), the M. Schickedanz Kinderkrebsstiftung (M.E., J.S., and N.A.), the “German Universities Excellence Initiative” of Tübingen University (J.S.), and the Deutsche Forschungsgemeinschaft (DFG; no. 500215849) (M.E., B.H.A., and J.S.). Some computations described in this work were performed on the HPC system Raven at the Max Planck Computing and Data Facility. We acknowledge support from the Open Access Publishing Fund of the University of Tübingen. We also thank Dominik Seyfried and Johannes Kinzler for excellent technical assistance and Prof. Gerald Reischl for the supply of 64Cu.
Author contributions
M.E. conceptualized the computational design framework and developed the software. C.B., K.M., M.E., and T.M.H.D. improved the energy function and performed and analyzed MD simulations. J.S. and P.M. selected design targets. K.M. and M.E. performed the computational design. A.M., B.H.A., K.M., M.E., N.B.-B., and T.U. experimentally characterized the designed proteins. N.A. designed, performed, and analyzed the results of zebrafish experiments. A.M., J.S., K.M., M.E., and N.A. wrote, reviewed, and edited the manuscript. A.N.L., J.S., M.E., P.M., and T.M.H.D. contributed to supervision and project administration. A.N.L., A.M., J.S., M.E., and P.M. provided resources. A.N.L., J.S., M.E., and P.M. acquired funding.
Declaration of interests
The designed EGFR inhibitors and the copper binders described in this study are part of patent applications nos. EP22190708 (inventors: K.M., M.E., and J.S.) and EP22206059 (inventors: M.E. and J.S.), respectively. These applications were filed by Eberhard Karls Universität Tübingen and Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V. M.E. is a co-founder of Heliopolis Biotechnology LLC.
Published: August 15, 2023
Footnotes
Supplemental information can be found online at https://doi.org/10.1016/j.crmeth.2023.100560.
Data and code availability
- This paper analyzes existing, publicly available data. These accession numbers for the datasets are listed in the key resources table.
- Damietta software is available at https://bio.mpg.de/damietta. An archival DOI is listed in the key resources table.
- Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
References
- 1. Fleishman S.J., Baker D. Role of the Biomolecular Energy Gap in Protein Design, Structure, and Evolution. Cell. 2012;149:262–273. doi: 10.1016/j.cell.2012.03.016.
- 2. Kuhlman B., Bradley P. Advances in protein structure prediction and design. Nat. Rev. Mol. Cell Biol. 2019;20:681–697. doi: 10.1038/s41580-019-0163-x.
- 3. Alford R.F., Leaver-Fay A., Jeliazkov J.R., O’Meara M.J., DiMaio F.P., Park H., Shapovalov M.V., Renfrew P.D., Mulligan V.K., Kappel K., et al. The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J. Chem. Theor. Comput. 2017;13:3031–3048. doi: 10.1021/acs.jctc.7b00125.
- 4. Rohl C.A., Strauss C.E.M., Misura K.M.S., Baker D. Methods in Enzymology. Academic Press; 2004. Protein Structure Prediction Using Rosetta; pp. 66–93.
- 5. Sheffler W., Baker D. RosettaHoles: rapid assessment of protein core packing for structure prediction, refinement, design, and validation. Protein Sci. 2009;18:229–239. doi: 10.1002/pro.8.
- 6. Zhou J., Panaitiu A.E., Grigoryan G. A general-purpose protein design framework based on mining sequence-structure relationships in known protein structures. Proc. Natl. Acad. Sci. USA. 2020;117:1059–1068. doi: 10.1073/pnas.1908723117.
- 7. Norn C., Wicky B.I.M., Juergens D., Liu S., Kim D., Tischer D., Koepnick B., Anishchenko I., Foldit Players, Baker D., Ovchinnikov S. Protein sequence design by conformational landscape optimization. Proc. Natl. Acad. Sci. USA. 2021;118 doi: 10.1073/pnas.2017228118.
- 8. Vokes E.E., Chu E. Anti-EGFR therapies: clinical experience in colorectal, lung, and head and neck cancers. Oncology. 2006;20:15–25.
- 9. Dunbrack R.L., Jr. Rotamer libraries in the 21st century. Curr. Opin. Struct. Biol. 2002;12:431–440. doi: 10.1016/s0959-440x(02)00344-5.
- 10. Towse C.-L., Rysavy S.J., Vulovic I.M., Daggett V. New Dynamic Rotamer Libraries: Data-Driven Analysis of Side-Chain Conformational Propensities. Structure. 2016;24:187–199. doi: 10.1016/j.str.2015.10.017.
- 11. Childers M.C., Towse C.L., Daggett V. The effect of chirality and steric hindrance on intrinsic backbone conformational propensities: tools for protein design. Protein Eng. Des. Sel. 2016;29:271–280. doi: 10.1093/protein/gzw023.
- 12. Vitalini F., Noé F., Keller B.G. Molecular dynamics simulations data of the twenty encoded amino acids in different force fields. Data Brief. 2016;7:582–590. doi: 10.1016/j.dib.2016.02.086.
- 13. Conway P., Tyka M.D., DiMaio F., Konerding D.E., Baker D. Relaxation of backbone bond geometry improves protein energy landscape modeling. Protein Sci. 2014;23:47–55. doi: 10.1002/pro.2389.
- 14. MacKerell A.D., Jr., Banavali N., Foloppe N. Development and current status of the CHARMM force field for nucleic acids. Biopolymers. 2000;56:257–265. doi: 10.1002/1097-0282(2000)56:4<257::Aid-bip10029>3.0.Co;2-w.
- 15. Childers M.C., Towse C.-L., Daggett V. Molecular dynamics-derived rotamer libraries for d-amino acids within homochiral and heterochiral polypeptides. Protein Eng. Des. Sel. 2018;31:191–204. doi: 10.1093/protein/gzy016.
- 16. Shapovalov M.V., Dunbrack R.L., Jr. A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions. Structure. 2011;19:844–858. doi: 10.1016/j.str.2011.03.019.
- 17. Kim D.E., Blum B., Bradley P., Baker D. Sampling Bottlenecks in De novo Protein Structure Prediction. J. Mol. Biol. 2009;393:249–260. doi: 10.1016/j.jmb.2009.07.063.
- 18. Nisthal A., Wang C.Y., Ary M.L., Mayo S.L. Protein stability engineering insights revealed by domain-wide comprehensive mutagenesis. Proc. Natl. Acad. Sci. USA. 2019;116:16367–16377. doi: 10.1073/pnas.1903888116.
- 19. Tsuboyama K., Dauparas J., Chen J., Laine E., Mohseni Behbahani Y., Weinstein J.J., Mangan N.M., Ovchinnikov S., Rocklin G.J. Mega-scale experimental analysis of protein folding stability in biology and design. Nature. 2023 doi: 10.1038/s41586-023-06328-6.
- 20. Romero P.A., Arnold F.H. Exploring protein fitness landscapes by directed evolution. Nat. Rev. Mol. Cell Biol. 2009;10:866–876. doi: 10.1038/nrm2805.
- 21. Leaver-Fay A., Tyka M., Lewis S.M., Lange O.F., Thompson J., Jacak R., Kaufman K.W., Renfrew P.D., Smith C.A., Sheffler W., et al. In: Computer Methods, Part C. Johnson M.L., Brand L., editors. Academic Press; 2011. Chapter nineteen - Rosetta3: An Object-Oriented Software Suite for the Simulation and Design of Macromolecules; pp. 545–574.
- 22. ElGamacy M., Riss M., Zhu H., Truffault V., Coles M. Mapping Local Conformational Landscapes of Proteins in Solution. Structure. 2019;27:853–865.e5. doi: 10.1016/j.str.2019.03.005.
- 23. Weiel M., Götz M., Klein A., Coquelin D., Floca R., Schug A. Dynamic particle swarm optimization of biomolecular simulation parameters with flexible objective functions. Nat. Mach. Intell. 2021;3:727–734. doi: 10.1038/s42256-021-00366-3.
- 24. Woodburn J.R. The epidermal growth factor receptor and its inhibition in cancer therapy. Pharmacol. Ther. 1999;82:241–250. doi: 10.1016/s0163-7258(98)00045-x.
- 25. Ogiso H., Ishitani R., Nureki O., Fukai S., Yamanaka M., Kim J.-H., Saito K., Sakamoto A., Inoue M., Shirouzu M., Yokoyama S. Crystal Structure of the Complex of Human Epidermal Growth Factor and Receptor Extracellular Domains. Cell. 2002;110:775–787. doi: 10.1016/S0092-8674(02)00963-7.
- 26. Chong C.R., Jänne P.A. The quest to overcome resistance to EGFR-targeted therapies in cancer. Nat. Med. 2013;19:1389–1400. doi: 10.1038/nm.3388.
- 27. Chan D.L.H., Segelov E., Wong R.S., Smith A., Herbertson R.A., Li B.T., Tebbutt N., Price T., Pavlakis N. Epidermal growth factor receptor (EGFR) inhibitors for metastatic colorectal cancer. Cochrane Database Syst. Rev. 2017;6:CD007047. doi: 10.1002/14651858.CD007047.pub2.
- 28. Schrank Z., Chhabra G., Lin L., Iderzorig T., Osude C., Khan N., Kuckovic A., Singh S., Miller R.J., Puri N. Current Molecular-Targeted Therapies in NSCLC and Their Mechanism of Resistance. Cancers. 2018;10:224. doi: 10.3390/cancers10070224.
- 29. Guardiola S., Varese M., Sánchez-Navarro M., Giralt E. A Third Shot at EGFR: New Opportunities in Cancer Therapy. Trends Pharmacol. Sci. 2019;40:941–955. doi: 10.1016/j.tips.2019.10.004.
- 30. Yotsumoto F., Sanui A., Fukami T., Shirota K., Horiuchi S., Tsujioka H., Yoshizato T., Kuroki M., Miyamoto S. Efficacy of ligand-based targeting for the EGF system in cancer. Anticancer Res. 2009;29:4879–4885.
- 31. Sarup J., Jin P., Turin L., Bai X., Beryt M., Brdlik C., Higaki J.N., Jorgensen B., Lau F.W., Lindley P., et al. Human epidermal growth factor receptor (HER-1:HER-3) Fc-mediated heterodimer has broad antiproliferative activity in vitro and in human tumor xenografts. Mol. Cancer Therapeut. 2008;7:3223–3236. doi: 10.1158/1535-7163.MCT-07-2151.
- 32. Li Z., Li Y., Chang H.-P., Chang H.-Y., Guo L., Shah D.K. Effect of Size on Solid Tumor Disposition of Protein Therapeutics. Drug Metab. Dispos. 2019;47:1136–1145. doi: 10.1124/dmd.119.087809.
- 33. Lax I., Fischer R., Ng C., Segre J., Ullrich A., Givol D., Schlessinger J. Noncontiguous regions in the extracellular domain of EGF receptor define ligand-binding specificity. Cell Regul. 1991;2:337–345. doi: 10.1091/mbc.2.5.337.
- 34. Baselga J. The EGFR as a target for anticancer therapy—focus on cetuximab. Eur. J. Cancer. 2001;37:16–22. doi: 10.1016/S0959-8049(01)00233-7.
- 35. ElGamacy M. Advances in Protein Chemistry and Structural Biology. Academic Press; 2022. Accelerating therapeutic protein design.
- 36. Hoesl C., Röhrl J.M., Schneider M.R., Dahlhoff M. The receptor tyrosine kinase ERBB4 is expressed in skin keratinocytes and influences epidermal proliferation. Biochim. Biophys. Acta Gen. Subj. 2018;1862:958–966. doi: 10.1016/j.bbagen.2018.01.017.
- 37. Pruvot B., Curé Y., Djiotsa J., Voncken A., Muller M. Developmental defects in zebrafish for classification of EGF pathway inhibitors. Toxicol. Appl. Pharmacol. 2014;274:339–349. doi: 10.1016/j.taap.2013.11.006.
- 38. Malmstrom B.G., Neilands J.B. Metalloproteins. Annu. Rev. Biochem. 1964;33:331–354. doi: 10.1146/annurev.bi.33.070164.001555.
- 39. Lu Y., Yeung N., Sieracki N., Marshall N.M. Design of functional metalloproteins. Nature. 2009;460:855–862. doi: 10.1038/nature08304.
- 40. Chalkley M.J., Mann S.I., DeGrado W.F. De novo metalloprotein design. Nat. Rev. Chem. 2022;6:31–50. doi: 10.1038/s41570-021-00339-5.
- 41. Ellisman M.H., Deerinck T.J., Shu X., Sosinsky G.E. Picking faces out of a crowd: genetic labels for identification of proteins in correlated light and electron microscopy imaging. Methods Cell Biol. 2012;111:139–155. doi: 10.1016/b978-0-12-416026-2.00008-x.
- 42. Matsumoto Y., Jasanoff A. Metalloprotein-based MRI probes. FEBS Lett. 2013;587:1021–1029. doi: 10.1016/j.febslet.2013.01.044.
- 43. Sawyer J.R., Tucker P.W., Blattner F.R. Metal-binding chimeric antibodies expressed in Escherichia coli. Proc. Natl. Acad. Sci. USA. 1992;89:9754–9758. doi: 10.1073/pnas.89.20.9754.
- 44. Keinänen O., Fung K., Brennan J.M., Zia N., Harris M., van Dam E., Biggin C., Hedt A., Stoner J., Donnelly P.S., et al. Harnessing 64Cu/67Cu for a theranostic approach to pretargeted radioimmunotherapy. Proc. Natl. Acad. Sci. USA. 2020;117:28316–28327. doi: 10.1073/pnas.2009960117.
- 45. ElGamacy M., Hernandez Alvarez B. Expanding the versatility of natural and de novo designed coiled coils and helical bundles. Curr. Opin. Struct. Biol. 2021;68:224–234. doi: 10.1016/j.sbi.2021.03.011.
- 46. Vita N., Platsaki S., Baslé A., Allen S.J., Paterson N.G., Crombie A.T., Murrell J.C., Waldron K.J., Dennison C. A four-helix bundle stores copper for methane oxidation. Nature. 2015;525:140–143. doi: 10.1038/nature14854.
- 47. Pham N.B., Meng W.S. Protein aggregation and immunogenicity of biotherapeutics. Int. J. Pharm. 2020;585 doi: 10.1016/j.ijpharm.2020.119523.
- 48. Rolle A.-M., Hasenberg M., Thornton C.R., Solouk-Saran D., Männ L., Weski J., Maurer A., Fischer E., Spycher P.R., Schibli R., et al. ImmunoPET/MR imaging allows specific detection of Aspergillus fumigatus lung infection in vivo. Proc. Natl. Acad. Sci. USA. 2016;113:E1026–E1033. doi: 10.1073/pnas.1518836113.
- 49. Phillips J.C., Hardy D.J., Maia J.D.C., Stone J.E., Ribeiro J.V., Bernardi R.C., Buch R., Fiorin G., Hénin J., Jiang W., et al. Scalable molecular dynamics on CPU and GPU architectures with NAMD. J. Chem. Phys. 2020;153 doi: 10.1063/5.0014475.
- 50. Humphrey W., Dalke A., Schulten K. VMD: visual molecular dynamics. J. Mol. Graph. 1996;14:33–38, 27–28. doi: 10.1016/0263-7855(96)00018-5.
- 51. Bottaro S., Lindorff-Larsen K., Best R.B. Variational Optimization of an All-Atom Implicit Solvent Force Field to Match Explicit Solvent Simulation Data. J. Chem. Theor. Comput. 2013;9:5641–5652. doi: 10.1021/ct400730n.
- 52. Stone J.E., Phillips J.C., Freddolino P.L., Hardy D.J., Trabuco L.G., Schulten K. Accelerating molecular modeling applications with graphics processors. J. Comput. Chem. 2007;28:2618–2640. doi: 10.1002/jcc.20829.
- 53. Pritchard R.B., Hansen D.F. Characterising side chains in large proteins by protonless 13C-detected NMR spectroscopy. Nat. Commun. 2019;10:1747. doi: 10.1038/s41467-019-09743-4.
- 54. Peterson L.X., Kang X., Kihara D. Assessment of protein side-chain conformation prediction methods in different residue environments. Proteins. 2014;82:1971–1984. doi: 10.1002/prot.24552.
- 55. Hernandez Alvarez B., Skokowa J., Coles M., Mir P., Nasri M., Maksymenko K., Weidmann L., Rogers K.W., Welte K., Lupas A.N., et al. Design of novel granulopoietic proteins by topological rescaffolding. PLoS Biol. 2020;18:e3000919. doi: 10.1371/journal.pbio.3000919.
- 56. Skokowa J., Hernandez Alvarez B., Coles M., Ritter M., Nasri M., Haaf J., Aghaallaei N., Xu Y., Mir P., Krahl A.-C., et al. A topological refactoring design strategy yields highly stable granulopoietic proteins. Nat. Commun. 2022;13:2948. doi: 10.1038/s41467-022-30157-2.
- 57. Kocyła A., Pomorski A., Krężel A. Molar absorption coefficients and stability constants of Zincon metal complexes for determination of metal ions and bioinorganic applications. J. Inorg. Biochem. 2017;176:53–65. doi: 10.1016/j.jinorgbio.2017.08.006.
- 58. Griessinger C.M., Maurer A., Kesenheimer C., Kehlbach R., Reischl G., Ehrlichmann W., Bukala D., Harant M., Cay F., Brück J., et al. 64Cu antibody-targeting of the T-cell receptor and subsequent internalization enables in vivo tracking of lymphocytes by PET. Proc. Natl. Acad. Sci. USA. 2015;112:1161–1166. doi: 10.1073/pnas.1418391112.