Author manuscript; available in PMC: 2013 Jun 1.
Published in final edited form as: Chem Biol Drug Des. 2012 Mar 19;79(6):888–896. doi: 10.1111/j.1747-0285.2012.01356.x

Prediction of HIV-1 Protease/Inhibitor Affinity using RosettaLigand

Gordon Lemmon 1, Kristian Kaufmann 1, Jens Meiler 1,*
PMCID: PMC3342459  NIHMSID: NIHMS358363  PMID: 22321894

Abstract

Predicting HIV-1 protease/inhibitor binding affinity as the difference between the free energy of the inhibitor bound and unbound state remains difficult as the unbound state exists as an ensemble of conformations with various degrees of flap opening. We improve computational prediction of protease/inhibitor affinity by invoking the hypothesis that the free energy of the unbound state while difficult to predict is less sensitive to mutation. Thereby the HIV-1 protease/inhibitor binding affinity can be approximated with the free energy of the bound state alone. Bound state free energy can be predicted from comparative models of HIV-1 protease mutant/inhibitor complexes. Absolute binding energies are predicted with R=0.71 and SE=5.91 kJ/mol. Changes in binding free energy upon mutation can be predicted with R=0.85 and SE=4.49 kJ/mol. Resistance mutations that lower inhibitor binding affinity can thereby be recognized early in HIV-1 protease inhibitor development.

Keywords: Rosetta, ligand, docking, binding, energy, ensemble, flexible, rigid, model, predict

Introduction

The binding affinity of a drug to its protein target is defined by the free energy difference between the bound and unbound state. Mutation of the protein or chemical modification of the ligand can alter this energy difference directly – i.e. by adding or subtracting interactions between the two partners – or indirectly – i.e. by stabilizing or destabilizing protein or small molecule in either bound or unbound conformation (1). For the unbound state often ensembles of protein and small molecule need to be considered (2) while the bound state is often considerably more rigid. HIV-1 protease (PR) interaction with its inhibitors is a model case for this scenario while examples for the opposite scenario – rigid protein increases flexibility upon binding – are also known (3, 4).

Current computational methods are capable of predicting direct effects reasonably well through an analysis of all interactions between protein and ligand. However, the same methods often fail to predict indirect effects. For instance, it remains difficult to predict how mutations outside the binding pocket are propagated throughout the protein and to the binding site (5). These indirect effects are likely to have a greater destabilizing influence on a rigid bound state than on a flexible unbound state.

We hypothesize that in the scenario of a rigid bound and flexible unbound state, prediction accuracy of indirect effects on binding affinity can be improved through a simple approximation. Figure 1 summarizes the effects of mutations on binding free energy in two scenarios: The top row represents the scenario wherein the unbound state exists as one stable low energy conformation. The bottom row represents the rugged energy landscape (jagged red line) of a flexible unbound state with multiple energetic minima. In a thought experiment we compare a binding site mutation that is assumed to interfere only with direct interactions between ligand and protein with a non-binding site mutation that is assumed to only affect stability of the protein, but does not change the protein-ligand interaction. In reality combinations of these two scenarios exist.

Figure 1.


Effects of mutations within or outside the binding site on binding affinity are compared for two scenarios: a rigid unbound state remains rigid upon ligand binding (A–C) and a flexible unbound state rigidifies upon ligand binding (D–F). A wildtype scenario (A,D) is compared with a binding site mutation affecting only the interaction with the ligand (B,E) and a non-binding site mutation affecting only the stability of the protein (C,F). Red lines represent energy landscapes for unbound protein. Blue lines represent energy landscapes for the protein in complex with the small molecule. For discussion see text.

In the first scenario, in which a rigid unbound state engages the ligand and remains rigid, a mutation within the binding site that disrupts protein-small molecule interactions will lower the binding affinity (Figure 1B). A mutation outside the binding pocket would have an equal effect on the free energy of the bound and unbound conformations, as they are identical. As a result, the ligand affinity is unaltered (Figure 1C). In the case of a flexible unbound state, mutations inside the binding pocket that interrupt protein-ligand interactions would again be expected to lower binding affinity (Figure 1E). However, mutations outside the binding pocket are expected to have a greater destabilizing effect on the single rigid bound conformation than on the unbound state, which consists of an ensemble of structures. Mutations that affect low-energy structures contributing to the unbound state will certainly affect the overall free energy of the unbound state. However, we hypothesize that this effect is small, as mutations will affect only a fraction of the low-energy conformations the unbound state can assume. If the ensemble is large enough, the influence on free energy will be small. This hypothesis suggests that the free energy of the unbound state can be approximated with a constant in this scenario. Because the bound state is destabilized while the unbound state is nearly unchanged, the result is a net change in binding energy due to mutation outside the binding pocket (Figure 1F). This approximation is only valid for proteins that are very flexible in the unbound state and convert to a rigid bound conformation. HIV-1 PR is an example.

HIV-1 PR is a homodimer with a flexible binding site (Figure 2). Over 200 high resolution crystal structures of HIV-1 PR mutants in complex with HIV-1 PR inhibitors (PIs) are deposited in the Protein Data Bank (PDB, resolution better than 2.0 Å) (6). These mutants exhibit limited structural diversity, verifying the well-defined rigid bound conformation of the protein (7). However, the two flap regions exhibit up to 7 Å of movement in the unbound state (Figure 2) (8, 9). The unbound state is therefore best described as a large ensemble of structures (10). We hypothesize that it is for this reason that PR/PI docking studies have had difficulty predicting binding free energies (ΔΔGs). The free energy of the unbound state (ΔGu) is not accurately reflected by a single structure or a tight ensemble.

Figure 2.


Left: HIV-1 PR homodimer with acetylpepstatin bound. The two chains are colored “wheat” and “pale-green”. Binding site residues are colored red. Colored by atom is acetylpepstatin, an HIV-1 PI. Right: HIV-1 PR loops exhibit large movements upon ligand binding. One chain of HIV-1 PR is shown in several conformations. Green: 1TW7 (wide-open), Cyan: 3BC4 (open), Purple: 2NMZ (closed). A distance of 6.3 Å exists between open and closed loop conformations. (Distance is calculated between Cα atoms of residue Ile 50).

Cheng et al. assessed 16 scoring functions utilized in protein/ligand docking (11) for prediction of PR/PI ΔΔGs. Correlation coefficients ranged from R=0.17 to R=0.34. RosettaLigand predicted ΔΔGs with a correlation of R=0.41 (12). AutoDock predictions correlated with R=0.38 on a set of 25 HIV-1 PR/PI structures from the PDB, with binding data available (13).

At the same time HIV PI therapies are greatly hampered by drug resistance mutations. Only recently, conformational ensembles were used to assist in designing PIs with broad enough specificity to avoid escape mutations (14). The authors of this study evaluated chemical modifications to known PIs using electrostatic charge optimization. They chose not to include induced-fit effects or ligand flexibility.

In this study we use RosettaLigand to predict the effect of PR mutations inside and outside the binding pocket. Predicted ΔΔGs are compared with experimentally determined ΔΔGs. These include 34 HIV-1 PR mutants and eleven PIs. We demonstrate that by assuming the unbound state constant with respect to mutation we can achieve a correlation coefficient of R=0.71 over a wide array of PR/PI ΔΔG data. Improved prediction of PR/PI binding affinity may help clinicians select the optimal PI for treatment and help design PIs with broad specificity that avoid resistance mutations.

Materials and Methods

176 experimental PR/PI binding energies have been collected

PR/PI binding energies (ΔΔGs) were obtained from the Binding Database (www.bindingdb.org) (15). These 176 binding energies include experimental conditions and HIV-1 PR mutant sequence information, but lack structural information. They include a total of eleven distinct PIs and 34 distinct PR sequences. 106 of these datapoints resulted from isothermal titration calorimetry (ITC) measurements. The remaining 70 datapoints are enzyme inhibition constants (Kis).

These Kis were converted to binding energies using the equation ΔG = RT ln Ki, where R is the gas constant, 8.314 J K−1mol−1, and T is temperature in Kelvin. Ki values before and after conversion are summarized in Table S1. Since temperatures were rarely reported, we assumed 25°C (298K) for the conversion.
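The conversion above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the function name `ki_to_delta_g` is hypothetical, Ki is assumed to be given in mol/L, and the default temperature is the 298 K assumed in the text.

```python
import math

R_GAS = 8.314  # gas constant in J K^-1 mol^-1, as given in the text


def ki_to_delta_g(ki_molar, temperature_k=298.0):
    """Return the binding free energy in kJ/mol for a Ki given in mol/L."""
    if ki_molar <= 0:
        raise ValueError("Ki must be positive")
    # Delta G = R * T * ln(Ki); divide by 1000 to go from J/mol to kJ/mol
    return R_GAS * temperature_k * math.log(ki_molar) / 1000.0


# Example: a 1 nM inhibitor at the assumed 25 °C
print(round(ki_to_delta_g(1e-9), 2))
```

Under this relation a tenfold change in Ki shifts ΔG by RT ln 10, roughly 5.7 kJ/mol at 298 K, which gives a sense of scale for the standard errors reported later.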

171 high resolution template PR structures have been collected

171 crystal structures of HIV-1 PR bound to various ligands were obtained from the PDB. These structures each have resolution better than 2.0 Å. PDB codes, resolution, bound ligands, and citations for all 171 of these structures are listed in Table S2. A multiple sequence alignment of these 171 structures is given as Figure S1.

Threading of sequence onto structure for comparative modeling

34 distinct sequences were associated with the 176 experimental PR/PI binding energy data points. The 3-letter residue codes found in each of the 171 backbones were replaced with 3-letter residue codes for each of the 34 sequences, thus generating 5,814 models (34 × 171). Missing side-chain coordinates were constructed using Rosetta.
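The threading step can be illustrated with a small sketch that renames residues in PDB-format ATOM records. It is not the procedure used in the study: the function `thread_sequence` and its arguments are hypothetical, the sequence is assumed to be a one-letter string indexed from the first residue number, and dropping the side-chain atoms of mutated residues (so that a later step can rebuild them) is an assumption made here for illustration.

```python
# Three-letter codes for the 20 standard amino acids
ONE_TO_THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}
BACKBONE_ATOMS = {"N", "CA", "C", "O"}


def thread_sequence(pdb_lines, sequence, first_resnum=1):
    """Rename residues in PDB ATOM records to match `sequence`.

    Side-chain atoms are dropped wherever the residue identity changes,
    on the assumption that a later step rebuilds them.
    """
    threaded = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        atom_name = line[12:16].strip()   # PDB columns 13-16
        old_name = line[17:20]            # PDB columns 18-20
        resnum = int(line[22:26])         # PDB columns 23-26
        new_name = ONE_TO_THREE[sequence[resnum - first_resnum]]
        if new_name != old_name and atom_name not in BACKBONE_ATOMS:
            continue  # side chain of a mutated residue: leave for rebuilding
        threaded.append(line[:17] + new_name + line[20:])
    return threaded
```

Because HIV-1 PR is a homodimer, the same sequence would be applied to both chains; the sketch handles this implicitly by looking up residues by number rather than by chain.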

High resolution refinement of comparative models

Rosetta’s high-resolution refinement protocol searches for low-energy structures in the conformational vicinity of the starting model (16, 17). Backbone torsion angles are perturbed. Next side-chain rotamers are optimized (18). Finally backbone and side-chain torsion angles are adjusted using a gradient-based energy minimization. This process is repeated multiple times, using a Monte Carlo accept/reject criterion (19).
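The perturb, optimize, accept/reject cycle described above follows the general pattern of Monte Carlo minimization. The sketch below shows that pattern on a toy one-dimensional energy function. It assumes the standard Metropolis criterion and is not Rosetta code; `monte_carlo_minimize`, `toy_energy`, `toy_perturb`, and the `kT` value are all hypothetical.

```python
import math
import random


def monte_carlo_minimize(energy, perturb, start, n_steps=1000, kT=1.0, seed=0):
    """Return the lowest-energy state visited and its energy."""
    rng = random.Random(seed)
    current, current_e = start, energy(start)
    best, best_e = current, current_e
    for _ in range(n_steps):
        trial = perturb(current, rng)
        trial_e = energy(trial)
        delta = trial_e - current_e
        # Metropolis criterion: always accept downhill moves,
        # accept uphill moves with probability exp(-delta / kT)
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e


# Toy one-dimensional landscape standing in for a Rosetta score
def toy_energy(x):
    return (x - 2.0) ** 2


def toy_perturb(x, rng):
    return x + rng.uniform(-0.5, 0.5)


best_x, best_e = monte_carlo_minimize(toy_energy, toy_perturb, start=10.0)
```

In the protocol described in the text, the perturbation would be a backbone torsion move followed by rotamer optimization and gradient minimization, and the energy would be the Rosetta score.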

Low resolution initial placement of ligand

After a structural alignment was used to superimpose all comparative models, ligands were placed in the binding pockets of these models according to their positions in homologous crystal structures. Next 1,000 placements of the ligand were sampled to find a starting pose that has acceptable attractive and repulsive scores. A soft repulsive energy term was used during initial ligand placement (12).

Docking of PIs into comparative models

Six cycles of side-chain rotamer sampling were coupled with small (0.1 Å, 0.05 radians) ligand movements. Each cycle included minimization of ligand torsion angles with harmonic constraints (where 0.05 radians of movement is equal to one standard deviation). Each ligand torsion angle has a constraint score which is calculated as: f(x) = ((x−x0)/(standard deviation))². Amino acid side chains were repacked using a backbone-dependent rotamer library (20). During a final minimization, backbone torsion angles were optimized with harmonic constraints on the Cα atom positions (0.2 Å standard deviation). Each Cα atom has a constraint score which is calculated with the same function, f(x) = ((x−x0)/(standard deviation))².
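A harmonic constraint of this kind can be written as a one-line function. The sketch assumes the usual quadratic form of a harmonic restraint, in which a deviation of one standard deviation costs one score unit; the function name `harmonic_score` is hypothetical.

```python
def harmonic_score(x, x0, sd):
    """Quadratic penalty: one score unit at one standard deviation from x0."""
    return ((x - x0) / sd) ** 2


# Ligand torsion moved 0.05 rad from its start, sd = 0.05 rad
torsion_penalty = harmonic_score(0.05, 0.0, 0.05)

# C-alpha atom displaced 0.4 A from its start, sd = 0.2 A
ca_penalty = harmonic_score(0.4, 0.0, 0.2)
```

The quadratic growth means small adjustments are nearly free while large departures from the starting model are strongly discouraged, which keeps the comparative model close to its template during minimization.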

The RosettaLigand standard scoring function with hard repulsive forces was used during the final minimization step. Score terms include the 6–12 Lennard-Jones potential (21), the Lazaridis-Karplus solvation model (22), a side-chain rotamer score, based on the Dunbrack rotamer set (20), a pair potential based on the probability of seeing two amino acids close together in space (23), and an explicit orientation hydrogen bonding model (24).

All computation was performed on the Vanderbilt University ACCRE cluster (www.accre.vanderbilt.edu). Rosetta revision 32372 was used for all calculations. Command line arguments and input options are given in the Supporting Information.

Predicting ΔΔGs using the standard approach

The standard approach calculates ΔΔGs as the difference between the free energy of a docked model (ΔGb) and the free energy of the unbound model with equivalent sequence (ΔGu) after energy minimization. This setup corresponds to Figure 1A–C wherein the unbound state and bound state free energies are equally susceptible to disruption by mutation (Eq. I). For each of the 34 mutant PR sequences the lowest energy unbound comparative model was chosen to represent ΔGu. The lowest energy docked model for a given PR/PI pairing was chosen to represent ΔGb. The difference between these values was taken as a prediction of ΔΔG.

Predicting ΔΔGs using the constant-unbound approach

The constant-unbound approach corresponds to Figure 1D–F and calculates ΔΔG by assuming ΔGu to be unknown but invariant with mutation (Eq. II). The lowest energy docked model for a given PR/PI pairing was chosen to represent ΔGb.

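$$\Delta\Delta G = \Delta G_b - \Delta G_u \qquad \text{[I]}$$

$$\Delta\Delta G \approx \Delta G_b - \mathrm{const} \qquad \text{[II]}$$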
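The difference between Eq. I and Eq. II can be made concrete with a short sketch. The scores below are invented numbers in arbitrary units, and the function names are hypothetical; the point is only that the constant-unbound approach ignores the per-mutant unbound score.

```python
def ddg_standard(dg_bound, dg_unbound):
    """Eq. I: bound-state score minus the unbound score of the same mutant."""
    return dg_bound - dg_unbound


def ddg_constant_unbound(dg_bound, unbound_const=0.0):
    """Eq. II: the unbound term is one constant shared by all mutants."""
    return dg_bound - unbound_const


# Hypothetical scores (arbitrary units) for two mutants with the same inhibitor
mutants = {
    "mutant_1": {"bound": -120.0, "unbound": -100.0},
    "mutant_2": {"bound": -112.0, "unbound": -97.0},
}
for name, s in mutants.items():
    print(name, ddg_standard(s["bound"], s["unbound"]), ddg_constant_unbound(s["bound"]))
```

Since a constant offset does not change a Pearson correlation, the value chosen for `unbound_const` has no effect on the ranking of mutants; in a regression it would simply be absorbed into an intercept term.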

Predicting ΔΔΔG focuses on the influence of mutation on binding affinity

To determine how well RosettaLigand can predict changes in binding free energy (ΔΔΔG, see Figure 3) upon protein mutation i→j, pairs of predicted or experimental ΔΔGs sharing the same PI but different PR sequence were subtracted to obtain ΔΔΔGs (Eqs. III, IV). ΔΔΔGs predicted by Rosetta were compared with experimental ΔΔΔGs to obtain ΔΔΔG correlation. This strategy removes influences from the changes of the ligand thereby focusing on predicting the influence of mutations.

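$$\Delta\Delta\Delta G = \Delta\Delta G_i - \Delta\Delta G_j = (\Delta G_{i,b} - \Delta G_{i,u}) - (\Delta G_{j,b} - \Delta G_{j,u}) \qquad \text{[III]}$$

$$\Delta\Delta\Delta G \approx \Delta G_{i,b} - \Delta G_{j,b} \qquad \text{[IV]}$$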
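The pairing of data points that share an inhibitor can be sketched as follows. The data layout (a dictionary keyed by sequence number and ligand name) and the function name `pairwise_dddg` are assumptions for illustration, and the ΔΔG values are invented.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_dddg(ddg_by_complex):
    """Map (ligand, seq_i, seq_j) -> ddG_i - ddG_j for pairs sharing a ligand."""
    by_ligand = defaultdict(list)
    for (sequence_id, ligand), ddg in sorted(ddg_by_complex.items()):
        by_ligand[ligand].append((sequence_id, ddg))
    result = {}
    for ligand, entries in by_ligand.items():
        # Every unordered pair of distinct sequences bound to the same ligand
        for (seq_i, ddg_i), (seq_j, ddg_j) in combinations(entries, 2):
            result[(ligand, seq_i, seq_j)] = ddg_i - ddg_j
    return result


# Hypothetical ddG values (kJ/mol) keyed by (sequence number, inhibitor)
example = {
    (1, "PI_a"): -50.0,
    (2, "PI_a"): -45.0,
    (3, "PI_a"): -40.0,
    (1, "PI_b"): -30.0,
}
pairs = pairwise_dddg(example)
```

Because every pair of sequences for a given inhibitor contributes one value, the number of ΔΔΔG values grows much faster than the number of ΔΔG values.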

Figure 3.


Explanation of ΔΔΔG. PR structures are represented by blue rectangles with circular binding sites. PI structures are represented as red circles. PR mutants each have unique binding sites, pictured here as either perfectly circular, or notched. Symbols: ΔG=free energy, ΔΔG=binding energy, ΔΔΔG=relative binding energy.

Optimization of RosettaLigand score term weights

The docking calculations performed so far were based on the original RosettaLigand scoring function (2006) (12), where the scoring term weights had been optimized across a set of diverse protein/ligand complexes. In the past it has been demonstrated that optimized scoring functions are needed to accurately predict free energies with Rosetta (25). Therefore an optimized weight set for PR/PI complexes was developed. Score term weights were optimized separately for standard binding affinity predictions and constant-unbound predictions. Score term weights were also optimized separately for ΔΔG predictions and ΔΔΔG predictions. Hence, a total of four optimized weight sets were produced (Table 1). First, docking results were filtered by taking the top 5% of models by total energy and, from these, the top model by interface energy. A leave-one-out cross-validation analysis was then used to determine the weights that produce the strongest correlation with experimental data. In each round, a multiple linear regression on all but one data point was used to determine weights that optimize the correlation between experimental and predicted binding affinity, and this weight set was then applied to predict the binding affinity of the data point left out. In a round robin scheme, each data point was left out once. The correlation coefficients and standard deviations relate to the predictions made for these independent data points. The final optimal weight sets reported are averaged over all cross-validation experiments (Table 1). Weight optimization was implemented in Mathematica (26).
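The leave-one-out scheme can be sketched with NumPy. The authors implemented their optimization in Mathematica, so this is a stand-in rather than their code: the function name `loo_linear_regression` is hypothetical, ordinary least squares is assumed as the regression method, and a column of ones is added to fit a bias term.

```python
import numpy as np


def loo_linear_regression(score_terms, experimental):
    """Leave-one-out multiple linear regression.

    Returns the held-out predictions plus the mean and standard deviation
    of the fitted weights (bias first) over all rounds.
    """
    X = np.asarray(score_terms, dtype=float)
    y = np.asarray(experimental, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # column of ones fits the bias
    predictions = np.empty(n)
    weights = np.empty((n, design.shape[1]))
    for i in range(n):
        keep = np.arange(n) != i               # fit on every point except i
        w, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        weights[i] = w
        predictions[i] = design[i] @ w         # predict the held-out point
    return predictions, weights.mean(axis=0), weights.std(axis=0)
```

Each row of `score_terms` would hold the unweighted score terms of the selected model for one PR/PI data point, and the correlation between `predictions` and the experimental values is the cross-validated statistic.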

Table 1.

Score term weights which optimize correlation between Rosetta predictions of ΔΔG and 106 values determined using ITC. Standard deviations are shown.

| Score term | Rosetta default weight | ΔΔG, standard approach | ΔΔG, constant-unbound | ΔΔΔG, standard approach | ΔΔΔG, constant-unbound |
|---|---|---|---|---|---|
| Bias | N/A | −36.0±0.38 | −1.19±12.86 | −3.67±0.01 | −0.26±0.01 |
| attractive | 0.8 | 0.82±0.02 | 0.76±0.01 | 0.20±0.00 | 0.72±0.00 |
| repulsive | 0.4 | −0.01±0.02 | 0.08±0.01 | 0.11±0.00 | 0.003±0.00 |
| solvation | 0.6 | 0.78±0.03 | 1.39±0.03 | 0.10±0.00 | 1.32±0.00 |
| dunbrack | 0.4 | 0.33±0.01 | −0.25±0.01 | 0.28±0.00 | −0.24±0.00 |
| pair | 0.8 | 0.92±0.06 | −2.76±0.06 | 0.52±0.01 | −2.47±0.01 |
| hbond_lr_bb | 2.0 | 0.98±0.04 | −0.28±0.05 | 0.07±0.00 | 0.18±0.01 |
| hbond_bb_sc | 2.0 | 0.10±0.03 | 0.32±0.03 | −0.13±0.00 | 0.36±0.00 |
| hbond_sc | 2.0 | −0.40±0.04 | 0.19±0.04 | 1.11±0.01 | 0.27±0.00 |

“Attractive” and “repulsive” are derived from the Lennard-Jones potential(21), “solvation” comes from a Lazaridis-Karplus model(22), “dunbrack” is a side-chain rotamer score based on the Dunbrack rotamer set(20), “pair” is a potential based on the probability of seeing two amino acids close together in space(23), and “hbond” terms are based on an explicit orientation hydrogen bonding model(24). sc: side-chain, bb: backbone, lr: long-range.

Partitioning data by location of PR mutations

We partitioned the 34 sequences shown in Figure 4 into four distinct groups, based on the presence and location of “exceptional” mutations. Exceptional mutations are defined as amino acids that are uncommon or rare in a multiple sequence alignment – i.e. if 17 out of 34 sequences have an A in a position and the other 17 have a V, neither is an exceptional mutation. A sequence that has an S in the same position would be counted as having an exceptional mutation A/V→S. Exceptional mutations were selected using ClustalW alignment software (gray boxed residues in Figure 4). The first group includes sequences with no exceptional mutations (sequences 4, 5, 22, and 26). The second group has exceptional mutations only within or near the binding site (red residues in Figure 2) and includes sequences 1, 8, 16, 19, 21, 24, 29, 30, and 33. The third group has exceptional mutations only outside the binding pocket and includes sequences 2, 3, 9, 11, 12, 23, 27, and 28. The fourth group includes sequences that have exceptional mutations both within and outside the binding site (sequences 6, 7, 10, 13, 14, 15, 17, 18, 20, 25, 31, 32, and 34).
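The idea of an exceptional mutation can be illustrated with a column-frequency check on an alignment. The text does not state a numerical cutoff, so the `max_fraction` threshold below is an assumption, as is the function name `exceptional_positions`; the sketch also assumes the sequences are already aligned to equal length.

```python
from collections import Counter


def exceptional_positions(aligned_sequences, max_fraction=0.1):
    """Return {sequence_index: [(position, residue), ...]} for rare residues.

    A residue is flagged when the fraction of sequences sharing it at that
    alignment column is at most `max_fraction` (an assumed cutoff).
    """
    n = len(aligned_sequences)
    length = len(aligned_sequences[0])
    flagged = {i: [] for i in range(n)}
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned_sequences)
        for i, seq in enumerate(aligned_sequences):
            if counts[seq[pos]] / n <= max_fraction:
                flagged[i].append((pos + 1, seq[pos]))  # 1-based position
    return flagged


# One-column toy alignment: 17 A, 16 V and a single S
toy_alignment = ["A"] * 17 + ["V"] * 16 + ["S"]
flagged = exceptional_positions(toy_alignment)
```

With this toy column only the single S is flagged, while A and V are both common enough to count as unexceptional, mirroring the A/V→S example in the text.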

Figure 4.


Multiple sequence alignment using ClustalX 2.1. 34 sequences were threaded onto each of 171 backbone templates. Aligned are the sequences from the 34 experimental binding energy datapoints. An asterisk ("*") means that the residues in that column are identical in all sequences in the alignment. A colon (":") means that conserved substitutions have been observed. A period (".") means that semi-conserved substitutions are observed. Exceptional residues are colored gray. Positions enclosed in red boxes indicate residue positions with the potential to confer drug resistance (as suggested by Rhee et al. 2005) (33).

We also partitioned sequences based on whether exceptional mutations fell within or outside of the flexible flap region. We define this region as comprising residues 37–61 (27). By this definition, 24% of PR lies in the flap region. Sequences with exceptional mutations only in the flap region include sequences 19 and 24. Sequences with exceptional mutations only outside the flap region include 1–3, 8, 9, 11–18, 20, 21, 23, 25, 27–33. Sequences with exceptional mutations both in and out of the flap region include 6, 7, 10, 20 and 34.

Results/Discussion

Assessment of uncertainty in experimental binding affinity data

As seen in Table S1, binding affinities have been determined multiple times for a few PR/PI pairs. In these cases we use average values, which reduces the total number of experimental ITC values from 106 to 99, while the total number of Ki datapoints is reduced from 70 to 62. We further use replicate data to estimate the accuracy of experimental values. The standard error for ITC replicates is 4.69 kJ/mol. The standard error for converted Ki replicates is 7.21 kJ/mol. We will use these numbers as estimates for the experimental uncertainty. As noted in the previous section, we assume a temperature of 25°C in order to convert Kis to ΔΔGs. This assumption introduces additional uncertainty for ΔΔGs calculated from Kis. Nevertheless, the standard deviation between ΔΔG values converted from Ki data and matching ITC values is 1.07 kJ/mol, confirming the validity of the conversion.

Comparative models have been built for 176 PR/PI complexes with known binding energies

The 34 distinct mutant sequences found in our experimental data contained between 3 and 14 mutations per monomer relative to the wild-type HIV-1 PR sequence (28). These 34 mutant sequences were aligned, and mutations at residues known to confer drug resistance are highlighted in red boxes (Figure 4). Each of the 34 sequences was threaded onto the backbones of all 171 template structures, yielding 5,814 comparative models. These 5,814 ligand-free structures were relaxed 10 times each using the Rosetta energy function (see Methods). These 58,140 relaxed structures served as starting structures for RosettaLigand docking simulations.

RosettaLigand docking protocol allows local flexibility

For each of the 176 experimentally determined PR/PI binding affinities, the 171 × 10 comparative models with matching sequence were docked with the respective ligand. A total of 300,960 unique input structures were used for ligand docking. Local induced-fit effects were considered through full PR and PI flexibility in the binding site: the RosettaLigand docking predictions allow ligand flexibility by minimizing ligand torsion angles. Backbone torsion angles near the PR/PI interface were also minimized.
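The model counts quoted in the Methods and in this section follow from simple multiplication, which the short check below reproduces. The variable names are ad hoc; the numbers are those stated in the text.

```python
n_sequences = 34       # distinct PR sequences
n_templates = 171      # crystal-structure backbones
n_relax_repeats = 10   # relaxation runs per comparative model
n_datapoints = 176     # experimental PR/PI binding energies

comparative_models = n_sequences * n_templates            # threaded models
relaxed_models = comparative_models * n_relax_repeats     # docking start structures
inputs_per_datapoint = n_templates * n_relax_repeats      # models per PR/PI pair
docking_inputs = n_datapoints * inputs_per_datapoint      # unique docking inputs

print(comparative_models, relaxed_models, inputs_per_datapoint, docking_inputs)
```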

For each input, the docking protocol was repeated 20 times. For each set of predictions for a given PR/PI datapoint, docking results were filtered by taking the top 5% of models by total energy and, from these, the top model by interface energy. Figure S2 compares top scoring Rosetta models with experimental PR/PI complex structures from the PDB that share the same PI to confirm accuracy of the modeling procedure.
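The two-stage filter can be sketched as follows. The dictionary keys `total_energy` and `interface_energy` and the function name `select_model` are hypothetical, and the sketch assumes that lower scores are better for both terms.

```python
def select_model(models, top_fraction=0.05):
    """Keep the best `top_fraction` by total energy, then pick the model
    with the best interface energy among those kept."""
    if not models:
        raise ValueError("no models to filter")
    ranked = sorted(models, key=lambda m: m["total_energy"])
    n_keep = max(1, int(len(ranked) * top_fraction))  # always keep at least one
    return min(ranked[:n_keep], key=lambda m: m["interface_energy"])


# Toy pool of 100 docked models with made-up scores
toy_models = [
    {"total_energy": float(i), "interface_energy": float(-(i % 7))}
    for i in range(100)
]
chosen = select_model(toy_models)
```

Filtering on total energy first discards poses that score well at the interface only because the rest of the protein is strained.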

Usage of experimental data for weight optimization

RosettaLigand uses a scoring function that has been optimized to give optimal docking results for a wide variety of ligands (12). For accurate prediction of free energies the weights of the scoring function need to be adjusted (25). For the purposes of optimizing the RosettaLigand scoring function weights and then testing the predictive power, we split our experimental datapoints into two groups. The 99 datapoints acquired by ITC were used to optimize weights because of their higher accuracy. Score term weights were optimized using leave-one-out cross-validation using 98 datapoints to fit the weights and predicting the 99th (see Table 1). The 62 Ki values converted to ΔΔGs were used as a second independent test of the scoring function.

Analysis of optimized scores

The van der Waals attractive and solvation energies contribute most to an accurate prediction of free energy. Van der Waals attractive scores assess the shape complementarity of ligand and protein. The solvation score penalizes the burial of polar atoms not engaged in hydrogen bonds. Score terms that capture protein/ligand hydrogen bonding effects were also given a substantial weight, as hydrogen bonds can contribute substantially to binding affinity. Interestingly, we find a significant negative weight for the amino acid pair potential. We attribute this negative weight to the fact that amino acid electrostatic interactions are disrupted in the PR binding site upon PI binding. Removal of the amino acid pair potential from the scoring function does not, however, result in significantly reduced prediction accuracy (data not shown).

Predicting ΔΔGs using the standard approach

The standard approach calculates ΔΔGs as the difference between the free energy of a docked model (ΔGb) and the free energy of the unbound model with equivalent sequence (ΔGu) (see methods). Score terms were reweighted to optimize predicted ΔΔG correlation with experimental data (weights are shown in Table 1, columns labeled “Standard Approach”). After reweighting, the predicted and experimental ΔΔGs correlate with R=0.40 (Figure 5A), while ΔΔΔGs correlate with R=0.47 (Figure 5C).

Figure 5.


Predicted/experimental correlation plots. (A–C) Experimental binding energy (ΔΔG) is plotted on the X-axis, predicted ΔΔG on the Y-axis. (D–F) ΔΔGs sharing the same ligand but different PR sequence were subtracted to produce ΔΔΔG values. Experimental ΔΔΔG is shown on the X-axis, predicted ΔΔΔG on the Y-axis. Note that since pairs of ligand matched ΔΔGs are used to derive ΔΔΔG values, there are many more of these values than of ΔΔGs.

Predicting ΔΔGs using the constant-unbound approach

The constant-unbound approach predicts ΔΔG as a function of ΔGb alone. Assuming constant free energy for unbound PR, the ΔΔG and ΔΔΔG correlations improve to R=0.71 and R=0.85 (Figure 5B, D) after score term reweighting (Table 1, columns labeled “Constant Unbound”). The standard errors of prediction, 5.91 kJ/mol and 4.49 kJ/mol respectively, are in the range of the experimental uncertainty (4.69 kJ/mol, Table 2). The ΔΔΔG correlations reported above are calculated by subtracting ΔΔGs sharing the same PI but different PR sequence. ΔΔΔGs calculated by subtracting ΔΔGs sharing the same PR sequence but different PIs yield a correlation of R=0.61±0.04 with a standard error of 7.28 kJ/mol.

Table 2.

Pearson’s correlation (RP) Spearman’s rank correlation (RS) and standard errors (kJ/mol, kcal/mol) between RosettaLigand predictions and experimental data.

| Data set | Method | n (ΔΔG) | ΔΔG RP | ΔΔG RS | ΔΔG standard error (kJ/mol, kcal/mol) | n (ΔΔΔG) | ΔΔΔG RP | ΔΔΔG RS | ΔΔΔG standard error (kJ/mol, kcal/mol) |
|---|---|---|---|---|---|---|---|---|---|
| ITC data* | Standard approach | 99 | 0.38±0.09 | 0.51±0.09 | 7.82, 1.87 | 591 | 0.51±0.03 | 0.51±0.03 | 7.29, 1.74 |
| ITC data* | Constant-unbound | 99 | 0.71±0.05 | 0.69±0.05 | 5.91, 1.41 | 591 | 0.85±0.01 | 0.86±0.01 | 4.49, 1.07 |
| Ki data | Rosetta default weights | 62 | 0.66±0.07 | 0.49±0.10 | 557.4, 133.22 | 327 | 0.61±0.04 | 0.47±0.04 | 23.7, 5.66 |
| Ki data | Optimized weights | 62 | 0.70±0.07 | 0.40±0.11 | 7.22, 1.73 | 327 | 0.70±0.03 | 0.57±0.04 | 7.28, 1.74 |

* ITC data: correlation with ITC measurements after score term weight optimization (see Table 1).

Ki data: correlation with ΔΔGs converted from Ki data. The constant-unbound approach was used.
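For readers who want to reproduce statistics of the kind shown in Table 2, the two correlation measures can be computed with the standard library alone. This is a generic sketch, not the authors' analysis code; the function names are hypothetical, and ties in the Spearman ranking are handled by average ranks. The last line shows the kJ/mol to kcal/mol conversion (1 kcal = 4.184 kJ) used for the paired columns.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


KJ_PER_KCAL = 4.184
se_kcal = 5.91 / KJ_PER_KCAL  # constant-unbound ddG standard error in kcal/mol
```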

Optimized score term weights predict binding affinity in independent data set

Optimized weight sets shown in Table 1 were generated from ITC data only. In order to show that high correlation statistics were not an artifact of leave-one-out weight optimization, optimized weights were applied to ΔΔG predictions for experimental Ki data. RosettaLigand predictions correlate well with the 62 ΔΔGs in this independent dataset (R=0.70, see Table 2). The standard error in our predictions is 7.22 kJ/mol, which is comparable to the previously determined experimental uncertainty for this dataset (7.21 kJ/mol).

Analysis of data partitioned by location of PR mutations

We partitioned the experimental data according to whether mutations were found in the binding site of HIV-1 PR or elsewhere. Averaging replicates reduces the total number of experimental ΔΔG values from 176 to 149. These data points were assigned to one of the four groups. Group 1 contained no exceptional mutations and included 15 datapoints. Group 2 included 17 datapoints with mutations only in the binding site. Group 3 included 44 datapoints with mutations only outside the binding site. Group 4 included 73 datapoints with mutations both inside and outside the binding site. Corresponding Rosetta predictions were reweighted using the previously optimized weights (weights from Table 1, “constant-unbound”) and predicted ΔΔGs within each group were compared with experimental values.

Standard errors between Rosetta predicted ΔΔG and experimental data are shown in Table 2. Note that the small and variable sample size makes correlation coefficients unsuitable for comparison. Generally, ΔΔΔG predictions outperform ΔΔG predictions. Further, predictions are most accurate for sequences with no mutations or only non-binding site mutations. Accuracy decreases as binding site mutations occur. While the latter effect exemplifies the larger influence of binding site mutations for affinity, the former data point confirms our hypothesis that assuming PR ΔGu to be invariant with respect to mutation allows for accurate prediction of effects of non-binding site mutations on PR/PI affinity.

We also partitioned data based on whether mutations were found in the flexible flap region (residues 37–61) (29). While our flap region definition comprised 24% of the protein, only 2 of the experimental data points contained only flap region mutations, 35 data points had mutations in flap and non-flap regions, and 97 data points contained only non-flap region mutations. It appears that predictions are more accurate for mutants that contain both flap and non-flap mutations (Table S3). This finding supports our hypothesis that assuming PR ΔGu to be invariant with respect to mutation allows for accurate prediction of effects of non-binding site mutations on PR/PI affinity. The lack of only-flap region mutants complicates interpretation of this analysis.

Conclusion

Both ΔΔG and ΔΔΔG predictions improve for PR/PI complexes using the constant-unbound approach (to R=0.71 and R=0.85 respectively, after score term reweighting). This is expected since unbound HIV-1 PR exhibits a high degree of flexibility (10) and stabilizes upon ligand binding. Therefore the free energy of the unbound state is less sensitive to individual mutations. This result is significant because it demonstrates a simple way to improve binding free energy predictions for proteins with a flexible unbound state. By assuming differences in the unbound state of closely related structures are negligible, binding free energy prediction is possible considering the bound state of the protein only. This finding becomes even more important if one considers that a crystal structure of the unbound protein is often not available in such a scenario.

Clearly, if it were possible to accurately predict the free energy of the unbound state, one could further improve binding affinity predictions. However, only limited structural information is currently available to describe the conformational ensemble that represents the unbound state of PR mutants.

As expected ΔΔΔG predictions outperform ΔΔG predictions. These relative binding energies focus on effects of mutations on the same ligand thereby removing the need to accurately predict differences in ΔΔG among PIs. Because Rosetta scoring terms have been parameterized for optimizing amino acid side chain placement, Rosetta excels at ΔΔΔG predictions.

Note that the standard approach, which uses a single bound and a single unbound state, closely resembles a lock-and-key paradigm with local induced fit in the binding site. The constant-unbound approach resembles a conformational selection paradigm coupled with local induced fit in the binding site.

Future Directions

During docking we allowed backbone flexibility within the binding site. A future study may need to incorporate global backbone flexibility during docking, to allow mutations outside the binding site to affect the conformation of the binding site. The Rosetta database only includes de-protonated aspartic acid. In a study by Wittayanarakul et al., the protonation state of the catalytic aspartate residues at position 25 was important for more accurate binding free energy calculations (30).

Further, for several PIs, a water molecule mediates interaction with flap residues Ile-50 and Ile-50’, stabilizing PR in the closed conformation (31, 32). This water molecule is not modeled in the present study. However, given that both interactions are present in all PR/PI complexes, cancellation of errors allows an accurate prediction of PR/PI affinity already with the setup presented here. A future direction would be to add protonated aspartate to the Rosetta residue type library and simultaneously optimize the positioning of the PI and the bridging water molecule.

Supplementary Material

Supp Table S1-S3 & Fig S1-S2

Acknowledgements

Funding was provided through the Molecular Biophysics Training Grant at Vanderbilt University to G.L. and K.W.K. (NIH GM 08320) and an R01 grant to J.M. (NIH MH 090192).

Footnotes

Supporting Information

A supplemental document is available online. It contains experimental ΔΔG and Ki values used in this study; a description of each of the 171 template structures used for comparative modeling in this study; correlations partitioned by presence and location of exceptional mutations; a ClustalW multiple sequence alignment for the 171 template structures used in this study; images of Rosetta predictions superimposed on PDB structures; and a description of the options we used with Rosetta software.

References

  • 1. Shimotohno A, Oue S, Yano T, Kuramitsu S, Kagamiyama H. Demonstration of the importance and usefulness of manipulating non-active-site residues in protein design. J Biochem. 2001;129:943–948. doi: 10.1093/oxfordjournals.jbchem.a002941.
  • 2. Henzler-Wildman K, Kern D. Dynamic personalities of proteins. Nature. 2007;450:964–972. doi: 10.1038/nature06522.
  • 3. Martin SF. Preorganization in biological systems: Are conformational constraints worth the energy? Pure Applied Chemistry. 2007;79:193–200.
  • 4. Gohlke H, Kuhn LA, Case DA. Change in protein flexibility upon complex formation: analysis of Ras-Raf using molecular dynamics and a molecular framework approach. Proteins. 2004;56:322–337. doi: 10.1002/prot.20116.
  • 5. Sousa SF, Fernandes PA, Ramos MJ. Protein-ligand docking: current status and future challenges. Proteins. 2006;65:15–26. doi: 10.1002/prot.21082.
  • 6. Berman H, Henrick K, Nakamura H. Announcing the worldwide Protein Data Bank. Nat Struct Biol. 2003;10:980. doi: 10.1038/nsb1203-980.
  • 7. Louis JM, Ishima R, Torchia DA, Weber IT. HIV-1 protease: structure, dynamics, and inhibition. Adv Pharmacol. 2007;55:261–298. doi: 10.1016/S1054-3589(07)55008-8.
  • 8. Miller M, Schneider J, Sathyanarayana BK, Toth MV, Marshall GR, Clawson L, et al. Structure of complex of synthetic HIV-1 protease with a substrate-based inhibitor at 2.3 Å resolution. Science. 1989;246:1149–1152. doi: 10.1126/science.2686029.
  • 9. Galiano L, Bonora M, Fanucci GE. Interflap distances in HIV-1 protease determined by pulsed EPR measurements. J Am Chem Soc. 2007;129:11004–11005. doi: 10.1021/ja073684k.
  • 10. Ding F, Layten M, Simmerling C. Solution structure of HIV-1 protease flaps probed by comparison of molecular dynamics simulation ensembles and EPR experiments. J Am Chem Soc. 2008;130:7184–7185. doi: 10.1021/ja800893d.
  • 11. Cheng T, Li X, Li Y, Liu Z, Wang R. Comparative assessment of scoring functions on a diverse test set. J Chem Inf Model. 2009;49:1079–1093. doi: 10.1021/ci9000053.
  • 12. Meiler J, Baker D. ROSETTALIGAND: protein-small molecule docking with full side-chain flexibility. Proteins. 2006;65:538–548. doi: 10.1002/prot.21086.
  • 13. Jenwitheesuk E, Samudrala R. Improved prediction of HIV-1 protease-inhibitor binding energies by molecular dynamics simulations. BMC Struct Biol. 2003;3:2. doi: 10.1186/1472-6807-3-2.
  • 14. Sherman W, Tidor B. Novel method for probing the specificity binding profile of ligands: applications to HIV protease. Chem Biol Drug Des. 2008;71:387–407. doi: 10.1111/j.1747-0285.2008.00659.x.
  • 15. Chen X, Liu M, Gilson MK. BindingDB: a web-accessible molecular recognition database. Comb Chem High Throughput Screen. 2001;4:719–725. doi: 10.2174/1386207013330670.
  • 16. Misura KM, Baker D. Progress and challenges in high-resolution refinement of protein structure models. Proteins. 2005;59:15–29. doi: 10.1002/prot.20376.
  • 17. Bradley P, Misura KM, Baker D. Toward high-resolution de novo structure prediction for small proteins. Science. 2005;309:1868–1871. doi: 10.1126/science.1113801.
  • 18. Dunbrack RL, Jr, Cohen FE. Bayesian statistical analysis of protein side-chain rotamer preferences. Protein Sci. 1997;6:1661–1681. doi: 10.1002/pro.5560060807.
  • 19. Li Z, Scheraga HA. Monte Carlo-minimization approach to the multiple-minima problem in protein folding. Proc Natl Acad Sci U S A. 1987;84:6611–6615. doi: 10.1073/pnas.84.19.6611.
  • 20. Dunbrack RL, Jr, Karplus M. Backbone-dependent rotamer library for proteins. Application to side-chain prediction. J Mol Biol. 1993;230:543–574. doi: 10.1006/jmbi.1993.1170.
  • 21. Lennard-Jones JE. On the Determination of Molecular Fields. II. From the Equation of State of a Gas. Proceedings of the Royal Society A. 1924;106:463–477.
  • 22. Lazaridis T, Karplus M. Effective energy function for proteins in solution. Proteins. 1999;35:133–152. doi: 10.1002/(sici)1097-0134(19990501)35:2<133::aid-prot1>3.0.co;2-n.
  • 23. Simons KT, Ruczinski I, Kooperberg C, Fox BA, Bystroff C, Baker D. Improved recognition of native-like protein structures using a combination of sequence-dependent and sequence-independent features of proteins. Proteins. 1999;34:82–95. doi: 10.1002/(sici)1097-0134(19990101)34:1<82::aid-prot7>3.0.co;2-a.
  • 24. Kortemme T, Morozov AV, Baker D. An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein-protein complexes. J Mol Biol. 2003;326:1239–1259. doi: 10.1016/s0022-2836(03)00021-4.
  • 25. Kortemme T, Baker D. A simple physical model for binding energy hot spots in protein-protein complexes. Proc Natl Acad Sci U S A. 2002;99:14116–14121. doi: 10.1073/pnas.202485799.
  • 26. Wolfram Research, Inc. Mathematica Edition: Version 8.0. Champaign, Illinois: Wolfram Research, Inc.; 2010.
  • 27. Torbeev VY, Raghuraman H, Hamelberg D, Tonelli M, Westler WM, Perozo E, et al. Protein conformational dynamics in the mechanism of HIV-1 protease catalysis. Proc Natl Acad Sci U S A. 2011;108:20982–20987. doi: 10.1073/pnas.1111202108.
  • 28. Ratner L, Haseltine W, Patarca R, Livak KJ, Starcich B, Josephs SF, et al. Complete nucleotide sequence of the AIDS virus, HTLV-III. Nature. 1985;313:277–284. doi: 10.1038/313277a0.
  • 29. Hornak V, Okur A, Rizzo RC, Simmerling C. HIV-1 protease flaps spontaneously open and reclose in molecular dynamics simulations. Proc Natl Acad Sci U S A. 2006;103:915–920. doi: 10.1073/pnas.0508452103.
  • 30. Wittayanarakul K, Hannongbua S, Feig M. Accurate prediction of protonation state as a prerequisite for reliable MM-PB(GB)SA binding free energy calculations of HIV-1 protease inhibitors. Journal of Computational Chemistry. 2008;29:673–685. doi: 10.1002/jcc.20821.
  • 31. Wlodawer A, Erickson JW. Structure-based inhibitors of HIV-1 protease. Annu Rev Biochem. 1993;62:543–585. doi: 10.1146/annurev.bi.62.070193.002551.
  • 32. Wlodawer A, Vondrasek J. Inhibitors of HIV-1 protease: a major success of structure-assisted drug design. Annu Rev Biophys Biomol Struct. 1998;27:249–284. doi: 10.1146/annurev.biophys.27.1.249.
  • 33. Rhee SY, Fessel WJ, Zolopa AR, Hurley L, Liu T, Taylor J, et al. HIV-1 protease and reverse-transcriptase mutations: Correlations with antiretroviral therapy in subtype B isolates and implications for drug-resistance surveillance. Journal of Infectious Diseases. 2005;192:456–465. doi: 10.1086/431601.
