IDSite: An accurate approach to predict P450-mediated drug metabolism

Jianing Li; Severin T Schneebeli; Joseph Bylund; Ramy Farid; Richard A Friesner

doi:10.1021/ct200462q

Author manuscript; available in PMC: 2012 Nov 8.

Published in final edited form as: J Chem Theory Comput. 2011 Nov 8;7(11):3829–3845. doi: 10.1021/ct200462q

Jianing Li ¹, Severin T Schneebeli ¹, Joseph Bylund ¹, Ramy Farid ², Richard A Friesner ¹,*

PMCID: PMC3254112 NIHMSID: NIHMS324948 PMID: 22247702

Abstract

Accurate prediction of drug metabolism is crucial for drug design. Since a large majority of drug metabolism involves P450 enzymes, we herein describe a computational approach, IDSite, to predict P450-mediated drug metabolism. To model induced-fit effects, IDSite samples the conformational space with flexible docking in Glide followed by two refinement stages using the Protein Local Optimization Program (PLOP). Sites of metabolism (SOMs) are predicted according to a physics-based score that evaluates the potential of atoms to react with the catalytic iron center. As a preliminary test, we present in this paper the prediction of hydroxylation and O-dealkylation sites mediated by CYP2D6 using two different models: a physics-based simulation model, and a modification of this model in which a small number of parameters are fit to a training set. Without fitting any parameters to experimental data, the Physical IDSite scoring recovers 83% of the experimental observations for 56 compounds with a very low false positive rate. With only 4 fitted parameters, the Fitted IDSite model was trained on a subset of 36 compounds and successfully applied to the other 20 compounds, recovering 94% of the experimental observations with high sensitivity and specificity for both sets.

Introduction

It is crucial to understand how potential drugs are metabolized in the body, because human metabolism has profound impacts on the bioactivity and the safety profiles of drug candidates. On one hand, metabolism can convert these compounds into their active forms, which interact with the therapeutic targets; on the other hand, metabolism eliminates the compounds by converting them into inactive excretable metabolites. Sometimes the metabolic modifications also lead to toxicity, which can cause unexpected failures in the later phases of drug development. Furthermore, the metabolic behavior of drug compounds is also highly related to other critical issues such as food-drug interactions, drug-drug interactions, and personalized medication.^1–3 Given the enormous impact of metabolism on drug bioavailability and toxicity, it is important to determine metabolites in the early stage of the drug discovery process. However, to obtain such information experimentally is often a very lengthy and expensive process. Therefore, it would be extremely useful if one could use computational methods to predict the metabolic decomposition of drug candidates.

Since cytochrome P450 enzymes (CYP) are involved in a large majority of drug metabolism pathways, many computational studies have been published attempting to predict P450-mediated metabolism using a variety of methods and models. For a recent review see the work of Afzelius et al.⁴ These previous studies mainly focused on the important P450 isoforms 2D6, 2C9 and 3A4 aiming to predict the primary metabolites of drug compounds. Several ligand-based methods have been developed during the past decade, making predictions based on hydrogen abstraction energies estimated with semiempirical quantum mechanics⁵ or DFT methods⁶. Although such ligand-based methods are very fast, it is often necessary to consider the interaction between the enzyme and the substrate in order to reach high accuracy (for example, >80% agreement with experiments) in the predictions. It is possible to include a limited amount of enzyme specific information by making descriptors of ligand-based models dependent on the nature of the enzyme.^7–9 Such approaches have been successfully implemented in software packages such as MetaSite and some were reported to recover up to 86% of the experimental observations.¹⁰ On the other hand, molecular dynamics (MD) or induced-fit docking simulations in combination with transition state calculations at the QM/MM or semiempirical quantum level were used to predict metabolites for a few ligands.^11,12 Other promising methods based on molecular docking have been implemented as well,^13–17 which determine the predictions using a reactivity model and/or distance cutoffs from the reactive iron center.

Traditional empirical ligand-based approaches to the prediction of P450 SOMs rely primarily on implicit estimation of intrinsic site reactivity to the Compound I oxo species, coupled with a heuristic attempt to take into account the ability of the ligand to bind to the P450 active site. While such methods can yield some discrimination of true positives from false positives when a sufficiently large training set is employed,^5,8,18 the precision of the approach is fundamentally limited, as the treatment of protein-ligand binding is highly approximate. Methods such as MetaSite¹⁰ provide some incorporation of P450 structural information, but employ a much smaller training set and fewer empirical parameters; the overall results appear to actually be less accurate than a ligand-based approach employing an extensive data set. The problem is again that the MetaSite algorithm for modeling the reactive protein-ligand complex does not rigorously evaluate the binding energy, or perform a thorough conformational search, severely limiting the predictive capability that can be attained.

The method described in the present work (IDSite) represents a qualitatively different approach from those discussed above, as well as from other efforts in the literature.^6,8–10 Firstly, the goal is to actually generate an accurate structure for the protein-ligand complex that enables reactivity at a specified site; this requires construction of a good approximation to a transition state structure for both aliphatic and aromatic sites of reaction. Secondly, the relative binding affinity, as compared to alternative structures for both the site in question and for other sites, has to be computed with a respectable degree of precision, on the order of a few kcal/mol. Finally, the relative intrinsic barrier height of the reaction (combined with the relative binding affinity to produce an overall relative barrier, as compared to other possible reactions of the molecule) must be estimated to within ~1 kcal/mol. These are extraordinarily daunting tasks, given that the P450 isoforms present large, complex active site regions with substantial capability for induced-fit conformational changes, a necessary condition for them to accommodate the wide range of exogenous ligands with which they need to interact to perform their biological function.

The algorithms in IDSite employ a novel model for the total energy of the protein-ligand complex, which has recently been shown to provide remarkably accurate predictions for side chains and loops¹⁹, and a sophisticated algorithm for generating converged induced-fit structures which combines docking, conformational search, and hybrid Monte Carlo (MC) methods based on MD trajectories. The algorithm enables a hierarchical search which addresses the various length scales of the problem, including the small correlated motions provided by the MD trajectories, which we have found are absolutely necessary to produce a useful rank ordering of structures, particularly for larger ligands. Constraints are employed in conjunction with these simulation algorithms to enforce appropriate transition state structures. The energy model enables the target accuracy of a few kcal/mol in relative binding affinity to be reached. Finally, a quantum chemically based model is employed to calculate relative intrinsic reactivities; its performance is documented below by the high level of success in distinguishing true positive SOMs from false positives.

To our knowledge, these results represent the first reliable and accurate computation of binding poses and transition states for a wide range of drug-like molecules interacting with an important human P450 isoform. There are a few previous papers in which structures are generated via QM/MM calculations,^12,20,21 however these typically address a very small number of ligands (usually one), the ligands are typically simpler and smaller than those treated here, and the sampling algorithms are much less extensive. We believe that these structures can be very useful in practical drug design applications, in situations where modification of P450 metabolic properties for candidates in later stages of lead optimization is required. The availability of an atomic level three dimensional structure, as well as the ability to predict the structural and energetic effects of chemical modification of the molecule, provides a new tool for chemists to rationally engineer desirable metabolic properties into clinical candidates. Extension of our methods to other P450 isoforms such as 2C9, 3A4 and 1A2, which is currently in progress, will enhance the utility of our approach for this important application.

Methods and Materials

IDSite methodology

IDSite combines the docking program Glide²² and the protein structure modeling program PLOP (Protein Local Optimization Program, available as the protein refinement module in the protein modeling package Prime of Schrödinger, Inc.²³), to model induced-fit effects and to predict sites of metabolism. IDSite consists of three hierarchical sampling stages and one final scoring stage (Figure 1). It begins with flexible Glide docking calculations, which place the ligand into the active site. Following the docking stage, two refinement stages in PLOP are carried out to refine the protein side-chain and ligand orientations. At the end of each sampling stage, the generated/refined poses are screened based on their structures and energies, and clustered according to the similarity of the ligand conformation. Finally, the refined lowest energy poses are used to predict the sites of metabolism based on a physical score, which is dependent on the energies of the poses as well as the intrinsic chemical reactivities of the potential sites of metabolism.

IDSite is able to use knowledge about specific conserved interactions to perform efficient sampling and accelerate the calculations. For example in the case of CYP2D6, a typical substrate always contains a basic center (e.g. an amine nitrogen) that binds to one of the two acidic residues, Glu216 or Asp301. IDSite constrains such salt bridges to reduce the sampling cost associated with the docking and refinement stages. Filters are applied during the screening at the end of each stage in IDSite to reduce the number of poses passed to further refinement or evaluation (Table 1). The following is a detailed description of each stage of IDSite.

Table 1.

IDSite filters in the screening for CYP2D6

Stage	Filters applied in the screening at the end of the stage
Glide Docking	Poses that fulfill any of the criteria below are removed: the distance of the basic nitrogen to the ferryl oxygen is less than 5.0 Å; the distance of the basic nitrogen to the negatively charged oxygen (in Glu216 or Asp301) is greater than 5.5 Å; more than 2 heavy atoms from the ligand are further than 16.0 Å away from the heme iron; more than 1 heavy atom from the ligand is closer than 1.0 Å to the receptor; more than 6 heavy atoms from the ligand are closer than 1.8 Å to the receptor; no heavy atom in the ligand is within 5.0 Å of the heme iron.
PLOP Refinement 1	All poses are ranked by their PLOP energies. Poses with an energy more than 35 kcal/mol above that of the lowest energy pose are removed.
PLOP Refinement 2	Poses that fulfill any of the criteria below are removed: the distance between the constrained atom and the ferryl oxygen is outside the optimal range, which is from 1.65 to 2.60 Å for sp³ atoms and from 1.60 to 2.08 Å for sp² atoms; the distance of the basic nitrogen to the ferryl oxygen is less than 4.8 Å; the distance of any polar atom to the ferryl oxygen is less than 3.2 Å; the distance of the constrained salt bridge (between the basic nitrogen and the oxygen from Glu216 or Asp301) is greater than 3.6 Å; the angle of the salt bridge (N-H-O) is less than 140 degrees; more than 2 heavy atoms from the ligand are either further than 14.5 Å or closer than 1.6 Å from the heme iron; the pose has at least 1 distorted cyclohexane ring.
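The Glide docking filters in Table 1 can be read as a simple predicate over precomputed distances. The following is a minimal sketch, not the IDSite implementation: the `DockedPose` container and its field names are hypothetical, and it assumes the relevant distances (in Å) have already been measured for each pose.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DockedPose:
    """Hypothetical container of precomputed distances (all in angstroms)."""
    n_to_ferryl_o: float                 # basic nitrogen to ferryl oxygen
    n_to_acid_o: float                   # basic nitrogen to charged O of Glu216/Asp301
    heavy_atom_to_fe: List[float]        # each ligand heavy atom to the heme iron
    heavy_atom_to_receptor: List[float]  # each ligand heavy atom to nearest receptor atom


def passes_docking_filters(pose: DockedPose) -> bool:
    """Return True if the pose survives the Glide docking filters of Table 1."""
    if pose.n_to_ferryl_o < 5.0:
        return False  # basic nitrogen too close to the ferryl oxygen
    if pose.n_to_acid_o > 5.5:
        return False  # salt bridge partner too far away
    if sum(d > 16.0 for d in pose.heavy_atom_to_fe) > 2:
        return False  # too many atoms far from the heme iron
    if sum(d < 1.0 for d in pose.heavy_atom_to_receptor) > 1:
        return False  # severe steric clashes
    if sum(d < 1.8 for d in pose.heavy_atom_to_receptor) > 6:
        return False  # too many moderate clashes
    if not any(d <= 5.0 for d in pose.heavy_atom_to_fe):
        return False  # no candidate atom near the heme iron
    return True
```

A pose is kept only when none of the removal criteria applies, which mirrors the "any of the criteria" wording of the table.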


We have constructed our sampling and scoring algorithms with the intention of approximating the correct transition state structure of the protein-ligand complex, and the associated activation energy, that would lead to reactivity of the target atom of the ligand. There are two components of the problem: finding the transition state in reasonable CPU time (a daunting task for a large, complex ligand when induced-fit effects are important), and estimating the free energy of activation associated with the transition state. The VSGB 2.0 energy function, with constraints to enforce a suitable geometry for the reaction to take place (and some other constraints to facilitate sampling, as described in the text below), is minimized to generate these structures for the various candidate reactant heavy atoms. We use the classical force field and solvation model to produce a “reactant” structure that is optimally positioned for the targeted chemical reactivity. The activation barrier from a precomputed quantum chemical fragment calculation, as described in the following text, is then added to the VSGB 2.0 energy to estimate the relative energy barrier for converting such a structure into products. This is an approximation to a more rigorous approach, such as using QM/MM methods to generate the reactant, transition state, and product structures. Note that it is only important that the relative free energies of the various potential sites of reaction are calculated with reasonable accuracy, as the most reactive (lowest activation free energy) site is always used as a reference point (i.e. the energy function for this site is subtracted from the energy function for the candidate site) in our assessment of the metabolic contribution of each site. Finally, in applying the above protocol, the VSGB 2.0 energy must be calculated using a structure with the constraints in place; otherwise the structure would minimize to something that is not a suitable starting point for reaction. The constraints introduce some strain energy into the structure, but this strain energy is an appropriate component of the activation free energy, as it does cost energy to create a suitable reactive structure.

1) Glide docking

Starting from the ligand and the protein receptor structures, IDSite carries out flexible ligand docking with Glide.^24,25 The flexible ligand docking protocol generates a large number of ligand conformations that are then docked into the rigid receptor. The first step in Glide docking is to define the binding box and calculate the receptor grid. As in Glide, in IDSite the binding site is defined as a box centered at the center of selected residues or a ligand (if the structure contains a ligand). Because we start from the apo structure of CYP2D6 (PDBID: 2F9Q. See below for details about the protein preparation), the center of the binding box is selected as the centroid of the residues Glu216, Asp301, Thr309, and Phe483. The box dimension on each side is set to 10 Å for the inner box and 20 Å for the outer box. After the grid generation, IDSite samples the conformations of freely rotatable bonds and rings with Glide Standard Precision (SP). In order to increase sampling, IDSite uses reduced van der Waals (VDW) radii and skips the default filtering with a rough score within Glide (also referred to as expanded sampling). Similar poses are clustered according to their RMSD (cutoff 2.0 Å). Finally, a post-docking minimization is performed and the top 60 minimized poses according to the Glide SP score are retained. These poses are then screened to remove the poses with obvious steric clashes, with too many atoms outside the inner binding box, or without atoms close to the heme iron (Table 1). The remaining poses are then passed to the first refinement stage.

IDSite uses reduced VDW radii for nonpolar atoms both in the protein receptor and the ligand, so that slight steric clashes are tolerated during the docking stage. For the protein receptor the VDW scaling factor is fixed at 0.40, while for the ligand, the scaling factor starting from 0.80 is adaptively adjusted until at least 4 valid poses are found. With highly flexible ligands and relatively high scaling factors, Glide often finds only a handful of valid poses, and even fewer survive after IDSite screening. However, if the scaling factor is set too low, the docked poses may contain too many serious steric clashes, which can cause problems in the subsequent minimization. If IDSite fails to find enough valid poses, the scaling factor is adjusted and the number of poses to pass the initial docking phase in Glide is increased accordingly to augment sampling.
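The adaptive adjustment of the ligand scaling factor can be sketched as a retry loop. This is only an illustration under stated assumptions: the text gives the starting value (0.80) and the target of at least 4 valid poses, but the step size, the lower bound, the direction of adjustment (assumed here to be downward), the initial pose count, and the doubling rule are all hypothetical, as are the `dock` and `is_valid` callables standing in for Glide and the IDSite screening.

```python
from typing import Callable, List, Sequence, Tuple


def dock_with_adaptive_scaling(
    dock: Callable[[float, int], Sequence[object]],
    is_valid: Callable[[object], bool],
    start_scale: float = 0.80,  # ligand VDW scaling factor from the text
    min_valid: int = 4,         # required number of valid poses from the text
    step: float = 0.10,         # assumed decrement per retry
    min_scale: float = 0.40,    # assumed lower bound
    start_keep: int = 1000,     # assumed number of poses kept in the initial phase
) -> Tuple[float, List[object]]:
    """Lower the scaling factor and widen sampling until enough valid poses exist."""
    scale, keep = start_scale, start_keep
    while True:
        valid = [pose for pose in dock(scale, keep) if is_valid(pose)]
        next_scale = round(scale - step, 2)
        if len(valid) >= min_valid or next_scale < min_scale:
            return scale, valid
        # Not enough poses: soften the ligand further and keep more initial poses.
        scale, keep = next_scale, keep * 2
```

The loop returns the last scaling factor tried together with whatever valid poses were found, so a caller can detect the case in which the lower bound was reached without success.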

Since a typical CYP2D6 substrate forms a highly conserved salt bridge with either Glu216 or Asp301,²⁶ IDSite employs this conserved interaction to reduce the sampling cost of the CYP2D6-docking in the following way: IDSite adds a positional constraint to ensure that the generated poses fulfill at least part of the preferred conserved interactions. The positional constraint defines a spherical region in the receptor that is within 4.0 Å of the center of the Glu216, Asp301, and Ser304 residues (Figure 2). It is required that during docking and post-docking minimization each pose should maintain at least one hydrogen-bond donor inside the spherical region. If the ligand contains other hydrogen-bond donors except for the basic nitrogen, the constrained docking is likely to generate poses that form hydrogen bonds instead of the salt bridge to Glu216 or Asp301. However, IDSite is able to distinguish these poses and filter them via an additional salt bridge filter in the pose screening (Table 1), so that only the poses with a stable salt bridge are allowed to pass to the refinement stage.

Definition of the binding box (yellow cube) and the positional constraint (yellow dotted sphere) in IDSite for CYP2D6.

2) PLOP refinements

The refinement of the docked poses includes multiple parallel Monte Carlo Minimization (MCM) simulations in PLOP. For each pose from the previous stage (the docking or first refinement stage), IDSite finds all the heavy atoms in the ligand close to the heme iron. For each of these atoms, distance and angular harmonic constraints are applied in order to force sampling of the conformations that potentially lead to metabolism. The optimal distances and angles of the constraints were obtained from hydroxylation transition state geometries with a heme model system at the B3LYP/LACVP* level using Jaguar.²⁷ The detailed nature of the employed constraints is shown for both sp³ and sp² type carbons in Figures 3 and 4. The constraints are employed in the minimization step but are not included in the energy used for the acceptance step of the MCM simulations. PLOP uses the overlap factor (the ratio of the distance between two atom centers to the sum of their atomic radii) to quickly reject randomized structures with serious steric clashes (defined as the overlap factor being lower than a specific cutoff). PLOP repeats the random attempts until a structure with tolerable clashes is generated, after which a constrained minimization using the truncated Newton method is performed. The acceptance or rejection of the minimized structure is decided by the Metropolis criterion based on the energy calculated in the VSGB 2.0 model. (Performing the minimization step before testing the acceptance criterion violates detailed balance, but this is not an issue as we are interested only in low energy structures and not the population/ensemble distribution.) The simulations run until a certain number of accepted structures are collected.
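The perturb, minimize, and accept cycle described above can be illustrated on a toy one-dimensional system. This is a sketch, not PLOP: the energy function, the steepest-descent minimizer, the harmonic constraint, the perturbation width, and the value of kT are all assumptions chosen for illustration. It does preserve the one feature stated in the text, namely that the constraint enters the minimization but not the energy used in the Metropolis test.

```python
import math
import random
from typing import Callable, List

KT = 0.593  # kcal/mol, about 298 K; assumed, the text does not give the MC temperature


def monte_carlo_minimization(
    start: float,
    energy: Callable[[float], float],
    minimize: Callable[[float], float],
    perturb: Callable[[float, random.Random], float],
    n_accept: int,
    rng: random.Random,
    kt: float = KT,
    max_trials: int = 10_000,
) -> List[float]:
    """Collect n_accept structures via perturb -> constrained minimize -> Metropolis."""
    current = minimize(start)
    e_current = energy(current)
    accepted = [current]
    for _ in range(max_trials):
        if len(accepted) >= n_accept:
            break
        trial = minimize(perturb(current, rng))  # constraints act here
        e_trial = energy(trial)                  # acceptance energy excludes constraints
        delta = e_trial - e_current
        if delta <= 0.0 or rng.random() < math.exp(-delta / kt):
            current, e_current = trial, e_trial
            accepted.append(current)
    return accepted


# Toy stand-ins (hypothetical, for illustration only).
def toy_energy(x: float) -> float:
    """Double-well potential standing in for the VSGB 2.0 energy."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def toy_minimize(x: float, k: float = 0.5, target: float = 1.0) -> float:
    """Steepest descent on the energy plus a harmonic constraint toward target."""
    def objective(y: float) -> float:
        return toy_energy(y) + 0.5 * k * (y - target) ** 2

    h = 1e-5
    for _ in range(500):
        grad = (objective(x + h) - objective(x - h)) / (2.0 * h)
        x -= 0.01 * grad
    return x


def toy_perturb(x: float, rng: random.Random) -> float:
    """Random displacement standing in for the side-chain, rigid body, and hybrid moves."""
    return x + rng.uniform(-1.5, 1.5)


structures = monte_carlo_minimization(
    1.0, toy_energy, toy_minimize, toy_perturb, n_accept=20, rng=random.Random(7)
)
```

Because every trial is minimized before the acceptance test, the chain hops between local minima rather than sampling a thermal ensemble, which is the behavior the parenthetical remark on detailed balance refers to.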

Constraints applied to the heme region in the *first* refinement stage. The ferryl oxygen is a “dummy” atom (1.6 Å above the heme iron), only used to define the constraints in the IDSite calculations. **(a)** Constraints for sp³ carbons. **(b)** Constraints for sp² carbons.

Constraints applied to the heme region in the *second* refinement stage. The ferryl oxygen is a “dummy” atom (1.6 Å above the heme iron), only used to define the constraints in the IDSite calculations. **(a)** Constraints for sp³ carbons. **(b)** Constraints for sp² carbons.

In order to sample the various degrees of freedom in the conformational space, IDSite employs three types of randomized moves in the MCM simulations: side-chain rotation, rigid body translation/rotation, and hybrid moves.

Side-chain moves

By varying the dihedral angles of the rotatable bonds, IDSite uses side chain MC moves in PLOP to sample the selected side-chain conformations of the protein and of the ligand. Up to three close residues (C_β distance within 6 Å) are allowed to rotate collectively, but the moves of the protein residues and those of the ligand are separated. In each attempted movement, the conformations of the selected side chains (from the protein/ligand) are either changed by random perturbations or assigned by the randomly selected rotamers from a library. For an attempt with a random perturbation, the displacement of each dihedral angle is the sum of a large rotation (N times 60 degrees with N as a random integer between 0 and 5) and a random perturbation from 0 to 30 degrees. For a rotamer library attempt, a side-chain conformation is updated with a random rotamer from a high resolution side-chain library for protein residues,²⁸ and from a homogeneous library at 10 degree resolution for the ligand. If a structure with tolerable overlaps is generated in an attempt, it is minimized and sent to subsequent stages for judgment of acceptance. Each side-chain move takes less than 15 seconds and is the fastest among all the three move types.
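The random-perturbation rule for a dihedral angle (a large rotation of N × 60 degrees plus a small random term of 0 to 30 degrees) is easy to write down. The sketch below follows that rule; the function names and the choice to wrap angles into [-180, 180) are illustrative assumptions rather than PLOP conventions.

```python
import random
from typing import List


def random_dihedral_displacement(rng: random.Random) -> float:
    """Displacement in degrees: N * 60 with N in 0..5, plus a random 0 to 30."""
    n = rng.randint(0, 5)           # inclusive on both ends
    return 60.0 * n + rng.uniform(0.0, 30.0)


def perturb_dihedrals(dihedrals: List[float], rng: random.Random) -> List[float]:
    """Apply an independent random displacement to each dihedral, wrapped to [-180, 180)."""
    return [
        ((phi + random_dihedral_displacement(rng) + 180.0) % 360.0) - 180.0
        for phi in dihedrals
    ]
```

With this rule the displacement always lies between 0 and 330 degrees, so every 60-degree sector of the torsional circle can be reached in a single attempt.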

Rigid body moves

Rigid body moves are used to sample the translational and rotational space of the ligand. Multiple attempts with reduced VDW radii are applied, as it is quite common to fail in searching for a clash-free conformation in a single rigid body moving attempt (especially when the ligand is large and flexible and the binding pocket is relatively small). Each rigid body move includes 1000 attempts, and each attempt performs a translation along a random vector and a rotation around a random axis, with less than 0.5 Å and 60 degree displacement, respectively. In addition, the VDW radii are reduced (scaling factor 0.8) to soften the Lennard-Jones potential, so that mild steric clashes are allowed, which are likely to be resolved by the subsequent minimization. The rigid body move usually takes 20 to 40 seconds per move.
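A single rigid body attempt, as described above, combines a bounded random translation with a bounded random rotation. The following sketch implements one such attempt with a Rodrigues rotation in pure Python. Rotating about the ligand centroid and drawing the magnitudes uniformly are assumptions; the text specifies only the upper bounds of 0.5 Å and 60 degrees.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def random_unit_vector(rng: random.Random) -> Vec:
    """Uniform direction on the unit sphere from a normalised Gaussian triple."""
    while True:
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 1e-12:
            return (x / norm, y / norm, z / norm)


def rigid_body_attempt(
    coords: Sequence[Vec],
    rng: random.Random,
    max_shift: float = 0.5,       # angstroms, from the text
    max_angle_deg: float = 60.0,  # degrees, from the text
) -> List[Vec]:
    """Rotate the ligand about its centroid, then translate it along a random vector."""
    n = len(coords)
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    k = random_unit_vector(rng)                           # rotation axis
    angle = math.radians(rng.uniform(0.0, max_angle_deg))
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    t_dir = random_unit_vector(rng)                       # translation direction
    shift = rng.uniform(0.0, max_shift)
    moved = []
    for p in coords:
        v = tuple(p[i] - centroid[i] for i in range(3))
        k_dot_v = sum(k[i] * v[i] for i in range(3))
        k_cross_v = (
            k[1] * v[2] - k[2] * v[1],
            k[2] * v[0] - k[0] * v[2],
            k[0] * v[1] - k[1] * v[0],
        )
        # Rodrigues formula: v cos(a) + (k x v) sin(a) + k (k . v)(1 - cos(a))
        moved.append(tuple(
            v[i] * cos_a + k_cross_v[i] * sin_a + k[i] * k_dot_v * (1.0 - cos_a)
            + centroid[i] + shift * t_dir[i]
            for i in range(3)
        ))
    return moved
```

In a full move, attempts like this one would be repeated (up to 1000 times per the text) until a structure with tolerable clashes is found.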

Hybrid Monte Carlo moves

The hybrid Monte Carlo (HMC) move²⁹ in PLOP performs simultaneous sampling for the selected residues in the protein side chains and backbone as well as the ligand. Each HMC move performs a 5 picosecond, constant-energy molecular dynamics simulation (starting at 900 K) on all the atoms in the selected residues. The molecular dynamics simulation uses a RESPA-based integration of short range forces with a time step of 1 femtosecond and updates long range forces with a Verlet integration every fifth step.³⁰ Taking up to 15 minutes per move, the HMC move is the most expensive of the three move types in PLOP.

Considering the different costs of the three types of moves, the frequency of deployment of each move type in the various refinement stages is adjustable according to the sampling requirements. Two stages of refinement with different combinations of moves and constraints are carried out in the hierarchical sampling. Using more HMC moves, the first refinement stage applies loose distance constraints between the atom in question (from the ligand) and the ferryl oxygen. It is designed to “pull” the close atom (identified from the docking poses) towards the heme iron, to estimate the likelihood that the atom can approach the iron and react with the ferryl oxygen. When an atom in the ligand is forced to be proximate to the ferryl oxygen under the constraints, the rest of the ligand and the surrounding protein residues have to adjust their conformations accordingly. These adjustments are easy for some poses and difficult for others, depending upon the specific geometry and energetics of the protein-ligand interactions along the trajectory connecting particular starting and target poses. Resulting poses with steric clashes or distorted structures can be identified by their high energies and discarded in the IDSite energy and structure screening (Table 1). The low energy poses after screening, mostly with favorable interactions between the protein and the ligand, are passed to the second refinement stage. Mainly focusing on side-chain sampling, the second refinement stage applies tight constraints that force the structure to adopt conformations similar to those of the transition states obtained from DFT calculations of model systems. The second refinement stage is used to further refine the poses and distinguish the potential of each atom in question to be oxidized. The settings for these two PLOP refinement stages are compared in Table 2, while the constraints are illustrated in Figures 3 and 4. Approximately 39 protein residues, either identified as important for ligand binding by mutagenesis experiments³¹ or adjacent to these key residues, are sampled during the refinement stages. At the end of each refinement stage, all the poses sampled in that stage are screened and clustered for further refinement or evaluation (Table 1).

Table 2.

Comparison of settings in the first and second refinement stages.

	PLOP Refinement 1	PLOP Refinement 2
Number of residues to sample (including the ligand)	12	40
Number of accepted structures for each job	Maximum of 8 times the number of rotatable bonds and 24	Maximum of 20 times the number of rotatable bonds and 60
Types and probabilities of MCM moves	Side chain: 0.50 Rigid body: 0.10 Hybrid: 0.40	Side chain: 0.70 Rigid body: 0.10 Hybrid: 0.20
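The settings in Table 2 translate directly into a small configuration. The sketch below selects a move type with the listed probabilities and computes the number of accepted structures per job. It reads "Maximum of 8 times the number of rotatable bonds and 24" as max(8 × n, 24), which is an interpretation; the dictionary keys and function names are hypothetical.

```python
import random

# Move probabilities per refinement stage, taken from Table 2.
MOVE_PROBABILITIES = {
    "refinement_1": {"side_chain": 0.50, "rigid_body": 0.10, "hybrid": 0.40},
    "refinement_2": {"side_chain": 0.70, "rigid_body": 0.10, "hybrid": 0.20},
}

# (multiplier per rotatable bond, minimum count) per stage, taken from Table 2.
ACCEPT_TARGETS = {"refinement_1": (8, 24), "refinement_2": (20, 60)}


def pick_move(stage: str, rng: random.Random) -> str:
    """Draw one MCM move type according to the stage's probabilities."""
    probs = MOVE_PROBABILITIES[stage]
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


def accepted_structures_target(stage: str, n_rotatable_bonds: int) -> int:
    """Number of accepted structures to collect, read as max(multiplier * n, minimum)."""
    per_bond, minimum = ACCEPT_TARGETS[stage]
    return max(per_bond * n_rotatable_bonds, minimum)
```

For example, under this reading a ligand with 5 rotatable bonds would require 40 accepted structures in the first stage and 100 in the second.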


For CYP2D6, harmonic constraints are also applied to force the basic nitrogen to interact with the acidic residues Glu216 and Asp301 (Figures 5 and 6), as mutagenesis experiments indicate that these residues play important roles in substrate binding to CYP2D6.^26,32,33

Constraints applied to the salt bridge region of CYP2D6 in the *first* refinement stage.

Constraints applied to the salt bridge region of CYP2D6 in the *second* refinement stage.

3) Evaluation

Herein, we present two scoring models to evaluate the potential sites of metabolism and to determine the predictions. Our first scoring model (referred to as Physical IDSite) is based on the following assumptions: (1) For hydroxylation of an aliphatic chain carbon, the P450-hydrogen abstraction step is rate determining.^34,35 (2) For hydroxylation of aromatic rings, the electrophilic attack of Compound I on the aromatic ring is rate determining.^34,35 (3) All reaction intermediates before the rate determining step are in equilibrium.³⁶ Given these assumptions, the relative rates of product formation depend only on the relative transition state free energies of the rate determining (RD) transition states (ΔG^≠) according to the Curtin-Hammett principle. These can then simply be written as

$$\Delta G^{\neq} = \Delta G_{\text{bind}} + \Delta G^{\neq}_{\text{RD-step}}$$

(Eq. 1)

where ΔG_bind is the binding free energy of the substrate into the reactive conformation in the P450 active site and ΔG^≠_RD-step is the activation barrier of the RD-step.
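For reference, the Curtin-Hammett statement invoked above can be written as a ratio of products. The expression below is the standard transition state theory form, not an equation taken from the paper; the indices i and j, labeling two candidate sites of the same ligand, and the symbols k and [P] for rate constants and product amounts are notation introduced here for illustration.

```latex
\frac{[P_i]}{[P_j]} = \frac{k_i}{k_j}
  = \exp\!\left( -\frac{\Delta G^{\neq}_i - \Delta G^{\neq}_j}{RT} \right)
```

Only the difference between the two transition state free energies enters, which is why relative rather than absolute values of ΔG^≠ are sufficient for ranking sites.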

In the present application of IDSite, we attempt to calculate only relative, as opposed to absolute, site reactivity for a given ligand. Absolute site reactivity for the ligand can typically be obtained via inexpensive experiments. However, detailed metabolic chemistry is often more difficult to determine, and an accurate three dimensional structure leading to reactions at each metabolic site is not available given the severe challenge of obtaining a crystal structure of a P450 isozyme with the ligand bound in the reactive conformation. Prediction of the most highly reactive site, followed by identification of all sites with relative reactivities sufficiently large to be experimentally detected along the dominant metabolic pathway, coupled to structural prediction for each relevant reactive geometry, complements current experimental practice and facilitates compound modification in situations where P450 metabolism needs to be altered to confer improved metabolic properties on a candidate drug molecule.

In the Physical IDSite model, the relative binding energies of various docked poses are calculated from the PLOP VSGB 2.0 energies of these poses, while the barriers for the RD-steps are estimated from the corresponding activation barriers of model compounds with a methoxy radical (calculated at the DFT level). E_pose in Eq. 3, calculated in PLOP, estimates the protein-ligand interactions when a potential site is forced to approach the catalytic center in a certain pose with a transition state-like conformation. Based on the linear correlation (Figure 7) between the methoxy radical activation barriers and the corresponding activation barriers with the heme system, we approximated the real activation barrier for each potential site of metabolism from the intrinsic reactivity (IR) calculated with the methoxy radical model according to Eq. 2.

Correlation between the intrinsic reactivities calculated with the methoxy radical model and the heme model (17 sites from 9 selected fragment compounds; details are shown in the supporting information).

$$\mathrm{IR}(\text{heme}) = 1.117 \times \mathrm{IR}(\text{methoxy radical}) + \text{constant}$$

(Eq. 2)

With the constant from Eq. 2 ignored, the relative ΔG^≠ for each potential site (approximated as the score E) is then calculated as the Boltzmann weighted average over the energies of all contributing poses, where angle brackets represent the Boltzmann averages (Eq. 3). A term describing the configurational entropy of equivalent hydrogen atoms at 298 K, proportional to the logarithm of the number of symmetrically equivalent hydrogen atoms, was also included. The ΔG^≠ values for all symmetrically equivalent sites were set to the lowest ΔG^≠ of the sites.

$$E = \left\langle 1.117 \times \mathrm{IR}(\text{methoxy radical}) + E_{\text{pose}} \right\rangle - kT \ln N_{\mathrm{H}}$$

(Eq. 3)

Since (as a rule of thumb) it is difficult to observe a minor metabolite experimentally if it is formed in less than ca. 0.1% yield (which corresponds to an increase of ca. 4.75 kcal/mol in relative ΔG^≠ compared to the free energy of the most favored product), we used 4.75 kcal/mol as a cutoff for the prediction: with Physical IDSite, any potential site of metabolism having a relative ΔG^≠ lower than 4.75 kcal/mol is predicted to be a site of metabolism.
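The Physical IDSite score of Eq. 3 and the 4.75 kcal/mol cutoff can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the Boltzmann average is assumed to weight each pose by the Boltzmann factor of its own total energy, kT is evaluated at 298 K, and the function and variable names are hypothetical.

```python
import math
from typing import Dict, List

KT_298 = 0.0019872 * 298.0  # kcal/mol (gas constant in kcal/(mol K) times 298 K)
IR_SCALE = 1.117            # slope from Eq. 2
CUTOFF = 4.75               # kcal/mol, Physical IDSite cutoff


def boltzmann_average(energies: List[float], kt: float = KT_298) -> float:
    """Boltzmann-weighted mean energy (assumed weighting: exp(-E / kT))."""
    e_min = min(energies)  # shift for numerical stability
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    return sum(w * e for w, e in zip(weights, energies)) / sum(weights)


def physical_score(ir_methoxy: float, pose_energies: List[float],
                   n_equiv_h: int, kt: float = KT_298) -> float:
    """E = <1.117 * IR(methoxy radical) + E_pose> - kT ln N_H (Eq. 3)."""
    totals = [IR_SCALE * ir_methoxy + e for e in pose_energies]
    return boltzmann_average(totals, kt) - kt * math.log(n_equiv_h)


def predict_sites(scores: Dict[str, float], cutoff: float = CUTOFF) -> List[str]:
    """Sites whose score is less than `cutoff` above the best (lowest) score."""
    best = min(scores.values())
    return sorted(site for site, e in scores.items() if e - best < cutoff)
```

Because only differences between site scores are compared, the constant dropped from Eq. 2 has no effect on the prediction.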

The second scoring model represents an empirically optimized version of the physical model described above with the following changes: (1) The PLOP energy (E_pose) is not used directly, but rescaled with two parameters as described below, which are fitted to a training set of 36 compounds. (2) Instead of obtaining the scaling coefficient for the methoxy radical intrinsic reactivities from the correlation in Figure 7, we fit it to the training set of 36 compounds. Note that the fitted value for the latter of 1.071 (Eq. 4) is very similar to the value obtained by correlating the DFT-activation energies (1.117), which further highlights the physical nature of this parameter. (3) The final selection criteria for predictions (score cutoff) were fit to the training set as well. All four fitted parameters were obtained from a fitting algorithm by maximizing the number of true positives over the sum of the numbers of false positives and false negatives.

E = \left\langle 1.071 \times \mathrm{IR}(\text{methoxy radical}) + E_{\mathrm{score}} \right\rangle - kT \ln N_{\mathrm{H}}

(Eq. 4)

As introduced above, instead of directly using the PLOP energy (E_pose), Eq. 4 recalculates the binding contribution (E_score) with a linear energy score; the angle brackets again represent Boltzmann averages. If a pose has a PLOP energy (E_pose) within 5.26 kcal/mol of the lowest one, the energy score (E_score) is zero; otherwise, it is 0.58 times the relative energy. Potential sites whose relative score is within 1.46 kcal/mol of the site predicted to have the highest reactivity are considered to be sites of metabolism.
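The rescaling of the pose energy and the fitted cutoff can be sketched as follows. The function names are hypothetical. The sketch reads "relative energy" as the difference from the lowest PLOP energy and treats both thresholds as inclusive; the text does not settle either point, so both are assumptions.

```python
def rescaled_pose_energy(e_pose, e_lowest, window=5.26, scale=0.58):
    """E_score for one pose: zero inside the energy window, scaled outside it."""
    relative = e_pose - e_lowest
    if relative <= window:
        return 0.0
    return scale * relative


def fitted_site_selection(site_scores, cutoff=1.46):
    """Sites whose score lies within `cutoff` of the best (lowest) score."""
    best = min(site_scores.values())
    return sorted(s for s, v in site_scores.items() if v - best <= cutoff)


# Invented pose energies (kcal/mol), for illustration only.
pose_energies = [-120.0, -117.0, -110.0]
lowest = min(pose_energies)
e_scores = [rescaled_pose_energy(e, lowest) for e in pose_energies]
```

Here the first two poses fall inside the window and contribute zero, while the third contributes 0.58 times its 10 kcal/mol relative energy.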

Reactivity model

The sites at which a ligand gets metabolized by a P450 enzyme depend not only on whether the atom in question can approach the heme iron center with the correct geometry, but also on the intrinsic chemical reactivity of the site. Assuming that the intrinsic chemical reactivities of the ligand sites are independent of the presence of the enzyme, we estimated the intrinsic reactivities from activation energies of a library of model systems using QM. Since DFT with the B3LYP functional and the 6–31G* basis set has been shown to give high accuracy for relative energies of transition states,³⁷ while still allowing for fast calculations, we employed that level of theory for our intrinsic reactivity model. It has been shown that, in general, an accurate linear correlation exists between the QM activation energies of hydrogen abstraction reactions with a methoxy radical and the corresponding hydrogen abstraction barriers with an iron-oxo porphyrin species, generally referred to as Compound I in the P450 literature.³⁴ In agreement with previous reports,^38,39 we herein investigated the above-mentioned correlation including aliphatic hydrogen abstraction barriers as well as aromatic ones. As shown in Figure 7, we find a good correlation between the methoxy radical and the Compound I based activation barriers for both sp² and sp³ hybridized systems (R²=0.94), which validates the use of the methoxy radical model to estimate the intrinsic reactivities. Therefore, transition states for methoxy radical based hydrogen abstraction reactions were optimized at the B3LYP/LACVP* level of theory with Jaguar²⁷ for a fragment library consisting of 150 model compounds, 483 distinct hydrogen atoms, and more than 2000 conformations, in order to accurately model all distinct chemical environments. Carbon atom based intrinsic reactivities were then assigned as the Boltzmann weighted activation energies over different transition state conformations.
Intrinsic reactivities of the ligand sites were assigned using a simple SMARTS string matching algorithm against the fragment library. The best matching fragment was determined as the one with (1) the largest number of heavy atoms, (2) the most hydrogen atoms, and (3) the largest sum of atomic numbers.
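The selection of the best matching fragment can be sketched as a lexicographic comparison. This assumes that the three criteria are applied in priority order, which the text suggests but does not state explicitly. The `FragmentMatch` record and its fields are hypothetical, and the SMARTS matching itself (which would normally be done by a cheminformatics toolkit) is taken as already performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentMatch:
    """One library fragment that matched a ligand site (hypothetical record)."""
    name: str
    heavy_atoms: int
    hydrogens: int
    atomic_number_sum: int
    activation_energy: float  # kcal/mol, from the fragment library


def best_fragment(matches):
    """Pick the match with most heavy atoms, then most H, then largest Z sum."""
    return max(
        matches,
        key=lambda m: (m.heavy_atoms, m.hydrogens, m.atomic_number_sum),
    )


# Invented matches for a single ligand site, for illustration only.
candidates = [
    FragmentMatch("generic_CH3", 2, 6, 12, 9.0),
    FragmentMatch("methoxy_aryl", 8, 8, 50, 5.7),
    FragmentMatch("methoxy_aryl_F", 8, 8, 58, 5.9),
]
chosen = best_fragment(candidates)
```

In this invented case the last two candidates tie on heavy atoms and hydrogens, so the third criterion decides in favor of "methoxy_aryl_F".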

Preparation of protein and ligands

The X-ray crystallographic structure of CYP2D6 was obtained from the Protein Data Bank (PDBID: 2F9Q, 3.0 Å resolution) and contains a well-defined active site above the heme group.⁴⁰ We applied the Protein Preparation Wizard (PPW) of Schrödinger Inc. to add hydrogen atoms, optimize the hydroxyl orientation, correct the Gln/Asn/His side-chain orientations, and determine the protonation states of titratable residues. PPW also assigned the bond order of the heme group and the iron oxidation state, which defines the iron atom as Fe³⁺ covalently bonded to the side chain of Cys443. The positions of all hydrogen atoms were optimized with a constraint of 0.3 Å with the OPLS 2005 force field.

A training set of 36 compounds and a test set of 20 compounds were collected from the experimental literature.^31,41 These compounds mainly undergo O-dealkylation and hydroxylation by CYP2D6. The training and test sets contain 774 and 383 heavy atoms, respectively. Details about the data selection are explained in the Supporting Information. All stereoisomers used in the experiments were enumerated as were the protonation states at pH=7.0. All structures were minimized in vacuum using the OPLS 2005 force field, prior to the IDSite calculations.

Results and Discussion

Tables 3 and 4 summarize our predicted results for the training set and the test set. The data show that IDSite has high sensitivity and specificity with both IDSite scoring models in predicting the CYP2D6-mediated metabolism of the 56 compounds: using the Physical IDSite scoring, we achieve high sensitivity (0.83) and high specificity (0.98); using the Fitted IDSite scoring, we achieve even higher sensitivity (0.94) and similarly high specificity (0.99). With the Fitted IDSite scoring, the results for the training set (sensitivity 0.91 and specificity 0.99) and the test set (sensitivity 1.0 and specificity 0.98) are very similar, indicating that no overfitting to the training set can be detected for the fitted model.

Table 3.

Summary of results for the training set.

Symbol	Compound Name	Physical TP	Physical FP	Physical FN	Fitted TP	Fitted FP	Fitted FN
1	4-methoxyamphetamine	1	0	0	1	0	0
2	Amitriptyline	2	2	0	2	0	0
3	Aprindine	4	0	1	5	0	0
4	Brofaromine	1	0	0	1	0	0
5	Bufuralol	0	1	1	1	0	0
6	Carvedilol	1	0	2	2	0	1
7	Cinnarizine	0	2	1	0	2	1
8	Clomipramine	1	0	1	1	0	1
9	Codeine	1	0	0	1	0	0
10	Desipramine	2	0	0	2	0	0
11	Dextromethorphan	1	0	0	1	0	0
12	Dihydrocodeine	1	1	0	1	0	0
13	Ethylmorphine	1	0	0	1	0	0
14	Flunarizine	1	0	0	1	0	0
15	Fluperlapine	1	0	0	1	0	0
16	Hydrocodone	1	0	0	1	0	0
17	Imipramine	2	0	0	2	0	0
18	Indoramine	1	0	0	1	0	0
19	MDMA	1	0	0	1	0	0
20	Methamphetamine	1	0	0	1	2	0
21	Methoxyphenamine	2	0	0	2	0	0
22	Metoprolol	1	0	1	2	0	0
23	Mexiletine	2	0	1	2	0	1
24	Mianserin	1	0	0	1	0	0
25	Mirtazapine	0	1	1	1	1	0
26	Nortriptyline	1	1	0	1	0	0
27	Ondansetron	2	0	0	1	0	1
28	Paroxetine	1	0	0	1	0	0
29	Perhexiline	2	0	0	2	0	0
30	Propafenone	1	1	0	1	1	0
31	Propranolol	2	2	0	2	1	0
32	Tamoxifen	1	0	0	1	0	0
33	Terfenadine	3	0	0	3	0	0
34	Tiracizine	1	2	0	1	1	0
35	Tropisetron	2	0	1	3	0	0
36	Venlafaxine	1	0	0	1	0	0

	TOTAL	47	13	10	52	8	5


Table 4.

Result summary for the test set.

Symbol	Compound Name	Physical TP	Physical FP	Physical FN	Fitted TP	Fitted FP	Fitted FN
37	Atomoxetine	0	1	1	1	2	0
38	Bicifadine	1	2	0	1	0	0
39	Bupranolol	1	0	0	1	0	0
40	Carteolol	1	1	0	1	0	0
41	Chlorpromazine	1	0	0	1	0	0
42	EMAMC	1	0	0	1	0	0
43	Encainide	1	1	0	1	1	0
44	Harmaline	1	0	0	1	0	0
45	Harmine	1	1	0	1	1	0
46	Ibogaine	1	0	0	1	0	0
47	MAMC	1	0	0	1	0	0
48	MMAMC	1	0	0	1	0	0
49	MOPPP	1	0	0	1	0	0
50	Oxycodone	1	0	0	1	0	0
51	Spirosulfonamide	2	0	0	2	0	0
52	Timolol	2	0	2	4	0	0
53	Tolterodine	0	1	1	1	1	0
54	Tramadol	1	1	0	1	1	0
55	Tyramine	2	0	0	2	0	0
56	Zotepine	1	0	0	1	0	0

	TOTAL	21	8	4	25	6	0
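The sensitivities and specificities quoted above can be approximately reproduced from the TOTAL rows of Tables 3 and 4. The sketch below assumes that the number of non-SOMs in each set equals the number of heavy atoms (774 for training, 383 for test) minus the number of experimental SOMs (TP + FN). That counting rule is an inference from the definitions in the text, not something it states, and the helper names are hypothetical.

```python
def sensitivity(tp, fn):
    """Fraction of experimental SOMs that were recovered."""
    return tp / (tp + fn)


def specificity(fp, n_non_soms):
    """Fraction of non-SOM atoms that were not predicted as SOMs."""
    return 1.0 - fp / n_non_soms


# TOTAL rows of Tables 3 and 4: (TP, FP, FN) per scoring model.
totals = {
    "training": {"heavy": 774, "physical": (47, 13, 10), "fitted": (52, 8, 5)},
    "test": {"heavy": 383, "physical": (21, 8, 4), "fitted": (25, 6, 0)},
}


def combined(model):
    """Sensitivity and specificity of one model over both sets."""
    tp = sum(s[model][0] for s in totals.values())
    fp = sum(s[model][1] for s in totals.values())
    fn = sum(s[model][2] for s in totals.values())
    heavy = sum(s["heavy"] for s in totals.values())
    return sensitivity(tp, fn), specificity(fp, heavy - (tp + fn))


physical_sens, physical_spec = combined("physical")
fitted_sens, fitted_spec = combined("fitted")
```

Under this assumption the rounded values agree with the reported 0.83/0.98 (Physical IDSite) and 0.94/0.99 (Fitted IDSite).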


It is interesting to note that the principal effect of the parameter fitting is to reduce the number of false negatives; the reduction is of similar magnitude in both the training and test sets (there is also some reduction of false positives in the training set, but this is a less prominent result). The parameterization mainly accounts for the fact that there is some noise in the induced-fit calculation energetics, reflected in the 5.26 kcal/mol energy window and the scaling factor of 0.58. The noise is a combination of imperfect sampling and residual errors in the continuum solvent free energy model; the parameters suggest that there is a slight overestimation of the relative energetics of poses close in energy. Buffering and scaling the contribution from this term enables a (small) number of secondary sites to be recognized by the model as contributing to the reactivity, without increasing the number of false positives. As noted above, the intrinsic reactivity appears to have less noise associated with it, which is not surprising in view of the fact that it poses a much less demanding sampling challenge.

A second question of interest is whether the various intensive sampling components of the algorithm actually improve the predictive capability. In order to analyze the importance of each sampling stage in IDSite, ROC (Receiver Operating Characteristic) curves were calculated (Figure 10A) to compare three reduced methods using the fitted score to the full method using the physical and fitted scores. As mentioned in the Methods section, each refinement stage performs a constrained minimization, followed by sampling with MCM simulations. After Glide docking, the prediction can be made after minimization in the first refinement stage (referred to as “docking+minimization”), after the sampling in the first refinement stage (referred to as “no Ref2”), or after the minimization in the second refinement stage (referred to as “no sampling in Ref2”). A higher energy cutoff (150 kcal/mol, instead of 24 kcal/mol) and a larger distance cutoff (8.0 Å, instead of 2.6 Å for sp³ and 2.08 Å for sp² hybridized atoms) are used for the “docking+minimization” and “no Ref2” methods. To draw the ROC curves, the scoring cutoff (4.75 and 1.46 kcal/mol are used for the results shown in Tables 3 and 4 for the physical and fitted scores, respectively) is varied from 0.0 to 100 kcal/mol at 0.5 kcal/mol intervals; each cutoff yields one point giving the true positive rate (y-axis) and the corresponding false positive rate (x-axis) of the method. The true positive rate and false positive rate are calculated according to Eq. 5,

(A) ROC curves comparing the full IDSite method to the reduced methods. (B) ROC curves superimposed on the results of Sheridan *et al*.⁸

\begin{array}{l} \text{True positive rate} = \dfrac{\text{number of true positives}}{\text{number of SOMs observed in experiments}} \\ \text{False positive rate} = \dfrac{\text{number of false positives}}{\text{number of non-SOMs observed in experiments}} \end{array}

(Eq. 5)

where true positives are the SOMs (sp² and sp³ carbon atoms which undergo hydroxylation or O-dealkylation) identified by experiments and predicted correctly by IDSite, and false positives are non-SOMs (nonhydrogen atoms) mispredicted by IDSite as hydroxylated/dealkylated by CYP2D6. Because we currently focus on the typical CYP2D6-mediated hydroxylation and O-dealkylation involving sp² and sp³ carbon atoms with bonded hydrogen atoms, sites (carbon atoms or heteroatoms) which potentially undergo other metabolic reactions such as N-dealkylation and oxidation are considered as non-SOMs in our preliminary study.
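The cutoff sweep used to draw the ROC curves can be sketched as below. This is a minimal illustration with hypothetical names. It assumes each heavy atom carries a relative score (0 for the best site of its compound, and a large value for atoms without any qualified pose) and that an atom is predicted as a SOM when its score does not exceed the cutoff.

```python
def roc_points(site_scores, is_som, start=0.0, stop=100.0, step=0.5):
    """Return (false positive rate, true positive rate) for each cutoff (Eq. 5)."""
    n_pos = sum(1 for y in is_som if y)   # experimental SOMs
    n_neg = len(is_som) - n_pos           # non-SOMs
    n_steps = int(round((stop - start) / step))
    points = []
    for i in range(n_steps + 1):
        cutoff = start + i * step
        tp = sum(1 for s, y in zip(site_scores, is_som) if y and s <= cutoff)
        fp = sum(1 for s, y in zip(site_scores, is_som) if not y and s <= cutoff)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Invented relative scores (kcal/mol) and labels, for illustration only.
scores = [0.0, 0.4, 3.0, 50.0]
labels = [True, False, True, False]
curve = roc_points(scores, labels)
```

Each element of `curve` is one point of the ROC plot, ordered by increasing cutoff.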

The ROC curves in Figure 10A indicate that at the same true positive rate (sensitivity), the false positive rate decreases with more sampling, and the full IDSite method always has the lowest false positive rate (the highest specificity) with both scoring models. It is interesting that the physical score derived from the basic physical chemistry model performs very close to the fitted score. For the reduced methods, there is an obvious trend that increasing the sampling effort yields substantially higher specificity at each stage. This means that, using the IDSite scoring models in conjunction with binding requirements, sufficient sampling in IDSite can specifically identify the sites of metabolism observed by the experiments.

In Figure 10B we compare the Physical IDSite and Fitted IDSite results to results from Sheridan et al.,⁸ who evaluated true positive and false positive rates, using the same ROC metric that we employ, for their test set of CYP2D6 ligands. The test set employed in ref. 8 is different in detail from the one we use here, but the types of ligands in both test sets are similar based on the examples of test set molecules given in ref. 8. Hence, while the comparison is not completely rigorous, it is a reasonable way to estimate relative performance. It can be seen that, given the caveat above, both Physical IDSite and Fitted IDSite substantially outperform both MetaSite and the in-house Merck QSAR-based approach plotted in Figure 10B. To recover 90% of true positives, the QSAR method included roughly 20% of false positives, whereas MetaSite included 40% of false positives. In contrast, IDSite incorporated only ~1% of false positives. This is a qualitative transformation of performance that has significant implications for use in drug discovery applications, as does the availability of a predicted three dimensional structure that is likely to be quite accurate.

So far, only the apo enzyme structure of CYP2D6 has been determined by X-ray crystallography. In order to investigate the capability of IDSite in modeling the induced-fit effects and understand the effects of the hierarchical sampling, several compounds of various sizes and flexibility were selected to analyze the structural and energetic changes at each stage.

It is very common that the poses from docking that have the SOM close to the ferryl oxygen are not among the top poses ranked by Glide SP scoring. For example, the pose with the shortest distance (1.8 Å) is ranked 6th in the case of 4-methoxyamphetamine; the pose (1.4 Å) that leads to prediction of O-demethylation is ranked 20th in the case of metoprolol. Further, it is also possible in some cases (e.g. fluperlapine) that none of the poses have the SOM close enough to the ferryl oxygen. Therefore, it is very difficult to make specific predictions with only a small distance cutoff and a few top poses from docking. In order to improve the sensitivity as well as the specificity of the predictions, it appears to be necessary to employ the refinement stages.

Focusing on the site(s) of metabolism observed experimentally, we investigated the Boltzmann averaged energy and the distance from the site(s) to the ferryl oxygen over all the poses sampled at any even numbered step. Given the strong harmonic constraints applied in the refinement stages, the distance change is generally relatively small, as expected. The energy change in the first refinement is usually small, ranging from 4 to 25 kcal/mol. However, the energy change during the second refinement stage is quite different for small ligands as compared to large ligands. For ligands as small as 4-methoxyamphetamine, the energy of the poses fluctuated within a range of 12 kcal/mol and the lowest energy structure was obtained in the early steps. In contrast, the energy can decrease by more than 60 kcal/mol during the sampling of the second refinement stage for flexible or bulky ligands such as fluperlapine. For such cases it is often not until the end of the simulation that the low energy structure is sampled. This implies that the second refinement plays an important role in optimizing the structure for bulky or flexible compounds.

When the second refinement is skipped, about 40% of the compounds (24/56) in the combined training and test sets give the same results as obtained from the full protocol, and most of them are small compounds like 4-methoxyamphetamine, MDMA, MAMC, etc. This observation is consistent with our discussion above that links the need for extended refinement to the presence of large, bulky ligands where protein induced-fit effects are significant, and where optimization of the free energy of the reactive binding complex can pose great difficulties due to various types of energy barriers and additional degrees of freedom to explore in the ligand.

Analysis of Induced-fit Effects

P450 enzymes are believed to have high flexibility in adjusting their active site to accommodate a large variety of substrates.⁴² In order to model such induced-fit effects, sufficient sampling provided by the two refinement stages of IDSite is critical, as demonstrated in the previous section. In order to further investigate the capability of IDSite in modeling induced-fit effects, we calculated the average absolute change for each dihedral angle of the protein side chains in the binding box in comparison to the minimized crystal structure of CYP2D6. The largest change of all the chi angles for each residue is used to represent the change for that residue. Figure 14 illustrates the induced-fit effects by showing the largest change for each residue. Ten of the 18 residues in the binding box have changes greater than 30°. This shows that IDSite is able to model the induced-fit effects required to correctly identify the “bio-active” conformation of the ligands by changing the side-chain orientations in the active site. Phe120 and Phe483, with bulky side chains, have changes as large as 40° and 60°, respectively. However, the magnitude of their induced-fit effects depends highly on the ligand size. Between these two Phe residues in the binding box, Met374 has the most significant change (108°) because a small rotation in the Phe side chains can cause a big adjustment in Met374. Compared to the large change of Glu216 (88°), the change of Asp301 (38°) is smaller due to the shorter side chain.
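The per-residue measure described above (mean absolute change of each chi angle over the poses, then the maximum over the chi angles of a residue) can be sketched as follows. The function names are hypothetical. Wrapping angle differences into the range 0° to 180° is an assumption of this sketch; the reported dihedral changes include at least one value above 180°, so the original analysis may have used a different convention.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return 360.0 - d if d > 180.0 else d


def residue_change(reference_chis, pose_chis_list):
    """Maximum over chi angles of the mean absolute change across poses."""
    means = []
    for k, ref in enumerate(reference_chis):
        diffs = [angle_diff(pose[k], ref) for pose in pose_chis_list]
        means.append(sum(diffs) / len(diffs))
    return max(means)


# Invented chi1/chi2 values (degrees) for one residue, for illustration only.
crystal = [60.0, 180.0]
poses = [[70.0, 180.0], [50.0, -90.0]]
change = residue_change(crystal, poses)
```

For these invented values chi1 changes by 10° on average and chi2 by 45°, so the residue is assigned a change of 45°.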

Illustration of the induced-fit effects modeled by IDSite. Cyan-white-red scheme is used to show the side chains from the least changed to the most changed, defined as the maximum mean absolute dihedral angle change for each residue.

The above-mentioned trends are illustrated in Figures 15–17, which compare the docked structures leading to the SOM of 4-methoxyamphetamine (PMA), fluperlapine and metoprolol to the crystal structure of the apo-enzyme minimized with the VSGB 2.0 energy model. Analogous figures can be found for all our predictions in the Supporting Information. One striking example of induced-fit effects involves Phe120. For small ligands such as PMA the benzene ring conformation of Phe120 changes only slightly (Figure 15), while for larger ligands such as fluperlapine (Figure 16) or metoprolol (Figure 17) it has to move out of the way, rotating by almost 90°. Interestingly, for compounds with multiple sites of metabolism, such as metoprolol (Figure 17), different binding modes leading to different SOMs have very different conformations of the Phe120 side chain as well. Our IDSite docked structures clearly highlight the importance of induced-fit effects for CYP2D6 metabolism and therefore explain why it is difficult to accurately predict SOMs with a rigid receptor model.

**(A)** The lowest energy pose in the second refinement stage for 4-methoxyamphetamine. Orange sphere = “dummy” ferryl oxygen, green sphere = experimental and predicted SOM. **(B)** Comparison of side chains important for induced-fit effects. Crystal structure (green, PDBID: 2F9Q) minimized with the VSGB 2.0 model and superimposed onto the lowest energy pose with 4-methoxyamphetamine (salmon). Large dihedral changes are seen for Asp301 (Δchi2, 121°), Met374 (Δchi3, 114°), and Phe483 (Δchi1, 60°).

**(A)** The lowest energy poses in the second refinement stage for metoprolol benzylic hydroxylation. **(B)** Comparison of side chains important for induced-fit effects for metoprolol benzylic hydroxylation. **(C)** The lowest energy poses in the second refinement stage for metoprolol O-dealkylation. **(D)** Comparison of side chains important for induced-fit effects for metoprolol O-dealkylation. For **(A)**, **(C)** orange spheres = “dummy” ferryl oxygen, green spheres = experimental and predicted SOMs. For **(B)**, **(D)** crystal structure (green, PDBID: 2F9Q) minimized with the VSGB 2.0 model and superimposed onto the lowest energy poses with metoprolol (salmon). For benzylic hydroxylation, large dihedral changes are seen for Glu216 (Δchi1, 60°), Asp301 (Δchi2, 66°), Met374 (Δchi3, 112°), and Phe483 (Δchi1, 40°); for O-dealkylation, large dihedral changes are seen for Phe120 (Δchi2, 67°), Glu216 (Δchi2, 50°), and Phe483 (Δchi2, 194°).

**(A)** The lowest energy pose in the second refinement stage for fluperlapine. Orange sphere = “dummy” ferryl oxygen, green sphere = experimental and predicted SOM. **(B)** Comparison of side chains important for induced-fit effects. Crystal structure (green, PDBID: 2F9Q) minimized with the VSGB 2.0 model and superimposed onto the lowest energy pose with fluperlapine (salmon). Large dihedral changes are seen for Phe120 (Δchi2, 73°), Glu216 (Δchi1, 60°), Asp301 (Δchi2, 64°), Met374 (Δchi3, 105°), and Phe483 (Δchi2, 94°).

The importance of structural effects in determining SOMs

The two main competing factors in determining the SOMs with P450 enzymes are the intrinsic reactivities of the ligand sites and the geometric fit of the ligand in the active site. As mentioned in the Methods section, IDSite considers both of these effects in determining the SOMs, which enables it to select the correct SOM even in difficult cases where the intrinsic reactivity favors a non-site of metabolism. For these cases, the structural fit of the ligand with the receptor, i.e. how easily the ligand site can reach the ferryl oxygen, mainly determines the SOM. Therefore, the structures and energies of the poses, with consideration of the receptor, have to be utilized. Three cases are used here to demonstrate the role of the receptor (CYP2D6) in determining the sites of metabolism.

The first case we discuss is brofaromine, for which experiments show that the major metabolic pathway is O-demethylation mediated by CYP2D6.⁴³ The intrinsic reactivity of the site of metabolism (4.7 kcal/mol) is very close to those of the sites on the aromatic rings (non-sites of metabolism, 3.3–4.9 kcal/mol) (Figure 18). Due to the receptor geometry, it is impossible for the atoms on the furan ring to get close to the ferryl oxygen while still maintaining the salt bridge with either Glu216 or Asp301. Therefore, no qualified poses were found leading to a reaction on the furan ring. Although we found qualified poses for all the sites on the benzene ring, those poses are all strongly disfavored energetically, by more than 20 kcal/mol. This indicates that, by taking the interactions between the ligand and the receptor into account, IDSite is able to predict the SOM for brofaromine in good agreement with the experimental observation.

**(A)** The lowest energy pose in the second refinement stage for brofaromine. Orange sphere = “dummy” ferryl oxygen, green sphere = experimental and predicted SOM. **(B)** Intrinsic reactivities (red) for each site and the relative energy (blue) of the poses with the corresponding site constrained to the ferryl oxygen. The SOM observed experimentally is marked with a green circle.

A second interesting case is nortriptyline, since the two sites on the 7-membered aliphatic ring are difficult to distinguish by intrinsic reactivity alone, as they are almost equally reactive. However, experiments show that only the (E)-10 site of nortriptyline is metabolized.⁴⁴ In IDSite, the poses with the (Z)-10 site close to the ferryl oxygen are all at least 10 kcal/mol higher in energy compared to the poses with the (E)-10 atom close to the ferryl oxygen. Such an energy gap is large enough for IDSite to correctly determine the (E)-isomer as the only metabolite. While structural effects are therefore clearly very important in determining nortriptyline’s SOM, the intrinsic reactivities also play a key role. This is again nicely illustrated with the example of nortriptyline, where a purely structure-based method (without considering intrinsic reactivities) would predict the SOM as being an aromatic hydroxylation due to the favorable energy of the corresponding poses. Therefore, IDSite is able to correctly balance the subtle effects stemming from intrinsic reactivity and structural fit.

Methoxyphenamine is another case where the joint effects of intrinsic reactivity and structural fit lead to the correct predictions. Methoxyphenamine is metabolized through O-demethylation and aromatic hydroxylation mediated by CYP2D6.⁴⁵ These two sites not only have very close intrinsic reactivities (5.7 and 6.3 kcal/mol, Figure 20), but their lowest energy poses also have very similar energies. The non-SOMs are not selected by IDSite, either because of unfavorable intrinsic reactivity or because of high pose energies.

The lowest energy pose in the second refinement stage for methoxyphenamine. Orange sphere = “dummy” ferryl oxygen, green sphere = experimental and predicted SOM. **(A)** Aromatic hydroxylation. **(B)** O-demethylation. **(C)** Intrinsic reactivities (red) for each site and the relative energy (blue) of the poses with the corresponding site constrained to the ferryl oxygen. The SOM observed experimentally is marked with a green circle.

Computational Times

On a single 2.2 GHz AMD Opteron Processor 6174, the average CPU time required for a typical IDSite calculation (e.g. with a compound with 3 rotatable bonds) is about 448 hours, of which about 11% of the time is spent on the first refinement stage and 89% on the second refinement. On 20 such processors the calculation takes 22 hours. The initial Glide docking step on a single processor takes about 10 min. The computational cost of PLOP refinement is proportional to the number of rotatable bonds in the compound.

Conclusion

We have developed a novel approach for the prediction of experimentally observable cytochrome P450 sites of metabolism, IDSite, and applied it to a data set for the 2D6 P450 isoform. We obtain remarkably high sensitivity and specificity using a structure-based model, representing a major advance as compared to alternatives in the literature, including various types of ligand-based models. The method delivers not only accurate SOM predictions, but also three dimensional structures of the protein-ligand complex, including induced fit effects (which are quite significant), for every SOM identified by the algorithm.

We selected 2D6 as our initial target because the binding of a positively charged ligand nitrogen to an acidic group in the protein creates an additional constraint that is useful in limiting sampling and achieving reliable poses in the induced-fit docking effort. Other important P450 isoforms, such as 1A2, 2C9 and 3A4, may be more difficult to model in this fashion as they lack such a salt bridge constraint; nevertheless, even if additional sampling effort is required, it should be possible to obtain successful results given the performance of the conformational energy and reactivity models that we have seen in the present work. The development of models for additional isoforms, and their application to additional ligand test sets, is ongoing in our laboratory. Ultimately, predictive use in an active drug discovery project will be required for validation; we look forward to engaging in such tests in the near future.


IDSite predicted results for the training set.

IDSite predicted results for the test set.

The energy and distance (constrained atom to the ferryl oxygen) changes during the MCM simulation in the first **(A)** and the second **(B)** refinement stages for 4-methoxyamphetamine.

**(A)** The lowest energy pose in the second refinement stage for nortriptyline. Orange sphere = “dummy” ferryl oxygen, green sphere = experimental and predicted SOM. **(B)** Intrinsic reactivities (red) for each site and the relative energy (blue) of the poses with the corresponding site constrained to the ferryl oxygen. The SOM observed experimentally is marked with a green circle.

Acknowledgments

This work was supported by the NIH grant GM-40526 to RAF. STS also thanks the Guthikonda Family for an Arun Guthikonda Memorial Graduate Fellowship. We thank Professor Ronald Breslow, Dr. Tyler Day, Dr. Robert Abel, Dr. Zhiyong Zhou, Michelle L. Hall and Jing Zhang for helpful discussions. RAF has a significant financial stake in Schrödinger, Inc., is a consultant to Schrödinger, Inc., and is on the Scientific Advisory Board of Schrödinger, Inc.

Footnotes

Supporting Information available.

Details about our dataset selection, figures of all docked structures leading to our predictions, tables with the activation barriers used to draw figure 7 and tables with detailed dihedral angle changes due to induced-fit effects. This information is available free of charge via the Internet at http://pubs.acs.org.

References

1. Bailey DG, Malcolm J, Arnold O, Spence JD. Br J Clin Pharmacol. 1998;46:101. doi: 10.1046/j.1365-2125.1998.00764.x.
2. Preskorn SH. Clin Pharmacokinet. 1997;32:1. doi: 10.2165/00003088-199700321-00003.
3. Dresser GK, Spence JD, Bailey DG. Clin Pharmacokinet. 2000;38:41. doi: 10.2165/00003088-200038010-00003.
4. Afzelius L, Arnby CH, Broo A, Carlsson L, Isaksson C, Jurva U, Kjellander B, Kolmodin K, Nilsson K, Raubacher F, Weidolf L. Drug Metab Rev. 2007;39:61. doi: 10.1080/03602530600969374.
5. Singh SB, Shen LQ, Walker MJ, Sheridan RP. J Med Chem. 2003;46:1330. doi: 10.1021/jm020400s.
6. Rydberg P, Gloriam DE, Zaretzki J, Breneman C, Olsen L. ACS Medicinal Chemistry Letters. 2010;1:96. doi: 10.1021/ml100016x.
7. de Groot MJ, Alex AA, Jones BC. J Med Chem. 2002;45:1983. doi: 10.1021/jm0110791.
8. Sheridan RP, Korzekwa KR, Torres RA, Walker MJ. J Med Chem. 2007;50:3173. doi: 10.1021/jm0613471.
9. Zaretzki J, Bergeron C, Rydberg P, Huang T-w, Bennett KP, Breneman CM. J Chem Inf Model. 2011;51:1667. doi: 10.1021/ci2000488.
10. Cruciani G, Carosati E, De Boeck B, Ethirajulu K, Mackie C, Howe T, Vianello R. J Med Chem. 2005;48:6970. doi: 10.1021/jm050529c.
11. Jones JP, Korzekwa KR. In: Methods Enzymol. Eric FJ, Michael RW, editors. Vol. 272. Academic Press; New York, NY: 1996. p. 326.
12. Oláh J, Mulholland AJ, Harvey JN. Proc Natl Acad Sci U S A. 2011;108:6050. doi: 10.1073/pnas.1010194108.
13. Kirton SB, Kemp CA, Tomkinson NP, St-Gallay S, Sutcliffe MJ. Proteins: Struct, Funct, Bioinf. 2002;49:216. doi: 10.1002/prot.10192.
14. de Graaf C, Oostenbrink C, Keizers PHJ, van der Wijst T, Jongejan A, Vermeulen NPE. J Med Chem. 2006;49:2417. doi: 10.1021/jm0508538.
15. Unwalla R, Cross J, Salaniwal S, Shilling A, Leung L, Kao J, Humblet C. J Comput-Aided Mol Des. 2010;24:237. doi: 10.1007/s10822-010-9336-6.
16. Vasanthanathan P, Hritz J, Taboureau O, Olsen L, Jorgensen FS, Vermeulen NPE, Oostenbrink C. J Chem Inf Model. 2009;49:43. doi: 10.1021/ci800371f.
17. Rydberg P, Hansen SM, Kongsted J, Norrby PO, Olsen L, Ryde U. J Chem Theory Comput. 2008;4:673. doi: 10.1021/ct700313j.
18. Gleeson MP, Davis AM, Chohan KK, Paine SW, Boyer S, Gavaghan CL, Arnby CH, Kankkonen C, Albertson N. J Comput-Aided Mol Des. 2007;21:559. doi: 10.1007/s10822-007-9139-6.
19. Li J, Abel R, Zhu K, Cao Y, Friesner R. Proteins: Struct, Funct, Bioinf. 2011 doi: 10.1002/prot.23106.
20. Bathelt CM, Mulholland AJ, Harvey JN. J Phys Chem A. 2008;112:13149. doi: 10.1021/jp8016908.
21. Tian L, Friesner RA. J Chem Theory Comput. 2009;5:1421. doi: 10.1021/ct900040n.
22. Glide, version 5.6. Schrödinger, Inc; New York, NY: 2010.
23. Prime, version 3.0. Schrödinger, Inc; New York, NY: 2011.
24. Halgren TA, Murphy RB, Friesner RA, Beard HS, Frye LL, Pollard WT, Banks JL. J Med Chem. 2004;47:1750. doi: 10.1021/jm030644s.
25. Friesner RA, Banks JL, Murphy RB, Halgren TA, Klicic JJ, Mainz DT, Repasky MP, Knoll EH, Shelley M, Perry JK, Shaw DE, Francis P, Shenkin PS. J Med Chem. 2004;47:1739. doi: 10.1021/jm0306430.
26. Paine MJI, McLaughlin LA, Flanagan JU, Kemp CA, Sutcliffe MJ, Roberts GCK, Wolf CR. J Biol Chem. 2003;278:4021. doi: 10.1074/jbc.M209519200.
27. Jaguar, version 7.6. Schrödinger, Inc; New York, NY: 2010.
28. Xiang ZX, Honig B. J Mol Biol. 2001;311:421. doi: 10.1006/jmbi.2001.4865.
29. Duane S, Kennedy AD, Pendleton BJ, Roweth D. Phys Lett B. 1987;195:216.
30. Tuckerman M, Berne BJ, Martyna GJ. J Chem Phys. 1992;97:1990.
31. Wang B, Yang LP, Zhang XZ, Huang SQ, Bartlam M, Zhou SF. Drug Metab Rev. 2009;41:573. doi: 10.1080/03602530903118729.
32. Guengerich FP, Miller GP, Hanna IH, Martin MV, Leger S, Black C, Chauret N, Silva JM, Trimble LA, Yergey JA, Nicoll-Griffith DA. Biochemistry. 2002;41:11025. doi: 10.1021/bi020341k.
33. Guengerich FP, Hanna IH, Martin MV, Gillam EMJ. Biochemistry. 2003;42:1245. doi: 10.1021/bi027085w.
34. Guengerich FP. Chem Res Toxicol. 2001;14:611. doi: 10.1021/tx0002583.
35. Shaik S, Kumar D, de Visser SP, Altun A, Thiel W. Chem Rev. 2005;105:2279. doi: 10.1021/cr030722j.
36. Wang YH, Li Y, Wang B. J Phys Chem B. 2007;111:4251. doi: 10.1021/jp071222n.
37. Schneebeli ST, Hall ML, Breslow R, Friesner R. J Am Chem Soc. 2009;131:3965. doi: 10.1021/ja806951r.
38. Olsen L, Rydberg P, Rod TH, Ryde U. J Med Chem. 2006;49:6489. doi: 10.1021/jm060551l.
39. Rydberg P, Ryde U, Olsen L. J Phys Chem A. 2008;112:13058. doi: 10.1021/jp803854v.
40. Rowland P, Blaney FE, Smyth MG, Jones JJ, Leydon VR, Oxbrow AK, Lewis CJ, Tennant MG, Modi S, Eggleston DS, Chenery RJ, Bridges AM. J Biol Chem. 2006;281:7614. doi: 10.1074/jbc.M511232200.
41. de Groot MJ, Ackland MJ, Horne VA, Alex AA, Jones BC. J Med Chem. 1999;42:1515. doi: 10.1021/jm981118h.
42. Lill MA, Dobler M, Vedani A. ChemMedChem. 2006;1:73. doi: 10.1002/cmdc.200500024.
43.Feifel N, Kucher K, Fuchs L, Jedrychowski M, Schmidt E, Antonin KH, Bieck PR, Gleiter CH. Eur J Clin Pharmacol. 1993;45:265. doi: 10.1007/BF00315394. [DOI] [PubMed] [Google Scholar]
44.Olesen OV, Linnet K. Drug Metab Dispos. 1997;25:740. [PubMed] [Google Scholar]
45.Geertsen S, Foster BC, Wilson DL, Cyr TD, Casley W. Xenobiotica. 1995;25:895. doi: 10.3109/00498259509046661. [DOI] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

1_si_001: NIHMS324948-supplement-1_si_001.pdf (1.4 MB, PDF)
