Abstract

Therapeutic drugs are required to target proteins in the cell, not in vitro. Yet, drug-induced protein folding in vivo is off limits to computational modeling efforts. This situation may change as artificial intelligence empowers molecular dynamics and enables the deconstruction of in vivo cooperativity for structural adaptation.
Keywords: in vivo protein folding, drug-induced protein folding, artificial intelligence, molecular dynamics, structure-based drug design, lead optimization
Drug Targeting in the Cell Is Unyielding to Modeling Efforts
A major problem in drug design is the structural adaptation of the target to the drug ligand.1,2 This problem, known as the drug-induced folding problem, is every bit as complex as the protein folding problem and turns the therapeutic target into a moving target, introducing huge complications that are computationally tractable in only a few instances.2 Yet another layer of complexity arises because drug targeting must ultimately be assessed in vivo, not in vitro, and in vivo contexts are completely off limits to current computational modeling of drug-induced folding.3,4 As argued in this piece, this situation is likely to change with the advent of artificial intelligence (AI) in the drug discovery arena.4
Computational efforts, mainly stemming from molecular dynamics (MD) and geared toward modeling protein folding in vivo, have so far proven unsuccessful.3 This is not surprising and is mainly due to the sheer complexity of the cellular environment, which places it off limits to MD computations.3,4 Equally daunting is the modeling of protein folding induced by drug binding in an in vivo setting.4 Yet, drugs are designed and expected to target proteins within the cell, not in the test tube, so the computational modeling of drug targeting must face this reality.
Natural proteins evolved to fold in cellular environments, not in test tubes,5−8 and hence, they need to be therapeutically targeted in such contexts if the drugs are expected to have clinical relevance. When dealing with folding itself, we note that only a handful of small proteins actually refold in vitro as physiological conditions are recovered.3,9 Furthermore, chemical synthesis does not yield functional proteins.3 Thus, over the past 50 years, most efforts to generate in vitro folding pathways and drug-induced folding pathways through atomistic MD may have been misplaced, as the majority of proteins simply cannot renature in the absence of cellular components.10
This situation poses huge problems for biotechnology and biomedical engineering and makes it imperative to predict in vivo folding pathways and in vivo drug-induced folding pathways in order to identify the mechanisms by which kinetically trapped misfolded structures are avoided, folding is expedited, and structural adaptation of the target to the drug effectively takes place. Generating realistic folding pathways remains a challenge for MD. Even in an in vitro context, realistic folding time scales and the molecular events related to the structural adaptation of the drug target are usually out of reach, and key rare events are missed. Moreover, the participation of the cellular context in expediting protein folding and drug-induced folding is out of reach for current computational efforts,3,4 even on dedicated supercomputers.11 Drug targeting in vivo is simply beyond current computational efforts.
AI Is Set to Deconstruct the in Vivo Context for Drug-Target Structural Adaptation
Recent advances in the application of deep learning approaches4,12 instill confidence in the possibility of empowering MD by subsuming short atomistic simulations within an AI platform. The goal is to propagate folding trajectories far beyond the times accessible to MD by encoding them in a simplified representation.4 This empowering of MD computations should enable the reverse engineering of the in vivo context that expedites the folding process and facilitates drug targeting in cases where induced folding needs to be accounted for. Thus, the seemingly intractable in vivo folding problem of direct relevance to drug targeting may finally be tackled with the advent of AI approaches capable of recreating cellular reality.
The in vivo reality in which drug targeting takes place comprises hundreds of millions of atoms. Hence, describing the targeted protein chain in full geometric detail becomes operationally senseless. There is a huge informational burden imposed on each iterative integration step of the atomistic equations of motion. Therefore, MD runs, even on dedicated supercomputers,11 will necessarily be short, spanning subnanosecond time scales in the best of cases. This prompts us to suggest an AI system that learns a simplified encoding of the molecular geometry of the target protein as it structurally adapts to the drug upon binding. We are advocating an “adiabatic” approach whereby fast molecular motions are averaged out as the AI system is trained with short MD runs. The simplified dynamics, once “essentially learned” by the AI system, may be propagated far more easily to cover realistic time scales of meaningful molecular events.
To be more precise, we advocate replacing geometric coordinates (for example, the Φ, Ψ Ramachandran angles describing the backbone torsional state, Figure 1) with the basins of attraction of the minima in the potential energy surface, a coarser “topological” representation (Figure 1).4 Thus, during the training of the AI system, the dynamics of structural adaptation of the target protein may be represented by two alternative descriptions, a geometric (i.e., “detailed”) one and a topological one, with the latter far simpler to propagate beyond the training period. The two descriptions become compatible when the AI system is fully trained, and this compatibility is formalized by the commutativity of the diagram presented in Figure 1. Commutativity essentially means that encoding (denoted as π-operation) a geometrically specified state x(t) as a topological “modulo basin” state πx(t) and subsequently propagating the latter across the time step τ into the topological state Γπx(t) yields the same result as running MD starting from the state x(t) for the time period τ until the state x(t + τ) is reached and then topologically encoding the destination state as πx(t + τ); that is, Γπx(t) = πx(t + τ). The coarse propagator Γ is optimized so that both descriptions of the dynamics become compatible, as described in Figure 1. Once this is achieved by training the AI system on many short MD runs, the AI system is able to propagate the simpler dynamics beyond the time scales accessible to atomistic MD by iterating the Γ-operation on the destination state of the MD run. In this way, it should be possible to capture meaningful molecular events relevant to the structural adaptation of the target to the binding drug. This is possible because the AI system has greatly simplified the molecular description by adopting the topological representation4,12 over the geometric description and hence carries a far lower informational burden per iteration than an MD computation.
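The compatibility condition can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the encoding used in the cited work: the function names (`encode_residue`, `encode`, `commutativity_error`), the basin letters A to D, and the quadrant boundaries at 0° are all hypothetical choices made for the example. The sketch encodes a torsional state with a stand-in π-operation and measures how often a candidate propagator Γ fails to reproduce the encoded MD destination state.

```python
from typing import Callable, Iterable, Sequence, Tuple

# One (phi, psi) pair in degrees per residue; this plays the role of x(t).
Torsions = Sequence[Tuple[float, float]]


def encode_residue(phi: float, psi: float) -> str:
    """Hypothetical pi-operation for a single residue.

    ASSUMPTION: the four basins are approximated by the four quadrants of
    the (phi, psi) plane, and the letters A-D are arbitrary labels. A real
    encoding would use the actual basins of the potential energy surface.
    """
    if phi < 0:
        return "A" if psi < 0 else "B"
    return "C" if psi >= 0 else "D"


def encode(x: Torsions) -> str:
    """Encode a whole chain as a 'message' with one basin letter per residue."""
    return "".join(encode_residue(phi, psi) for phi, psi in x)


def commutativity_error(
    pairs: Iterable[Tuple[Torsions, Torsions]],
    gamma: Callable[[str], str],
) -> float:
    """Fraction of residues where gamma(encode(x(t))) != encode(x(t + tau)).

    Each pair holds the start and end state of one short MD segment of
    length tau. A value of 0.0 means the diagram commutes on this data.
    """
    mismatches = 0
    total = 0
    for x_t, x_next in pairs:
        predicted = gamma(encode(x_t))
        target = encode(x_next)
        mismatches += sum(p != q for p, q in zip(predicted, target))
        total += len(target)
    return mismatches / total if total else 0.0


# Toy data: the second residue changes basin during the MD segment.
x_t = [(-60.0, -45.0), (-120.0, 130.0)]
x_next = [(-60.0, -45.0), (60.0, 45.0)]

# The identity map is a deliberately poor propagator: it misses the transition.
error = commutativity_error([(x_t, x_next)], lambda message: message)
```

In this picture, training amounts to adjusting Γ until the error vanishes over the set of short MD segments; the error function here is only a diagnostic, not a training procedure.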
Figure 1.
Artificial intelligence enables reverse engineering of drug targeting within the in vivo context. (a) Compatibility between the geometric and topological descriptions of the conformation of a targeted protein that undergoes induced folding as it binds a drug. This simplification is necessary to leverage an AI system purportedly capable of reverse engineering the complex in vivo context where the drug actually targets the protein. The top left panel shows the torsional Φ, Ψ angles visited by the protein backbone, while the top right panel simplifies this information “modulo basin”, i.e., averaging out fast motions through the encoding process (π-operation). (b) Multiple short MD runs at atomistic detail including the in vivo context are leveraged to train the AI system to propagate the simplified (topological) version of the dynamics beyond the time scales accessible to MD. Propagation requires learning to make the modulo-basin dynamics compatible with the MD during the training period (panel a). Crucially, the in vivo context is not incorporated explicitly by the AI-empowered computations of drug-induced folding, but its influence is subsumed in the parametrization of the propagator Γ.
Within the AI system, a topological description of the torsional state of the protein target (Figure 1a) may be regarded as a “text” or “message”, where one of the four Ramachandran basins is attributed to each residue along the chain. Hence the “folding alphabet” comprises four “letters” or basins, and the message at time t + τ that follows the topological encoding of the torsional state at time t may be determined by maximum likelihood probabilities calculated from short MD trajectories that are topologically encoded (simplified) for training purposes.4 The probability of a basin transition at each position in the chain is influenced by the basin assignment for residues at other positions, much like in a language translator, where context determines the meaning of a word or phrase within a sentence. This “textual” representation of the chain has been shown to be amenable to processing as a “folding language” by adopting the so-called transformer architecture for the deep learning system.4 Thus, inferring “what the chain is going to say next” relies heavily on how the chain has evolved when endowed with a similar topology, as determined using a maximum likelihood scheme substantiated by the training set of MD runs.
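The maximum likelihood idea can be illustrated with a deliberately simple stand-in for the transformer: a lookup table that maps each residue's local context to its most frequently observed next basin. This is a sketch under stated assumptions, not the architecture of the cited work. The names `contexts`, `fit_gamma`, and `propagate`, the padding symbol, the nearest-neighbor context window, and the rule that unseen contexts leave the basin unchanged are all hypothetical. A transformer would instead condition each transition on the whole chain.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

PAD = "-"  # hypothetical symbol marking positions beyond the chain ends


def contexts(message: str) -> List[str]:
    """Return, for each residue, the triplet (left neighbor, self, right neighbor)."""
    padded = PAD + message + PAD
    return [padded[i:i + 3] for i in range(len(message))]


def fit_gamma(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Count-based maximum likelihood estimate of a coarse propagator.

    Each pair is (encoded state at t, encoded state at t + tau) taken from a
    short, topologically encoded MD segment. For every observed context the
    table stores the most frequent next basin.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for before, after in pairs:
        for ctx, next_basin in zip(contexts(before), after):
            counts[ctx][next_basin] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}


def propagate(message: str, table: Dict[str, str], steps: int = 1) -> str:
    """Iterate the fitted propagator; unseen contexts keep their current basin."""
    for _ in range(steps):
        message = "".join(table.get(ctx, ctx[1]) for ctx in contexts(message))
    return message


# Toy training set of encoded (t, t + tau) pairs; the letters are arbitrary labels.
training_pairs = [("AAB", "ABB"), ("AAB", "ABB"), ("ABB", "BBB")]
table = fit_gamma(training_pairs)

one_step = propagate("AAB", table)            # a single tau step
two_steps = propagate("AAB", table, steps=2)  # iterated beyond a single segment
```

Iterating `propagate` mirrors the repeated application of Γ: each segment in the training set spans a single τ, yet the fitted table can be applied as many times as needed on the encoded state.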
In this way, we suggest constructing a transformer trained with numerous short MD runs, each contributing to portray the dynamic molecular reality of the cellular context but individually incapable of capturing long-time events relevant to the drug-induced folding process. Even on dedicated supercomputers, none of these atomistic MD simulations per se will generate realistic in vivo drug-induced folding pathways, but they can collectively inform and train the AI system empowered to propagate the dynamics of structural adaptation of the target protein to the drug upon binding in a cooperative in vivo context (Figure 1b). Crucially, the in vivo context is not incorporated explicitly by the AI-empowered computations, but its influence is implicit as it becomes subsumed in the parametrization of the propagator Γ (cf. Figure 1a).4 In this way, it will become possible to reverse engineer the cooperative in vivo context that expedites structural adaptation in the drug targeting process, a major imperative in the lead optimization stages of drug design.
The AI empowering of MD is expected to significantly broaden the technological base of the pharmaceutical industry as the in vivo reality that expedites drug-induced folding may finally yield to computational modeling efforts.
The author declares no competing financial interest.
References
- Sen S.; Udgaonkar J. B. (2019) Binding-induced folding under unfolding conditions: Switching between induced fit and conformational selection mechanisms. J. Biol. Chem. 294, 16942–16952. 10.1074/jbc.RA119.009742.
- Fernández A.; Bazan S.; Chen J. (2009) Taming the induced folding of drug-targeted kinases. Trends Pharmacol. Sci. 30, 66–71. 10.1016/j.tips.2008.11.001.
- Sorokina I.; Mushegian A. (2018) Modeling protein folding in vivo. Biol. Direct 13, 13. 10.1186/s13062-018-0217-6.
- Fernández A. (2021) Artificial Intelligence platform for molecular targeted therapy: A translational science approach. World Scientific Publishing, Singapore.
- Clark P. L.; Elcock A. H. (2016) Molecular chaperones: providing a safe place to weather a midlife protein-folding crisis. Nat. Struct. Mol. Biol. 23, 621–623. 10.1038/nsmb.3255.
- Thommen M.; Holtkamp W.; Rodnina M. V. (2017) Co-translational protein folding: progress and methods. Curr. Opin. Struct. Biol. 42, 83–89. 10.1016/j.sbi.2016.11.020.
- Sathyanarayanan U.; Musa M.; Dib P. B.; Raimundo N.; Milosevic I.; Krisko A. (2020) ATP hydrolysis by yeast Hsp104 determines protein aggregate dissolution and size in vivo. Nat. Commun. 11, 5226. 10.1038/s41467-020-20394-8.
- Hellerschmied D.; Lehner A.; Franicevic N.; Arnese R.; Johnson C.; Vogel A.; Meinhart A.; Kurzbauer R.; Deszcz L.; Gazda L.; Geeves M.; Clausen T. (2019) Molecular features of the UNC-45 chaperone critical for binding and folding muscle myosin. Nat. Commun. 10, 4781. 10.1038/s41467-019-12667-8.
- Anfinsen C. B. (1973) Principles that govern the folding of protein chains. Science 181, 223–230. 10.1126/science.181.4096.223.
- Feng R.; Gruebele M.; Davis C. M. (2019) Quantifying protein dynamics and stability in a living organism. Nat. Commun. 10, 1179. 10.1038/s41467-019-09088-y.
- Piana S.; Shaw D. E. (2018) Atomic-Level Description of Protein Folding inside the GroEL Cavity. J. Phys. Chem. B 122, 11440–11449. 10.1021/acs.jpcb.8b07366.
- Fernández A. (2020) Artificial intelligence teaches drugs to target proteins by tackling the induced folding problem. Mol. Pharmaceutics 17, 2761–2767. 10.1021/acs.molpharmaceut.0c00470.

