Abstract
Introduction:
Macromolecular X-ray crystallography and cryo-EM are currently the primary techniques used to determine the three-dimensional structures of proteins, nucleic acids, and viruses. Structural information has been critical to drug discovery and structural bioinformatics. The integration of artificial intelligence (AI) into X-ray crystallography has shown great promise in automating and accelerating the analysis of complex structural data, further improving the efficiency and accuracy of structure determination.
Areas covered:
This review explores the relationship between X-ray crystallography and other modern structural determination methods. It examines the integration of data acquired from diverse biochemical and biophysical techniques with those derived from structural biology. Additionally, the paper offers insights into the influence of AI on X-ray crystallography, emphasizing how integrating AI with experimental approaches can revolutionize our comprehension of biological processes and interactions.
Expert opinion:
We emphasize that investing in science is crucial because of its significant role in drug discovery and advances in healthcare. X-ray crystallography remains an essential source of structural biology data for drug discovery. Recent advances in biochemical, spectroscopic, and bioinformatic methods, along with the integration of AI techniques, hold the potential to revolutionize drug discovery when effectively combined with robust data management practices.
Keywords: artificial intelligence, drug discovery, ligand identification and refinement, machine learning, protein-small molecule agent complexes, structure validation
1. The Role of Macromolecular X-ray Crystallography in Drug Discovery
1.1. Nature-inspired Drugs
The search for effective ways to treat illnesses in both humans and domesticated animals has been ongoing throughout human civilization. Natural medicine relies primarily on plant-based remedies to prevent or cure various health conditions. For decades, these natural compounds have served as lead compounds in pharmaceutical research, potentially yielding derivative drugs with enhanced therapeutic properties and reduced side effects [2]. Moreover, the study of synergistic effects resulting from combinations of multiple metabolites present in whole plants holds significant promise for understanding complex disease mechanisms and advancing pharmaceutical development [3].
Synergistic effects manifest when the combined activity of at least two drugs exceeds their individual effects. In this context, X-ray Crystallography (XRC) assumes a critical role in exploring interactions between natural agents and target proteins. XRC enables the investigation of three essential scenarios: (1) the interaction of a single agent with multiple targets; (2) the interaction of at least two different agents with the same target, particularly when they bind to the same binding site; and (3) the interaction of at least two different agents with multiple targets, collectively contributing to therapeutically beneficial actions. These scenarios necessitate a comprehensive elucidation of the agent-protein interactions, a realm where XRC offers valuable insights into nature-inspired drug discovery and the development of innovative pharmaceutical treatments.
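The three scenarios above can be sketched as a simple classification over a list of observed agent-target interactions. This is only an illustrative sketch: the function name, the tuple representation, and the agent and protein labels are hypothetical, and the sketch does not model whether two agents share the same binding site.

```python
def classify_scenario(pairs):
    """Map a collection of (agent, target) interactions to scenario 1, 2, or 3.

    Returns None when the interactions fit none of the three scenarios
    (for example, a single agent bound to a single target).
    All agent and target names are hypothetical labels.
    """
    pairs = list(pairs)
    agents = {agent for agent, _ in pairs}
    targets = {target for _, target in pairs}

    if len(agents) == 1 and len(targets) > 1:
        return 1  # one agent, multiple targets
    if len(agents) >= 2 and len(targets) == 1:
        return 2  # at least two agents, same target
    if len(agents) >= 2 and len(targets) > 1:
        return 3  # at least two agents, multiple targets
    return None


# Hypothetical example: two agents observed on the same protein.
example = [("agent_A", "protein_1"), ("agent_B", "protein_1")]
print(classify_scenario(example))
```

In practice, each pair would correspond to a determined complex structure, and scenario 2 would additionally require comparing the binding sites seen in those structures.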
1.2. Structure-Based Drug Design
Drug discovery is a complex process of identifying potential targets and therapeutic agents, optimizing the agents’ properties, and evaluating their efficacy and safety. The oldest approach to drug discovery is serendipity, that is, discovery by chance through trial and error. However, researchers are constantly exploring new approaches to evaluate the potential benefits of natural medicine and create novel drugs. Another approach to drug discovery is the chemical modification of known drugs or natural products to enhance bioavailability or minimize side effects.
X-ray crystallography is integral to modern drug design strategies (Figure 1). Structure-Based Drug Design (SBDD) [4] involves using the three-dimensional structure of a biological target, usually a protein, to design small molecule agents (SMA) that bind to and modulate the function of the target [5, 6]. This approach uses the atomic-level details provided by X-ray crystallography or cryo-EM techniques to understand the target’s structural and chemical features and design drugs that could interact with high specificity and affinity [7]. For example, the development of HIV protease inhibitors for treating AIDS was based on the structure of the HIV protease enzyme determined by X-ray crystallography [8, 9].
Figure 1.
Drug design cycle. The processes directly affected by structural biology are indicated in green. Figure created with BioRender.
1.3. Fragment-Based Drug Design
Fragment-Based Drug Design (FBDD) involves screening a library of small, low molecular weight fragments to identify those that bind to a target and then using those fragments as a starting point for designing larger, more potent drug candidates [10–12]. The fragments used in FBDD typically have low affinity for the target but can be selected or designed to interact with specific regions of the target surface. Traditional FBDD avoided reactive molecules due to concerns about non-specific modifications. However, over the last decade, libraries of electrophilic fragments capable of covalently modifying protein targets have increasingly been used due to their high selectivity, potency, and extended duration of action [13, 14]. Fragment screening is usually performed using X-ray crystallography, NMR spectroscopy, or surface plasmon resonance (SPR). The latter two techniques are used to screen for ligand binding, whereas crystallography provides the details of the binding site and its location. Crystallography is particularly well suited for prioritizing fragments for optimization and identifying alterations that could increase selectivity [14]. Screening can also be performed by molecular docking; however, this approach has had limited success because docking protocols handle protein flexibility inadequately, sample conformational space insufficiently, and rely on inaccurate scoring functions, among other limitations. In contrast, diffraction experiments can completely visualize the binding mode. Once a set of fragment hits has been identified, the hits are optimized by merging them or adding chemical groups to improve their binding affinity and selectivity [15]. FBDD has been successfully utilized to develop potent compounds for various targets, including kinases, protein-protein interactions, and G protein-coupled receptors (GPCRs). An example is provided by the FBDD-based development of vemurafenib, a drug used to treat melanoma [16, 17]. Fragment screening identified a set of hits that targeted the mutated form of BRAF kinase, and these hits were subsequently optimized to create a potent and selective inhibitor [18]. Two examples of FDA-approved drugs designed using covalent fragment libraries are acalabrutinib and sotorasib. Acalabrutinib is used to treat mantle cell lymphoma, chronic lymphocytic leukemia, and small lymphocytic lymphoma. Sotorasib is used to treat non-small cell lung cancer with an abnormal KRAS G12C gene [13].
2. Structural Biology studies of SMAs complexed with macromolecules
2.1. SMAs in DrugBank in the context of macromolecular structures
X-ray crystallography, cryo-EM, and NMR are the most informative techniques among numerous biophysical methods that provide information about the interactions of SMA with macromolecules [19, 20]. One of the most valuable resources cataloging information on such interactions is DrugBank [21], which contains a list of proteins that interact with each agent and classifies the interactions as on- or off-target. Off-target interactions are interactions of the agent with macromolecules other than the intended or primary target protein. These interactions may lead to unintended side effects, both beneficial and harmful. Until recently, X-ray crystallography was the primary method used to obtain accurate structural information about the interactions of ligands and macromolecules. Figure 2 shows the growth in the number of DrugBank SMAs for which at least one SMA-macromolecule complex structure is present in the Protein Data Bank (PDB). Currently, 48% of DrugBank SMAs are present in the PDB.
Figure 2.
A) The growth of “first occurrence” SMA for X-ray crystallography and cryo-EM in the PDB, where “first occurrence” means that a particular SMA in the DrugBank had not been previously observed in complex with a macromolecule in the PDB using that experimental method. Dark blue and orange indicate X-ray and cryo-EM methods, respectively. B) The number of SMA found in macromolecular structures obtained by X-ray diffraction experiments. The green color shows agents approved by the FDA (drugs). Pink indicates agents that are not approved. For comparison, SMA not found in any PDB structure are also shown. The DrugBank entries are divided into 6 distinct groups: Approved: 2742, Nutraceutical: 117, Illicit: 205, Investigational: 4523, Withdrawn: 286, Experimental: 6345.
Only 22% of entries in the DrugBank are classified as approved drugs (Figure 2B). Some of them have been structurally characterized only in complexes with non-human macromolecules. Currently, 3092 SMAs from the DrugBank have been found in one or more structures of human macromolecules. The overwhelming majority (85%) of those structures were determined by X-ray diffraction (Figure 3).
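As a rough cross-check, the group counts listed in the Figure 2 caption can be tabulated as shares of all group labels. This is only a sketch of the arithmetic: it assumes, without confirmation from the source, that a DrugBank entry may carry more than one group label (for example, both approved and investigational). Under that assumption, the label total overcounts unique entries, so the naive approved share computed here (about 19%) would be lower than a share computed over unique entries, such as the 22% quoted in the text.

```python
# Group counts copied from the Figure 2 caption.
drugbank_groups = {
    "Approved": 2742,
    "Nutraceutical": 117,
    "Illicit": 205,
    "Investigational": 4523,
    "Withdrawn": 286,
    "Experimental": 6345,
}

# Total number of group labels; this equals the number of unique entries
# only if no entry belongs to more than one group (an assumption).
total_labels = sum(drugbank_groups.values())

# Percentage of all group labels contributed by each group.
shares = {group: 100 * count / total_labels
          for group, count in drugbank_groups.items()}

for group, pct in shares.items():
    print(f"{group:16s}{drugbank_groups[group]:6d}{pct:6.1f}%")
```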
Figure 3.
All macromolecular structures from PDB determined by X-ray and cryo-EM. Dark blue and dark orange indicate structures containing SMA from DrugBank, whereas light blue and light orange indicate structures that do not contain a DrugBank SMA.
At present, cryo-EM and X-ray crystallography provide complementary information in drug discovery. Many recent structures obtained by cryo-EM have sufficient resolution to model SMA-macromolecule binding [22]; however, the resolution of X-ray structures is usually higher and thus can provide more precise models of SMA interactions (Figure 4). It is important to note that the definitions of resolution in cryo-EM and X-ray diffraction are different; consequently, direct comparisons of reported resolutions can be misleading [23].
Figure 4.
A) Resolution distribution for structures of human macromolecules with at least one SMA from DrugBank. B) A single SMA structure covered by density maps obtained with X-rays (left, PDB id 6LPL) and cryo-EM (right, PDB id 7T32). The resolution reflects the median value of all X-ray and cryo-EM deposited structures. The SMA (DrugBank Accession Number: DB08770, PDB compound id: ZMA, chemical name: 4-{2-[(7-amino-2-furan-2-yl[1,2,4]triazolo[1,5-a][1,3,5]triazin-5-yl)amino]ethyl}phenol) is bound to the adenosine 2A receptor. Map representations were generated with PyMOL [27].
X-ray crystallography is a powerful technique for determining the precise position of ligand binding within a macromolecule (Figure 4B). However, crystallization often requires high concentrations of non-physiological chemicals, which can affect binding properties [24]. To maximize the chances of success, experimenters often use ligand concentrations far above physiological levels to saturate all binding sites. Consequently, XRC has to be combined with methods that measure ligand binding affinities, such as surface plasmon resonance [25] or isothermal titration calorimetry [26].
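Why saturating ligand concentrations are used, and why occupancy in a crystal says little about affinity, can be illustrated with the textbook single-site binding isotherm, occupancy = [L] / ([L] + Kd). The sketch below is not taken from the source. It assumes simple 1:1 equilibrium binding with the free ligand concentration approximately equal to the total, and the Kd value is hypothetical.

```python
def fractional_occupancy(ligand_conc, kd):
    """Fraction of binding sites occupied at equilibrium for 1:1 binding.

    ligand_conc and kd must use the same concentration units.
    Assumes free ligand concentration is approximately the total concentration.
    """
    if ligand_conc < 0 or kd <= 0:
        raise ValueError("ligand_conc must be >= 0 and kd must be > 0")
    return ligand_conc / (ligand_conc + kd)


kd_um = 10.0  # hypothetical dissociation constant, in micromolar
for fold in (0.1, 1, 10, 100):
    conc = fold * kd_um
    print(f"[L] = {fold:>5} x Kd -> occupancy {fractional_occupancy(conc, kd_um):.3f}")
```

Under these assumptions, a ligand soaked at 100 times its Kd is nearly fully bound whatever the absolute value of Kd is, which is why a well-occupied site in the electron density map cannot substitute for an affinity measurement.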
2.2. The Role of Artificial Intelligence in X-ray Crystallography
In early 2023, the number of experimental structures deposited in the PDB surpassed 200,000, which is quite an achievement for research that has been primarily publicly funded. That number has since been dwarfed by the tsunami of structures that accompanied the breakthrough of AI-driven structure prediction. The success rate of DeepMind’s AlphaFold2 (AF2) [28, 29] in predicting macromolecular structures is astonishing. Competing programs, such as RoseTTAFold [30] and ESMFold [31], have demonstrated similar levels of predictive accuracy. A week after the publication of AF2 [32], DeepMind published structure predictions for the human proteome and 19 other genomes [29]. AlphaFold DB now contains over 200 million entries for 48 organisms [33]. The ESM Metagenomic Atlas [34], a collaboration between Meta AI and the European Bioinformatics Institute, now includes 772 million entries and is being released in concert with MGnify [35].
With so many structure prediction approaches under active development, it is impossible to discern which method will best address some of the remaining limitations of computational structural biology. However, it is evident that AI and experimental structural biology will enjoy a symbiotic relationship for the foreseeable future. Soon after the release of AF2, it became apparent that AI enables molecular replacement (MR) even for proteins with little sequence similarity to experimentally determined macromolecules. Consequently, MR will soon become the almost universal method for crystallographic structure determination. In essence, AI has solved the phase problem in macromolecular crystallography [36–39] and has thus simplified and sped up the protein structure determination process.
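One practical step when using a predicted model for molecular replacement is to remove low-confidence regions before the search. The sketch below is an illustration, not the procedure of any particular program. It relies on AlphaFold models in PDB format storing the per-residue confidence score (pLDDT) in the B-factor field (columns 61-66 of ATOM records). The cutoff of 70 is a commonly used convention and is treated here as an assumption, and the function names and three-atom toy model are hypothetical.

```python
def format_atom(serial, name, resname, chain, resseq, x, y, z, bfactor):
    """Format one fixed-column PDB ATOM record (occupancy fixed at 1.00)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{bfactor:6.2f}")


def trim_low_confidence(pdb_lines, min_plddt=70.0):
    """Drop ATOM records whose B-factor field (pLDDT) is below min_plddt.

    Records that are not ATOM records are passed through unchanged.
    """
    kept = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            plddt = float(line[60:66])  # columns 61-66, 1-indexed
            if plddt < min_plddt:
                continue
        kept.append(line)
    return kept


# Toy model: three C-alpha atoms with hypothetical pLDDT values.
model = [
    format_atom(1, " CA", "MET", "A", 1, 0.0, 0.0, 0.0, 35.2),
    format_atom(2, " CA", "LYS", "A", 2, 3.8, 0.0, 0.0, 88.1),
    format_atom(3, " CA", "VAL", "A", 3, 7.6, 0.0, 0.0, 92.4),
    "END",
]
trimmed = trim_low_confidence(model)
print("\n".join(trimmed))
```

A real workflow would usually also convert the remaining pLDDT values into estimated B-factors and split the model into domains, tasks that dedicated crystallographic software handles.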
All the previously mentioned databases of predicted structures were generated under the premise of “One gene, one protein.” Although the human genome has fewer than 20,000 protein-coding genes, alternative splicing and other transcriptional and translational regulation events give rise to over 165,000 different isoform transcripts [40]. The CHESS Human Protein Structure Database contains structures of all the annotated isoforms of human proteins [41]. Its authors even used some of the ColabFold [42] predicted structures to guide the designation of the reference isoform. This effort serves as a critical test for AI’s ability to deal with protein sequences widely divergent from the reference isoform, especially in the case of frameshifts. For example, protein isoforms resulting from a frameshift may have a sequence too distant from a natural sequence for structural predictions to be accurate. Experimentally determining the structure of interesting or disease-associated isoforms would be extremely valuable for drug discovery. Irregular gene expression is common in many complex diseases, including cancers, Alzheimer’s disease, cardiovascular disease, and diabetes [43]. Considering alternative isoforms as drug targets will play an increasingly important role in the near future because next-generation therapeutics will offer isoform-specific therapies. For example, splice junctions unique to particular isoforms can be targeted with CRISPR-Cas13d approaches to transcript degradation [44]. Post-translationally, isoforms can be targeted for degradation using aptamer- or nanobody-based proteolysis-targeting chimeras (PROTACs) [45].
X-ray diffraction is critical to enlarge and refine the AI training datasets. Protein design algorithms will need experimental proof of the consequences of “small sequence changes.” Pharmaceutical companies may have enough proprietary structural information to train their own models regarding mutational consequences. Regardless, ligand binding predictions will need experimental verification, and protein-protein interaction predictors will benefit from the atomic details of predicted interfaces.
The sheer scale of the number of predicted structural models should not be considered as an indication of the discipline’s maturity. Current models still have shortcomings that many researchers are trying to address, including the relative positioning of domains in multidomain proteins or “chains” in protein complexes. Soon after AlphaFold was released, other researchers devised several ways to use it to predict the quaternary structure of protein complexes, including by adding linkers between protein chains [46] or tinkering with AF2 parameters [42, 47]. But researchers did not have to rely on these simple hacks for long before software designed for the task was produced. The DeepMind team addressed protein complexes with AF2-Multimer [48]. I-TASSER-MTD can even use cross-linking mass spectrometry data as a component of quaternary structure prediction [49]. The ability of crystallography and cryo-EM to explicitly determine the quaternary structure of macromolecules is a significant advantage over purely predictive programs.
There are many other ways that X-ray crystallography can benefit from AI in addition to molecular replacement models. Machine Learning (ML) algorithms can be used to optimize experimental conditions, for example, by detecting ice or other diffraction artifacts during data collection [50]. ML can also increase the accuracy of structures by assisting in several stages, including initial model building, refinement, and validation of the structural model. Unexpectedly, AF2 can predict the location of disordered regions [38] better than dedicated programs. AI can be used in the early stages to optimize protein expression [51]. For example, the RBS Calculator [52] can design synthetic ribosome binding sites to control translation initiation rates and protein expression levels. Although crystallization still relies on screening for initial conditions, programs such as DeepCrystal, which uses deep learning, are working towards predicting crystallization conditions [53]. AI can also be used during crystal harvesting to detect the presence of crystals in drops [53].
CheckMyBlob [54, 55] is an excellent example of how AI can identify ligands within electron density maps obtained by X-ray crystallography. Using ML algorithms, CheckMyBlob autonomously detects ligands in partially modeled structures or validates ligands in existing structures based on experimental electron density maps. In several case studies (2PDT, 1FPX, 4RK3, 1KWN), the CheckMyBlob system detected misidentified ligands, and the authors of these deposits corrected, re-refined, and re-deposited the structures with corrected ligands [55].
For training purposes, AI needs a large quantity of high-quality data. Inconsistencies, errors, or biases in the training data can negatively affect ML performance. For ML-based approaches like CheckMyBlob, better structures can be highly beneficial. One resource that can provide improved quality structures is PDB-REDO [56], which not only re-refines all PDB structures using state-of-the-art refinement approaches [57] but also checks and re-refines every new version deposited in the PDB. Additionally, it maintains a consistent level of refinement quality across different structures by applying uniform protocols, which is important for ML systems because all of the data are processed and refined to the same standards. Consequently, PDB-REDO provides higher-quality electron density maps than the original PDB depositions. However, it should be noted that improving structures has much less of an impact if they are not updated within the PDB.
These examples show that when X-ray crystallography and AI work hand-in-hand, they can greatly enhance and accelerate drug discovery and development. Both AI and X-ray diffraction can work together across various stages of protein structure determination and subsequent steps within structural biology and drug discovery processes.
2.3. Continuous validation across macromolecular structure determination
It is crucial to think of validation as an overarching principle that should permeate every step of the drug discovery process [58]. In the field of X-ray crystallography, validation has undergone a transformation, moving away from being a single task conducted solely after structural refinement. Instead, it has evolved into an iterative process that begins with target (project) evaluation and continues through sample preparation, the diffraction experiment, model building, refinement, and the final check of the structure, when all unaccounted map densities have to be explained [59]. The last and most challenging step is to contextualize the structure within the realm of other structural biology and biomedical experiments. Ideally, the same batch of protein should be used for all experiments, or at the very least, an identical protocol should be followed when preparing samples for all experiments [57]. Although certain aspects of macromolecular validation can still benefit from improvement, the field remains strong. As a result, there has been a continuous enhancement in the quality of structural models within the PDB. Nevertheless, AI predictions of binding site details can only be as accurate as the structures they learn from, so identifying anomalies within the PDB remains essential, which highlights the ongoing necessity for robust validation platforms. The most successful structural biology laboratories have already embraced a “continuous and ubiquitous” approach to validation, and it has recently been recognized that adopting this approach more widely would accelerate scientific progress [58].
2.4. Homologous, mammal, or human proteins
Examining protein sequences and structures has revealed that evolutionarily related sequences tend to have similar structures and functions, which is the “structure is more conserved than sequence” paradigm. Building upon this insight, numerous biomedical experiments have been conducted on homologous proteins, including structure elucidation. However, it has been recently shown that mindlessly following this assumption has limitations. Subtle differences in proteins from non-human species can have unforeseen structural consequences that can alter interactions with SMA. By directing research efforts toward human proteins, scientists can gain comprehensive insights into the mechanisms underlying human diseases and develop targeted therapies that maximize therapeutic benefits while minimizing adverse effects. Focusing on human proteins ensures that drug discovery efforts are tailored to address the intricacies of human biology, ultimately leading to safer and more effective treatments for patients.
Numerous experiments have been conducted on serum albumins from animals and humans to explore their binding characteristics and interactions. The results indicate that ketoprofen exhibits distinct binding profiles, engaging different binding sites in albumins from different species [60]. This differential binding of ketoprofen highlights the necessity of studying human proteins.
2.5. Conclusion
X-ray crystallography will continue to be an essential technique in drug discovery and development, even as technology evolves. Improvements in synchrotron radiation sources and detector technology have enabled a more accurate determination of protein structures. Furthermore, the use of microcrystals and innovative crystallization methods has paved the way for studying macromolecules that were previously beyond reach. Recent progress in micro-electron diffraction has expanded our ability to examine microcrystals that were too small for traditional crystallography [61, 62]. Moreover, by incorporating AI techniques, we can significantly enhance X-ray crystallography to determine macromolecular structures more accurately, identify new targets, and design innovative drugs with increased precision and efficiency. Consequently, X-ray crystallography is poised to maintain its vital role in future drug discovery and design.
Several essential advancements will shape the future of structural biology and drug discovery. First, complementary approaches, such as cryo-EM and NMR, will become increasingly important. Second, AI and ML will play a significant role in assisting at different stages of structural biology research. However, X-ray crystallography will remain a crucial technique in structural biology, as it continuously provides detailed insights into the structure of macromolecules due to its higher accuracy. The achievements of X-ray crystallography help us understand the fundamental principles of protein function and molecular interactions and support the design of novel drugs.
Despite the accelerating rate of advancements in AI and ML, these technologies cannot replace the need for experimental techniques like X-ray crystallography, as they may not always provide interpretable or reliable results, especially when applied to complex biological systems. The most desirable solutions may be those that fuse the two worlds, like CheckMyBlob, which uses machine learning algorithms for automated ligand identification and validation in electron density maps obtained through X-ray crystallography. These tools require minimal human intervention and can improve the overall efficiency and accuracy of the structure determination process; however, their success strongly depends on the quality and quantity of available data. Resources like PDB-REDO, which enhances the quality of existing crystallographic structures and provides consistent data quality, can further improve the performance of AI-based systems by providing better training data.
As mentioned before, AI-generated models, such as those produced by AF2, have limitations. These models may not always provide sufficient accuracy, particularly in cases where protein flexibility or structural heterogeneity is involved. Experimental techniques like X-ray crystallography will continue to be used in such scenarios. Nevertheless, enhancements in AI will create opportunities, not obstacles, for X-ray crystallography developments. AI-driven tools will aid in various stages of drug discovery, including target selection, compound screening, and optimization of lead compounds.
X-ray crystallography will remain important in drug discovery and development. Crystallography has an already established role in SBDD, and as the application of FBDD is increasing, the role of crystallography will grow as well. X-ray crystallography provides high-resolution structural data of fragment-protein complexes and helps effectively guide the rational design of fragments into target-specific compounds. Furthermore, in the era of personalized medicine, the demand for structural biology techniques, including X-ray crystallography, is likely to expand. Future improvements in crystallization techniques could enable the crystallization of larger and more complex proteins, expanding the set of possible drug targets.
3. Expert Opinion
3.1. Data Management and Analysis
Structure determination of SMA in complex with macromolecules is a multi-step process. Sample preparation, i.e., obtaining well-diffracting crystals for diffraction experiments, preparing grids for cryo-EM experiments, or just proteins for biophysical experiments, is the most critical step. However, in many cases, the published (or deposited) experimental details of the sample preparation protocols are insufficient to produce a sample in another lab or even by a different person in the same lab. The wider use of automatic liquid handlers, automatic purification systems, and automatic screen-design tools will enforce the careful use of standard operational protocols. We anticipate that the new data sharing and management policy of the National Institutes of Health (NIH) will improve the quality of published protocols. Still, there is currently no system for reporting protocols in a standardized and uniform manner.
Two resources are very important for drug discovery: DrugBank and the Protein Data Bank. Funding agencies should be concerned about the minimal overlap between these resources (Figure 3). Similarly, these agencies should be apprehensive that many computational resources are either losing their funding or are inadequately funded. This financial shortfall could lead to the closure of certain research avenues critical for drug discovery. Furthermore, the inadequacy of database resources became apparent during the COVID-19 pandemic. The databases aggregated from various health facilities often suffer from design flaws and data entry errors [63]. The condition of these databases has made conducting sophisticated data analysis very challenging, thereby impeding the ability to make informed decisions. As expressed by a frustrated epidemiologist in an interview with The Washington Post, “We are flying blind” [64]. It is tragic that these errors have impacted millions of people and have gone largely unnoticed, especially when compared to software errors that resulted in two plane accidents. It is a calamity that slowly accumulating deaths have less impact than sudden tragedies.
3.2. From the Laboratory Bench to the Patient’s Bed
Over the past four decades, the time and cost involved in bringing new drugs to the market have experienced a significant increase, with big pharma facing costs in the range of billions of dollars [65]. High throughput screening systems have been developed for X-ray crystallography, NMR, and recently, cryo-EM to mitigate the lengthy development timeline and streamline the process. However, it is important to note that the total cost of drug development reflects the complex and highly regulated nature of the process. Notably, the cost of conducting clinical trials tends to be significantly higher than all the expenses associated with obtaining the lead compound.
The subsequent Grand Challenge confronting Biomedical Science is the simulation of therapeutic action at the cellular level. By acquiring a more profound understanding of molecular mechanisms driving cellular processes and evaluating the impact of therapeutic interventions such as drugs, researchers can engineer more targeted and effective treatments for a range of diseases. These models can also spotlight previously overlooked drug targets and treatment strategies that conventional methods might miss. We are on the brink of an era where combining single-cell long-read sequencing and structural biology could bolster personalized medicine. This approach tailors treatment plans to a patient’s unique genetic and molecular profile, potentially leading to superior outcomes and enhanced quality of life [66, 67].
Creating accurate and effective models of therapeutic action in human cells requires a multidisciplinary approach, incorporating computational modeling, experimental validation, and clinical testing. Simultaneously, AI and ML algorithms can analyze vast amounts of biological and clinical data to identify new drug targets and optimize drug development. Ultimately, modeling therapeutic activity in human cells can revolutionize medicine, leading to more efficient treatments, better patient outcomes, and a more comprehensive understanding of underlying diseases’ molecular mechanisms.
3.3. Investing in Science is the Optimal Choice for Securing a Nation’s Future
Investment in science has catalyzed significant breakthroughs in medical research, resulting in the development of new drugs, technologies, and treatment strategies that have profoundly transformed healthcare. By channeling funds into research, both government bodies and private organizations are equipping researchers with resources and support to probe new horizons, identify novel drug development targets, and generate innovative diagnostic tools and medical technologies. The COVID-19 pandemic underscored the importance of scientific research. Within months of the outbreak, researchers pinpointed the virus causing COVID-19 and developed vaccines leveraging well-researched mRNA technology platforms. This concerted effort resulted in unprecedented progress and impact, culminating in the development of highly effective vaccines against COVID-19 in record time. However, the World Health Organization recently released a statement suggesting that now is the time to prepare for the next, potentially more deadly biological threat [68].
Investing in science yields long-term benefits, such as improving our comprehension of the environmental factors contributing to diseases, fostering innovation in disease prevention methods, and reinforcing a more resilient and sustainable healthcare system. By understanding the root causes of diseases, researchers can devise preventative measures that could mitigate the future impact of these diseases. Research brings forth new knowledge, technologies, and products that can be transformed into commercial assets, thus generating revenue. In essence, investment in science and technology signifies investing in the nation’s future [69]. We urge all scientists to make an effort to promote science through public outreach.
Article Highlights box:
X-ray crystallography remains the primary source of experimental structural data for drug discovery.
Validation should be treated as a comprehensive principle incorporated at every stage of the drug discovery process, including target evaluation, sample preparation, diffraction experiment, model building, refinement, and structure contextualization.
The critical role of AI in structural biology and drug design lies in its ability to facilitate sophisticated data analysis, generate predictive models, improve understanding of molecular structures, and optimize the drug development process.
Effective and comprehensive data management is critical for solving the reproducibility crisis in the life sciences.
Future improvements in crystallization techniques could enable the crystallization of larger and more complex proteins, expanding the set of possible drug targets.
Researchers should consider outreach to the public essential for continued public trust and support.
Acknowledgements:
The authors thank A Wlodawer and Z Dauter for valuable discussions. This paper is dedicated to the International Union of Crystallography on the occasion of its 75th anniversary.
Funding:
The authors are supported by a National Institutes of Health grant (GM132595) and by Harrison Family Funds.
W Minor has been involved in the development of software and data management and data-mining tools; some of these have been commercialized by HKL Research. W Minor is a cofounder of HKL Research and a member of the board.
Footnotes
Declaration of Interest:
The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
Reviewer Disclosures:
Peer reviewers on this manuscript have no relevant financial or other relationships to disclose.
References
- 1. Li F-S, Weng J-K (2017) Demystifying traditional herbal medicine with modern approach. Nat Plants 3:17109
- 2. Newman DJ, Cragg GM (2020) Natural Products as Sources of New Drugs over the Nearly Four Decades from 01/1981 to 09/2019. J Nat Prod 83:770–803
- 3. Roell KR, Reif DM, Motsinger-Reif AA (2017) An Introduction to Terminology and Methodology of Chemical Synergy-Perspectives from Across Disciplines. Front Pharmacol 8:158
- 4. Anderson AC (2003) The Process of Structure-Based Drug Design. Chem Biol 10:787–797
- 5. Verlinde CL, Hol WG (1994) Structure-based drug design: progress, results and challenges. Structure 2:577–587 ** A detailed review of the progress of protein structure-based drug design and how it has exceeded expectations over the past decade, with a variety of opportunities, advances, and results that are astounding.
- 6. Schmidt T, Bergner A, Schwede T (2014) Modelling three-dimensional protein structures for applications in drug design. Drug Discov Today 19:890–897 ** This review discusses the importance of structural insights in drug discovery, particularly through protein structure prediction and homology modeling. It highlights the advancements in ligand binding considerations and model quality estimation, with examples of G-protein-coupled receptors (GPCRs) and ADMET-related proteins.
- 7. Fesik SW (1993) NMR structure-based drug design. J Biomol NMR 3:261–269
- 8. Erickson J, Neidhart DJ, VanDrie J, et al. (1990) Design, activity, and 2.8 Å crystal structure of a C2 symmetric inhibitor complexed to HIV-1 protease. Science 249:527–533 * This paper discusses the design of symmetrical inhibitors of HIV-1 protease based on the symmetry of the active site. They demonstrated potent inhibition of protease activity and HIV-1 infection, with exceptional selectivity, stability, and confirmation from the inhibitor-enzyme complex crystal structure.
- 9. Roberts NA, Martin JA, Kinchington D, et al. (1990) Rational design of peptide-based HIV proteinase inhibitors. Science 248:358–361
- 10. Price AJ, Howard S, Cons BD (2017) Fragment-based drug discovery and its application to challenging drug targets. Essays Biochem 61:475–484
- 11. Erlanson DA (2012) Introduction to fragment-based drug discovery. Top Curr Chem 317:1–32
- 12. Erlanson DA, Fesik SW, Hubbard RE, et al. (2016) Twenty years on: the impact of fragments on drug discovery. Nat Rev Drug Discov 15:605–619
- 13. McAulay K, Bilsland A, Bon M (2022) Reactivity of Covalent Fragments and Their Role in Fragment Based Drug Discovery. Pharmaceuticals (Basel) 15:1–22
- 14. Keeley A, Petri L, Ábrányi-Balogh P, Keserű GM (2020) Covalent fragment libraries in drug discovery. Drug Discov Today 25:983–996
- 15. Kirsch P, Hartman AM, Hirsch AKH, Empting M (2019) Concepts and Core Principles of Fragment-Based Drug Design. Molecules 24:4309
- 16. Kim A, Cohen MS (2016) The discovery of vemurafenib for the treatment of BRAF-mutated metastatic melanoma. Expert Opin Drug Discov 11:907–16
- 17. Shaw HM, Nathan PD (2013) Vemurafenib in melanoma. Expert Rev Anticancer Ther 13:513–22
- 18. Halaban R, Zhang W, Bacchiocchi A, et al. (2010) PLX4032, a selective BRAF(V600E) kinase inhibitor, activates the ERK pathway and enhances cell migration and proliferation of BRAF melanoma cells. Pigment Cell Melanoma Res 23:190–200
- 19. Vénien-Bryan C, Li Z, Vuillard L, Boutin JA (2017) Cryo-electron microscopy and X-ray crystallography: complementary approaches to structural biology and drug discovery. Acta Crystallogr F Struct Biol Commun 73:174–183
- 20. Zheng H, Handing KB, Zimmerman MD, et al. (2015) X-ray crystallography over the past decade for novel drug discovery - where are we heading next? Expert Opin Drug Discov 10:975–989 ** The nature and limitations of X-ray crystallography are explored in the context of drug design, and future challenges are discussed.
- 21. Wishart DS, Feunang YD, Guo AC, et al. (2018) DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res 46:D1074–D1082
- 22. Merk A, Bartesaghi A, Banerjee S, et al. (2016) Breaking Cryo-EM Resolution Barriers to Facilitate Drug Discovery. Cell 165:1698–1707
- 23. Wlodawer A, Li M, Dauter Z (2017) High-Resolution Cryo-EM Maps and Models: A Crystallographer’s Perspective. Structure 25:1589–1597.e1 ** A comparison between cryo-EM and crystallographic maps revealed some discrepancies and highlighted the need for stricter standards in cryo-EM structure determination.
- 24. Majorek KA, Kuhn ML, Chruszcz M, et al. (2014) Double trouble-Buffer selection and His-tag presence may be responsible for nonreproducibility of biomedical experiments. Protein Sci 23:1359–68
- 25. O’Connell N (2021) Protein Ligand Interactions Using Surface Plasmon Resonance. Methods Mol Biol 2365:3–20
- 26. Wu D, Gucwa M, Czub MP, et al. (2023) Structural and biochemical characterisation of Co2+-binding sites on serum albumins and their interplay with fatty acids. Chem Sci 14:6244–6258
- 27. Schrödinger LLC. The PyMOL Molecular Graphics System, Version 2.0
- 28. Senior AW, Evans R, Jumper J, et al. (2020) Improved protein structure prediction using potentials from deep learning. Nature 577:706–710
- 29. Tunyasuvunakool K, Adler J, Wu Z, et al. (2021) Highly accurate protein structure prediction for the human proteome. Nature 596:590–596 ** AlphaFold was used to predict the structures of all proteins from humans and 19 other organisms. AlphaFold Protein Structure Database, or AlphaFold DB, is available at https://alphafold.ebi.ac.uk/ and is described in reference 33.
- 30. Baek M, DiMaio F, Anishchenko I, et al. (2021) Accurate prediction of protein structures and interactions using a three-track neural network. Science 373:871–876
- 31. Chowdhury R, Bouatta N, Biswas S, et al. (2022) Single-sequence protein structure prediction using a language model and deep learning. Nat Biotechnol 40:1617–1623
- 32. Jumper J, Evans R, Pritzel A, et al. (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596:583–589 ** This article discusses the network and methodology underlying AlphaFold, the breakthrough machine learning structure prediction algorithm that was the first to consistently predict protein structures with an accuracy comparable to experimental methods. It vastly outperformed other methods in terms of accuracy and performance, and ushered in a new era of structural prediction.
- 33. Varadi M, Anyango S, Deshpande M, et al. (2022) AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res 50:D439–D444
- 34. Lin Z, Akin H, Rao R, et al. (2023) Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379:1123–1130
- 35. Richardson L, Allen B, Baldi G, et al. (2023) MGnify: the microbiome sequence data analysis resource in 2023. Nucleic Acids Res 51:D753–D759
- 36. Barbarin-Bocahu I, Graille M (2022) The X-ray crystallography phase problem solved thanks to AlphaFold and RoseTTAFold models: a case-study report. Acta Crystallogr D Struct Biol 78:517–531
- 37. Borkakoti N, Thornton JM (2023) AlphaFold2 protein structure prediction: Implications for drug discovery. Curr Opin Struct Biol 78:102526. ** This article discusses the importance of accurate protein structure prediction, facilitated by artificial intelligence methods, in the process of small molecule drug discovery, highlighting current capabilities and assessing the potential impact of further advances in predictive methods on the discovery of new drugs.
- 38. Akdel M, Pires DEV, Pardo EP, et al. (2022) A structural biology community assessment of AlphaFold2 applications. Nat Struct Mol Biol 29:1056–1067 ** This article examines the accuracy of AlphaFold on a multi-proteome level, confirming its overall accuracy and revealing its ability to predict disordered regions or protein complexes better than programs designed for the task. Conversely, they also reveal cases where high-confidence predicted structures incorrectly model some crucial structural aspects, and urge continued verification through experimental techniques.
- 39. Terwilliger TC, Afonine PV, Liebschner D, et al. (2023) Accelerating crystal structure determination with iterative AlphaFold prediction. Acta Crystallogr D Struct Biol 79:234–244 ** The paper discusses an automated procedure using AlphaFold predictions in iterative cycles that significantly accelerates experimental structure determination through molecular replacement, and suggests that AI-based predictions are a valuable tool in macromolecular structure determination.
- 40. Frankish A, Carbonell-Sala S, Diekhans M, et al. (2023) GENCODE: reference annotation for the human and mouse genomes in 2023. Nucleic Acids Res 51:D942–D949
- 41. Sommer MJ, Cha S, Varabyou A, et al. (2022) Structure-guided isoform identification for the human transcriptome. Elife 11:e82556
- 42. Mirdita M, Schütze K, Moriwaki Y, et al. (2022) ColabFold: making protein folding accessible to all. Nat Methods 19:679–682
- 43. He S, Dong G, Cheng J, et al. (2022) Strategies for designing proteolysis targeting chimaeras (PROTACs). Med Res Rev 42:1280–1342
- 44. Morelli KH, Wu Q, Gosztyla ML, et al. (2023) An RNA-targeting CRISPR-Cas13d system alleviates disease-related phenotypes in Huntington’s disease models. Nat Neurosci 26:27–38
- 45. Zhu H, Wang J, Zhang Q, et al. (2023) Novel strategies and promising opportunities for targeted protein degradation: An innovative therapeutic approach to overcome cancer resistance. Pharmacol Ther 244:108371
- 46. Gao M, Nakajima An D, Parks JM, Skolnick J (2022) AF2Complex predicts direct physical interactions in multimeric proteins with deep learning. Nat Commun 13:1744
- 47. Bryant P, Pozzati G, Elofsson A (2022) Improved prediction of protein-protein interactions using AlphaFold2. Nat Commun 13:1265
- 48. Evans R, O’Neill M, Pritzel A, et al. (2022) Protein complex prediction with AlphaFold-Multimer. bioRxiv 2021.10.04.463034
- 49. Zhou X, Zheng W, Li Y, et al. (2022) I-TASSER-MTD: a deep-learning-based platform for multi-domain protein structure and function prediction. Nat Protoc 17:2326–2353
- 50. Czyzewski A, Krawiec F, Brzezinski D, et al. (2021) Detecting anomalies in X-ray diffraction images using convolutional neural networks. Expert Syst Appl 174:114740
- 51. Martiny H-M, Armenteros JJA, Johansen AR, et al. (2021) Deep protein representations enable recombinant protein expression prediction. Comput Biol Chem 95:107596
- 52. Reis AC, Salis HM (2020) An Automated Model Test System for Systematic Development and Improvement of Gene Expression Models. ACS Synth Biol 9:3145–3156
- 53. Bischoff D, Walla B, Weuster-Botz D (2022) Machine learning-based protein crystal detection for monitoring of crystallization processes enabled with large-scale synthetic data sets of photorealistic images. Anal Bioanal Chem 414:6379–6391
- 54. Kowiel M, Brzezinski D, Porebski PJ, et al. (2019) Automatic recognition of ligands in electron density by machine learning. Bioinformatics 35:452–461 * CheckMyBlob, a machine learning algorithm, improves ligand identification from electron density maps, accelerating macromolecule-ligand complex modeling, and drug screening.
- 55. Brzezinski D, Porebski PJ, Kowiel M, et al. (2021) Recognizing and validating ligands with CheckMyBlob. Nucleic Acids Res 49:W86–W92 ** This article discusses the CheckMyBlob web server, which uses a machine learning algorithm to assist in ligand identification and validation from electron density maps. It provides users with ranked predictions and interactive 3D visualizations to facilitate structure-guided drug design.
- 56. Joosten RP, Long F, Murshudov GN, Perrakis A (2014) The PDB_REDO server for macromolecular structure model optimization. IUCrJ 1:213–20 ** This article introduces PDB_REDO, a web server that automates the refinement, model reconstruction, and validation processes for improving crystallographic structure models prior to PDB submission.
- 57. Shabalin IG, Porebski PJ, Minor W (2018) Refining the macromolecular model - achieving the best agreement with the data from X-ray diffraction experiment. Crystallogr Rev 24:236–262 ** A guide to the state-of-the-art refinement of macromolecular structures.
- 58. Bijak V, Gucwa M, Lenkiewicz J, et al. (2023) Continuous Validation Across Macromolecular Structure Determination Process. J Cryst Soc Japan 65:10–16
- 59. Gore S, Sanz García E, Hendrickx PMS, et al. (2017) Validation of Structures in the Protein Data Bank. Structure 25:1916–1927
- 60. Czub MP, Stewart AJ, Shabalin IG, Minor W (2022) Organism-specific differences in the binding of ketoprofen to serum albumin. IUCrJ 9:551–561 ** This article shows differences in the binding mode of ketoprofen in highly homologous albumins from different species and urges caution when using animal models in place of human structures.
- 61. Gavira JA (2016) Current trends in protein crystallization. Arch Biochem Biophys 602:3–11
- 62. Bowler JT, Sawaya MR, Boyer DR, et al. (2022) Micro-electron diffraction structure of the aggregation-driving N terminus of Drosophila neuronal protein Orb2A reveals amyloid-like β-sheets. J Biol Chem 298:102396
- 63. Lenkiewicz J, Bijak V, Poonuganti S, et al. (2023) Structural biology and public health response to biomedical threats. Struct Dyn 10:034701 * The article highlights the collaboration between structural biologists and vaccine/treatment developers during the SARS-CoV-2 pandemic and emphasizes the need for comprehensive statistical data on infections, deaths, and vaccinations to effectively guide public health policy.
- 64.Achenbach J, Abutaleb Y (2021) Messy, incomplete U.S. data hobbles pandemic response - The Washington Post. https://www.washingtonpost.com/health/2021/09/30/inadequate-us-data-pandemic-response/. Accessed 23 May 2023
- 65. Wouters OJ, McKee M, Luyten J (2020) Estimated Research and Development Investment Needed to Bring a New Medicine to Market, 2009–2018. JAMA 323:844–853
- 66. Castaldi PJ, Abood A, Farber CR, Sheynkman GM (2022) Bridging the splicing gap in human genetics with long-read RNA sequencing: finding the protein isoform drivers of disease. Hum Mol Genet 31:R123–R136 ** The authors show how long-read sequencing can be used to link splicing quantitative trait loci (sQTLs) to transcript isoforms containing disease-relevant protein alterations. They also present an overview of sQTL studies that examine genome-wide association studies (GWAS) loci in the context of complex diseases and traits.
- 67. Sinitcyn P, Richards AL, Weatheritt RJ, et al. (2023) Global detection of human variants and isoforms by deep proteome sequencing. Nat Biotechnol 1–11
- 68. Ramsey LL. WHO chief: World must prepare for deadlier outbreak than COVID-19. https://www.news-medical.net/news/20230523/WHO-chief-World-must-prepare-for-deadlier-outbreak-than-COVID-19.aspx? Accessed 24 May 2023
- 69. Grabowski M, Macnar JM, Cymborowski M, et al. (2021) Rapid response to emerging biomedical challenges and threats. IUCrJ 8:395–407 ** Here, the authors discuss the challenge of evaluating the large volume of COVID-19-related papers and deposited macromolecular models, and propose a resource that can be used to assess and improve the reproducibility of biomedical research during future crises.




