Abstract
Proteins carry out essential cellular functions – signaling, metabolism, transport – through the specific interaction of small molecules and drugs within their three-dimensional structural domains. Protein domains are conserved folding units that, when combined, drive evolutionary progress. The Evolutionary Classification Of protein Domains (ECOD) places domains into a hierarchy explicitly built around distant evolutionary relationships, enabling the detection of remote homologs across proteomes. Yet no single resource has systematically mapped domain-ligand interactions at the structural level. To fill this gap, we introduce DrugDomain v2.0, an updated comprehensive resource that extends earlier releases by linking evolutionary domain classifications (ECOD) to ligand binding events across the entire Protein Data Bank. We also leverage AI-driven predictions from AlphaFold to extend domain-ligand annotations to human drug targets lacking experimental structures. DrugDomain v2.0 catalogs interactions with over 37,000 PDB ligands and 7561 DrugBank molecules, integrates more than 6000 small-molecule binding-associated post-translational modifications, and provides context for more than 14,000 PTM-modified human protein models featuring docked ligands. The database encompasses 43,023 unique UniProt accessions and 174,545 PDB structures. The DrugDomain data is available online: https://drugdomain.cs.ucf.edu/ and https://github.com/kirmedvedev/DrugDomain.
Keywords: Small molecules, Drug discovery, Protein-drug interaction, Protein domains, Drugs, Database
1. Introduction
Studying how small molecules and drugs interact with protein structural domains lies at the heart of understanding both molecular function and guiding drug discovery. Through the binding of endogenous cofactors, metabolites, or exogenous drugs within their structural three-dimensional domains, proteins participate in a variety of vital cellular processes, including signaling, metabolism, and transport. Protein domains are conserved structural, functional, and evolutionary units that serve as the essential building blocks for protein diversity and adaptation [1]. The different ways in which protein domains can be combined provide a powerful mechanism for evolving new protein functions and shaping cellular processes [2]. Identifying and categorizing protein domains based on their evolutionary relationships can enhance our understanding of protein function. This is achieved by examining the established functions of their homologs. Until recently, major structure-based classifications of protein domains were primarily centered on categorizing experimentally determined protein structures, e.g., SCOP [3] and CATH [4]. Our team has developed and maintains the Evolutionary Classification of Protein Domains database (ECOD), whose key feature is its emphasis on distant homology, which culminates in a comprehensive database of evolutionary relationships among categorized domains' topologies [5], [6]. Mapping the protein-ligand interactions at the domain level can reveal the mechanistic basis of protein function and inform structure-based drug discovery.
Artificial intelligence provides powerful tools for scientific research across diverse fields, and structural computational biology is no exception. AlphaFold (AF) has revolutionized structural biology by demonstrating atomic-level precision in protein structure prediction and becoming an indispensable tool in the field [7]. Leveraging AF models, ECOD stands out as one of the first databases to provide comprehensive domain classifications for both the entire human proteome [8] and the complete proteomes of 48 additional model organisms [6]. Recently, The Encyclopedia of Domains (TED) [9], a comprehensive resource for the identification and classification of protein domains within the AlphaFold Database [10], was released. This advancement by AlphaFold has significantly broadened the scope of computational structural biology, enabling diverse applications such as drug discovery, drug target prediction, and the analysis of protein-protein and protein-ligand interactions [11], [12]. The new release of AlphaFold3 has further improved the accuracy of protein structure and protein-ligand interaction predictions [7].
As of today, no available resource reports interactions between protein structural domains (based on evolutionary classification) and ligands. With the latest advances in AI-based methods for predicting protein structure and protein-ligand interactions, we are witnessing a paradigm shift where computational approaches achieve performance levels nearly comparable to those of experimental methods. Here we present DrugDomain v2.0 (https://drugdomain.cs.ucf.edu/), a comprehensive database detailing the interactions of structural protein domains with a wide array of small organic (including drugs) and inorganic compounds, and – unlike previous versions – covering the full breadth of the Protein Data Bank. Our dataset encompasses all ligands in the Protein Data Bank that interact with protein structures. The database also provides domain-drug interactions for AlphaFold models of human drug targets without solved experimental structures [13]. It also features over 6000 small-molecule binding-associated PTMs and more than 14,000 PTM-modified human protein models with docked ligands [14]. In total, the database now encompasses 43,023 unique UniProt accessions, 174,545 PDB structures, 37,367 PDB ligands, and 7561 DrugBank molecules. We believe this resource can serve as a foundation for a range of forward‑looking studies – including drug repurposing, the development of improved docking protocols, and the analysis of post‑translational modifications in protein-ligand interactions.
2. Materials and methods
2.1. Data collection and analysis
The comprehensive list of ligands and small-molecule components found in the Protein Data Bank [15] was retrieved from the Chemical Component Dictionary [16]. All PDB entries containing these ligands, together with the InChI Keys and SMILES strings of the small molecules, were obtained using rcsb-api [17]. Using InChI Keys and SMILES, we retrieved accession numbers for each small molecule from the following databases, where available: DrugBank [18], PubChem [19], and ChEMBL [20]. In the DrugDomain database, we use the PDB ligand ID as the primary identifier for a small molecule (for example, NAD, 2I4, etc.). When the PDB ligand ID is unknown, we use the DrugBank accession instead. Additionally, drug action data were retrieved from DrugBank and affinity data from BindingDB [21]. Chemical classification of small-molecule components was obtained from the ClassyFire database [22] and includes the four top levels of the classification: kingdom, superclass, class, and subclass. 2D diagrams of ligand-protein interactions (LigPlots) were generated using LigPlot+ as in v1.0 and v1.1 [23].
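The identifier rule described above (PDB ligand ID first, DrugBank accession as a fallback) can be illustrated with a small record type. This is only a sketch: the class name, field names, and the error raised for a record with neither identifier are assumptions made for illustration and are not taken from the DrugDomain code base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LigandRecord:
    """Cross-references for one small molecule (field names are illustrative)."""
    pdb_ligand_id: Optional[str] = None   # e.g. "NAD" or "2I4"
    drugbank_id: Optional[str] = None     # e.g. "DB00171"
    pubchem_cid: Optional[str] = None
    chembl_id: Optional[str] = None

    @property
    def primary_id(self) -> str:
        """PDB ligand ID when known, otherwise the DrugBank accession."""
        if self.pdb_ligand_id:
            return self.pdb_ligand_id
        if self.drugbank_id:
            return self.drugbank_id
        # Assumption: a record with neither identifier is treated as an error.
        raise ValueError("record has neither a PDB ligand ID nor a DrugBank accession")


with_pdb_id = LigandRecord(pdb_ligand_id="NAD")
drugbank_only = LigandRecord(drugbank_id="DB00171")
print(with_pdb_id.primary_id, drugbank_only.primary_id)
```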
For each ligand-protein (PDB structure) pair, residues located within 5 Å of the atoms of the small molecule were identified using BioPython [24]. Interacting residues were mapped to structural domains from ECOD database v292 (08302024) [5] and reported in DrugDomain. For ligand–protein pairs lacking experimentally determined structures, we used AlphaFold models and the AlphaFill algorithm [25] to transplant missing ligands from PDB structures into these models based on sequence and structural similarity. This process was performed in DrugDomain v1.0 for the subset of human proteins known to interact with small molecules and drugs from DrugBank. The methodology and implementation of this approach into the DrugDomain database was described previously [13]. To calculate ligand-interacting statistics based on the number of domains, we counted the UniProt-accessioned proteins that included a specific number of ECOD domains interacting with the ligand.
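The residue selection and domain mapping described above can be sketched as follows. The study used BioPython for the 5 Å search; this sketch uses only the Python standard library so that it stays self-contained. The coordinates, residue numbers, domain names, and domain boundaries are invented toy values, and representing a domain as a list of residue ranges is an assumption chosen to allow discontinuous domains.

```python
import math
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float, float]
Ranges = List[Tuple[int, int]]
CUTOFF = 5.0  # distance cutoff in angstroms, as stated in the text


def contact_residues(residue_atoms: Dict[int, List[Coord]],
                     ligand_atoms: List[Coord],
                     cutoff: float = CUTOFF) -> List[int]:
    """Residue numbers with at least one atom within `cutoff` of any ligand atom."""
    hits = []
    for resnum, atoms in residue_atoms.items():
        if any(math.dist(a, b) <= cutoff for a in atoms for b in ligand_atoms):
            hits.append(resnum)
    return sorted(hits)


def domain_of(resnum: int, domains: Dict[str, Ranges]) -> Optional[str]:
    """Name of the domain whose residue ranges contain `resnum`, or None."""
    for name, ranges in domains.items():
        if any(start <= resnum <= end for start, end in ranges):
            return name
    return None


# Toy data (hypothetical): one atom per residue, one ligand atom, two domains.
residues = {
    10: [(0.0, 0.0, 0.0)],
    55: [(3.0, 0.0, 0.0)],
    120: [(30.0, 0.0, 0.0)],
    200: [(5.5, 0.0, 0.0)],
}
ligand = [(1.0, 0.0, 0.0)]
domains = {"domain_1": [(1, 100)], "domain_2": [(101, 250)]}

contacts = contact_residues(residues, ligand)
interacting_domains = sorted({domain_of(r, domains) for r in contacts} - {None})
print(contacts, interacting_domains)
```

A brute-force double loop is adequate for a toy case; for whole structures a spatial index, such as the neighbor search BioPython provides, would be the more practical choice.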
In DrugDomain v1.1 we explored the effect of post-translational modifications (PTMs) on small molecule binding for the subset of human proteins from v1.0. We used recent AI-based approaches for protein structure prediction (AlphaFold3 [7], RoseTTAFold All-Atom [26], Chai-1 [27]) and generated 14,178 models of PTM-modified human proteins with docked ligands [14]. To do that, we identified PTMs within 10 Å of all atoms of each small molecule bound to human proteins in the subset of human proteins from v1.0. The overall number of identified small molecule binding-associated PTMs was 6131. Overall, we generated 1041 AlphaFold3, 9169 RoseTTAFold All-Atom and 3968 Chai-1 PTM-modified models. Each DrugDomain webpage includes a placeholder indicating the availability of PTM data for each protein–small molecule combination presented in the DrugDomain database. If PTM data is available, there is a link “List of drug binding-associated PTMs”; otherwise, it states “No PTM data available”. The major novelty of DrugDomain v2.0, compared to previous versions (v1.0 and v1.1), is the inclusion of domain–ligand interaction data across the entire Protein Data Bank. In addition to the human protein subset and small molecules from DrugBank (v1.0 and v1.1), we incorporated all ligands from the PDB and all experimental protein structures that interact with these ligands.
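As a quick consistency check on the figures quoted above, the per-method model counts can be tallied and compared against the reported total of 14,178. The counts come from the text; the variable names are arbitrary, and the percentage shares are derived here for illustration rather than reported by the authors.

```python
# PTM-modified models per prediction method, as reported in the text.
models_per_method = {
    "AlphaFold3": 1041,
    "RoseTTAFold All-Atom": 9169,
    "Chai-1": 3968,
}

total_models = sum(models_per_method.values())

# Share of each method in the total, rounded to one decimal place.
shares = {
    method: round(100 * count / total_models, 1)
    for method, count in models_per_method.items()
}

print(total_models)
for method, share in shares.items():
    print(f"{method}: {share}%")
```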
3. Results and discussion
3.1. DrugDomain v2.0 statistics and features
DrugDomain v2.0 includes three major types of data related to interactions between protein domains and small-molecule components. First, the new version of DrugDomain reports domain-ligand interactions for all PDB entries containing ligand entities, including both organic small molecules and inorganic components. Thus, we expanded the scope of the database to encompass not only protein-drug interactions but also interactions between protein domains and all ligand entities that are present in the PDB. Second, v2.0 reports domain-drug interactions for AlphaFold models of human drug target proteins lacking experimentally determined structures [13]. Third, it includes over 6000 small-molecule binding-associated PTMs identified in the human proteome and over 14,000 PTM-modified human protein models with docked ligands generated using recent AI-based approaches (AlphaFold3 [7], RoseTTAFold All-Atom [26], Chai-1 [27]) [14]. To help users navigate between different types of data, we created a detailed tutorial (https://github.com/kirmedvedev/DrugDomain/wiki/DrugDomain-database-Tutorial). In total, DrugDomain v2.0 includes 43,023 unique UniProt accessions [28], 174,545 PDB structures (over 70 % of all experimental protein structures), 37,367 ligands from the PDB, and 7561 DrugBank molecules (over 50 % of all small-molecule drugs in DrugBank) (Fig. 1).
Fig. 1.
DrugDomain database v2.0 data types and statistics.
DrugDomain includes two types of hierarchy: protein and molecule-centric. The complete lists of proteins and small molecules can be accessed through the top menu. There are two types of molecule lists – by DrugBank accession and by PDB ligand ID. The protein or molecule can be searched using the search field on the main page or the quick search option at the navigation bar. The search can be conducted using UniProt (e.g. Q03181), PDB ligand (e.g. ATP), DrugBank accessions (e.g. DB00171), or SMILES formula. The search by UniProt accession returns a list of ligands known or predicted to interact with the query protein, along with key data for each ligand: PDB ID; DrugBank, PubChem, and ChEMBL accessions; molecule name; drug action; and affinity. The molecule search (by PDB ligand ID, DrugBank accession, or SMILES formula) returns a list of proteins known or predicted to bind the query molecule, along with key data for ligand and protein. Both search types return links to DrugDomain data pages, which provide key ligand information, including its chemical classification, and list PDB structures and/or AlphaFold models known or predicted to bind the ligand. The list of the structures includes PDB/AF accession, downloadable PyMOL [29] script, which shows ECOD domains and residues interacting with the ligands; a list of ECOD domains interacting with the molecule with links to the ECOD database, names of corresponding ECOD X-groups (possible homology level) and 2D diagrams of ligand–protein interactions (LigPlots). DrugDomain data webpage also includes a link to a list of drug-binding-associated post-translational modifications (PTMs) where available [14]. This list contains information about each PTM and links to PyMOL sessions with models of modified proteins generated by AlphaFold3, RoseTTAFold All-Atom or Chai-1. PyMOL sessions include PTM-modified residues, the ligand, and mapped ECOD domains, each shown in different colors.
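The four query types accepted by the search can be told apart with a simple heuristic, sketched below. This is not the DrugDomain implementation. The UniProt pattern is the accession format published by UniProt; the DrugBank pattern (`DB` followed by five digits) and the PDB ligand pattern (one to five uppercase alphanumeric characters) are assumptions about those identifier formats, and anything that matches none of them is treated as a SMILES string.

```python
import re

# Accession format published by UniProt.
UNIPROT_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Assumption: DrugBank drug accessions are "DB" plus five digits.
DRUGBANK_RE = re.compile(r"DB[0-9]{5}")
# Assumption: PDB ligand IDs are short uppercase alphanumeric codes.
PDB_LIGAND_RE = re.compile(r"[A-Z0-9]{1,5}")


def classify_query(query: str) -> str:
    """Guess the query type; anything unrecognized is treated as SMILES."""
    q = query.strip()
    if UNIPROT_RE.fullmatch(q):
        return "uniprot"
    if DRUGBANK_RE.fullmatch(q):
        return "drugbank"
    if PDB_LIGAND_RE.fullmatch(q):
        # Caveat: a short all-uppercase SMILES such as "CCO" also lands here.
        return "pdb_ligand"
    return "smiles"


for example in ["Q03181", "ATP", "DB00171", "CC(=O)O"]:
    print(example, "->", classify_query(example))
```

The order of the checks matters: very short SMILES strings are indistinguishable from ligand IDs by pattern alone, so a real search interface would need either an explicit type selector or a lookup against the known ligand list.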
The taxonomic distribution of proteins reported in the DrugDomain database v2.0 revealed the prevalence of eukaryotic and bacterial proteins (Fig. 2A). Pseudomonadota or proteobacteria are one of the most abundant phyla of Gram-negative bacteria, which are naturally found as pathogenic and free-living genera [30]. Thus, proteins from these bacteria are important targets for antibacterial therapy against human pathogens, and PDB entries of these proteins bound to various antibiotics comprise a significant fraction of the Protein Data Bank. Bacteria belonging to the phylum Bacillota can make up 11–95 % of the human gut microbiome [31] and play key roles in energy extraction. They have also been associated with the development of diabetes and obesity [32], making them potential therapeutic targets. Finally, the third-largest phylum in terms of the number of PDB structures with ligands is Actinomycetota (or Actinobacteria). These bacteria are major contributors to the biological buffering of soils and the source of many antibiotics [33]. Similarly, there are three largest eukaryotic phyla: Chordata includes humans and various model organisms such as mice and rats; Ascomycota is the largest phylum of fungi, which are the source of antibiotics like penicillin, and particular species are used to produce immunosuppressants and other medicinal compounds [34]; Streptophyta phylum includes green algae and the land plants.
Fig. 2.
DrugDomain v2.0 statistics. (A) Taxonomic distribution of proteins reported in the DrugDomain database, by UniProt population. The inside pie shows the distribution of super kingdoms, and the outside donut shows the distribution of phyla. (B) Distribution of ECOD domains from experimentally determined PDB structures, interacting with ligand, stratified by architecture (inside pie) and homologous group (outside donut).
The distribution of ECOD domains from experimental structures interacting with ligands is shown in Fig. 2B. The three largest ECOD A-groups are α/β three-layered sandwiches, α+β two layers, and α+β complex topology. The α/β three-layered sandwich architecture is represented mainly by Rossmann-like proteins. In our earlier work, we showed that these proteins perform diverse functions and interact with most superclasses of organic molecules [35], [36]. Most small molecules that interact with domains of the α+β complex topology target protein kinases, which are among the most druggable proteins in the human proteome; therefore, their structures are abundant in the Protein Data Bank [37], [38]. The α+β two-layer architecture includes heat shock proteins (HSP), which play a critical role as molecular chaperones and are important targets for anticancer chemotherapy [39].
Analysis of domains from experimentally determined PDB structures and the ClassyFire superclasses of the organic compounds they interact with revealed the three most common superclasses [22] in the Protein Data Bank: Organoheterocyclic compounds, Organic oxygen compounds, and Organic acids and derivatives (Fig. 3). For the majority of superclasses, the largest fraction of interacting domains belongs to the α/β three-layered sandwich, α+β two-layer, and α+β complex topology ECOD architecture types discussed above. The superclass Organoheterocyclic compounds includes atorvastatin, a lipid-lowering drug that reduces the risk of myocardial infarction, stroke, and other cardiovascular diseases [40]. Erythromycin is a broad-spectrum antibiotic in the Organic oxygen compounds superclass and is widely used to treat infections caused by both Gram-positive and Gram-negative bacteria [41]. Finally, arbaclofen, a member of the Organic acids and derivatives superclass, is a drug used in the treatment of autism [42].
Fig. 3.
ECOD A-groups (left column) of experimental PDB structures and superclasses of organic molecules according to ClassyFire classification (right column). Each superclass and the lines pointed toward it are denoted by separate color. The thickness of the lines shows the number of PDB ligands interacting with domains from ECOD A-groups.
3.2. Number of domains mediating ligand interactions in Protein Data Bank
Protein domains are conserved structural units that serve as the fundamental evolutionary and architectural building blocks of proteins. Understanding how ligands bind – specifically, which domains are involved and how many mediate the interaction – is crucial for uncovering protein function and guiding drug discovery. Overall ligand-interacting statistics were calculated for each protein, based on the number of interacting ECOD domains associated with its UniProt accession (Fig. 4). Our results revealed that the majority of proteins with assigned ECOD domains bound ligands using one or two domains. Our observation is consistent with previous research, which indicates that most drug targets bind via a limited set of prevalent domains [43]. Moreover, it is noteworthy that, under the ECOD classification, protein kinases – the most druggable targets in the human proteome – are characterized by a single structural domain [38]. This contributes to their significant representation among proteins with one ligand-interacting domain. In contrast, other structural classifications divide these proteins into two domains [4]. It is important to note, however, that experimentally determined PDB structures may not always accurately reflect ligand coordination, as only a part of the protein is often included in the experimental structure.
Fig. 4.
Ligand-interacting statistics by number of domains per UniProt accession in Protein Data Bank. The left column shows the number of ligand-interacting domains, the right column shows the superclasses of organic molecules according to ClassyFire classification. The thickness of the lines indicates the number of UniProt accessions.
Our analysis of ligand-interacting statistics indicated that proteins deposited in the Protein Data Bank contain a range of one to ten ECOD domains involved in ligand interaction (Fig. 4). Such a large number of interacting domains (ten) can bind a single ligand when the protein forms a channel or pore structure. For example, human mitochondrial RNA splicing 2 (Mrs2) channel (Fig. 5A-C) enables Mg²⁺ permeation across the inner mitochondrial membrane and is crucial for mitochondrial metabolic function [44], illustrating how a channel structure can accommodate interactions with multiple domains. Dysregulated Mg²⁺ levels in humans are implicated in various diseases [45], as mitochondria are the primary site of ATP production in eukaryotic cells – a process critically dependent on Mg²⁺ as a cofactor. The cation also commonly forms complexes with cellular nucleotides [46]. Mrs2 exists as homopentamers, with each monomer featuring two C-terminal transmembrane helices [46]. Structurally, each monomer contains two ECOD domains: an N-terminal “CorA soluble domain-like” domain and a C-terminal transmembrane domain (Fig. 5C). Mg²⁺ is coordinated near the borders of two domains of each monomer and interacts with each domain of homopentamer (Fig. 5B).
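The counting behind these statistics can be sketched as follows: group the interacting domains by protein and ligand, then count how many UniProt accessions fall into each domain-count bin. The record layout, accession labels, ligand labels, and domain identifiers below are hypothetical placeholders. The channel entry mimics the Mrs2 arrangement described above (five chains with two domains each, all contacting one ion) but is not real DrugDomain data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One record per observed contact: (UniProt accession, ligand ID, domain ID).
Record = Tuple[str, str, str]


def accessions_by_domain_count(records: Iterable[Record]) -> Dict[int, int]:
    """Map 'number of ligand-interacting domains' to number of accessions."""
    domains_per_pair = defaultdict(set)
    for accession, ligand, domain in records:
        domains_per_pair[(accession, ligand)].add(domain)

    accessions_per_count = defaultdict(set)
    for (accession, _ligand), domains in domains_per_pair.items():
        accessions_per_count[len(domains)].add(accession)

    return {n: len(accs) for n, accs in sorted(accessions_per_count.items())}


# Hypothetical toy records.
records = [("P_KINASE", "LIG1", "A1")]
records += [("P_TWO", "LIG2", "A1"), ("P_TWO", "LIG2", "A2")]
# Homopentameric channel: chains A-E, two domains per chain, one bound ion.
records += [("P_CHANNEL", "MG", f"{chain}{part}")
            for chain in "ABCDE" for part in (1, 2)]

histogram = accessions_by_domain_count(records)
print(histogram)
```

Counting per protein-ligand pair is one possible reading of the procedure; if a protein binds several ligands through different numbers of domains, its accession appears in more than one bin under this scheme.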
Fig. 5.
Structure of the human mitochondrial Mrs2 channel (PDB: 8IP5). (A) Channel view of Mrs2 with the protein colored by ECOD domains; the Mg²⁺ ion is shown in green, and interacting residues are shown as sticks. (B) Close-up channel view of Mrs2. (C) Side view of Mrs2 showing three out of five monomers. Chains C, D, and E are colored by ECOD domains.
The DrugDomain database allows users to explore all known interactions of a given ligand with all known targets. For example, ATP – one of the most prevalent biological ligands – interacts with 1035 proteins (https://drugdomain.cs.ucf.edu/molecules/pdb/ATP.html; counted by UniProt accession) and may be coordinated by structurally unrelated (non-homologous) domains (Fig. 6A-D). One such case is the ubiquitin-like modifier-activating enzyme Atg7, which activates two ubiquitin-like proteins, Atg8 and Atg12, and plays a crucial role in autophagy [47]. Fig. 6A shows Atg7 (orange), represented by a domain from the Rossmann-related ECOD H-group, bound to Atg8 (blue), represented by the Ubiquitin-Related H-group (beta-Grasp X-group). Atg7 takes part in adenylation of the C-terminal Gly residue of ubiquitin-like proteins, and this step consumes ATP [47]. In Fig. 6B, cobalamin adenosyltransferase (ATR) is shown as two chains (orange and brown). Each chain comprises a single, almost exclusively α-helical domain that belongs to the Cobalamin adenosyltransferase H-group. ATR catalyzes the adenosylation of cob(I)alamin by ATP, which leads to cobalt−carbon bond formation and the synthesis of coenzyme B12 [48]. Fig. 6C shows the cytoplasmic part of the ATP-binding cassette transporter ABCG2. ABCG2 is a transporter localized to the plasma membrane of cells across multiple tissues and physiological barriers. It mediates translocation of endogenous substrates, modulates the pharmacokinetics of numerous therapeutics, and confers protection against a wide spectrum of xenobiotics, including anticancer drugs [49]. This process is powered by ATP. The ATP-binding domains of this protein (grey and cyan) belong to the P-loop domains-related H-group and contain the canonical P-loop sequence motif that coordinates the ATP molecule. Finally, Fig. 6D shows the ATP phosphoribosyltransferase, which forms a homodimer of domains belonging to the Periplasmic binding protein-like II H-group. This protein catalyzes the first step of histidine biosynthesis in plants and microorganisms, an energetically expensive process requiring 41 ATP equivalents for the synthesis of one histidine molecule [50].
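The four ATP examples above can be tabulated to make the point about non-homologous binders explicit: each ATP-binding domain belongs to a different ECOD H-group. The protein names, PDB IDs, and H-group names below are taken from the text. The helper `ligand_page` is hypothetical; it reproduces the ATP page URL quoted above, and the idea that other ligand pages follow the same pattern is an assumption.

```python
from collections import defaultdict

BASE_URL = "https://drugdomain.cs.ucf.edu/molecules/pdb/"


def ligand_page(pdb_ligand_id: str) -> str:
    """Build a ligand page URL (pattern inferred from the ATP page only)."""
    return f"{BASE_URL}{pdb_ligand_id}.html"


# (protein, PDB ID, ECOD H-group of the ATP-binding domain), from the text.
ATP_EXAMPLES = [
    ("Atg7", "3VH4", "Rossmann-related"),
    ("Cobalamin adenosyltransferase", "6D5K", "Cobalamin adenosyltransferase"),
    ("ABCG2", "6HZM", "P-loop domains-related"),
    ("ATP phosphoribosyltransferase", "5UBH", "Periplasmic binding protein-like II"),
]

by_h_group = defaultdict(list)
for protein, pdb_id, h_group in ATP_EXAMPLES:
    by_h_group[h_group].append((protein, pdb_id))

distinct_h_groups = len(by_h_group)
print(ligand_page("ATP"))
print(f"{len(ATP_EXAMPLES)} examples span {distinct_h_groups} H-groups")
```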
Fig. 6.
Examples of ATP binding to different proteins. (A) Ubiquitin-like modifier-activating enzyme Atg7 bound to Atg8 (PDB: 3VH4). (B) Cobalamin adenosyltransferase MMAB (PDB: 6D5K). (C) ATP-binding cassette transporter ABCG2 (PDB: 6HZM). (D) ATP phosphoribosyltransferase (PDB: 5UBH). All proteins are colored by their ECOD domains. ATP is depicted with sticks and colored by its constituent elements. Residues interacting with ATP are colored in magenta.
4. Conclusions
The DrugDomain database version 2.0 represents a comprehensive resource depicting interactions between structural protein domains and small organic (including drugs) and inorganic molecules, and – unlike previous versions – covers the entire Protein Data Bank. It also reports domain-drug interactions for AlphaFold models of human drug targets lacking experimental structures. Additionally, it features over 6000 small-molecule binding-associated PTMs and more than 14,000 PTM-modified human protein models with docked ligands, generated by state-of-the-art AI-based approaches. DrugDomain database v2.0 includes 43,023 unique UniProt accessions (more than 16-fold increase relative to v1.0), 174,545 PDB structures, 37,367 ligands from PDB, and 7561 DrugBank molecules. Within experimental PDB structures, the distribution of ECOD domains interacting with ligands was analyzed. This analysis revealed that the top three ECOD A-groups, ranked by the number of ligand-interacting domains, are predominantly α/β three-layered sandwiches (Rossmann fold), α+β two layers (heat shock proteins), and α+β complex topology (kinases). The distribution of domains in experimental PDB structures and their interacting compound superclasses identified the top three categories as Organoheterocyclic compounds, Organic oxygen compounds, and Organic acids and derivatives. Our analysis showed that proteins in the Protein Data Bank exhibit a range of one to ten ECOD domains involved in ligand interaction. All data and protein models are available for viewing and downloading in the DrugDomain database (https://drugdomain.cs.ucf.edu/) and GitHub (https://github.com/kirmedvedev/DrugDomain).
CRediT authorship contribution statement
R. Dustin Schaeffer: Writing – review & editing, Funding acquisition. Kirill E. Medvedev: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Methodology, Formal analysis, Data curation, Conceptualization, Funding acquisition. Nick V. Grishin: Resources, Funding acquisition.
Author statement
All authors have read the revised manuscript and agree to submit it.
Funding
The study is supported by The University of Central Florida College of Engineering and Computer Science (to K.E.M.), grants from the National Institute of General Medical Sciences of the National Institutes of Health GM127390 (to N.V.G.) and GM147367 (to R.D.S.), the Welch Foundation I-1505 (to N.V.G.), and the National Science Foundation DBI 2224128 (to N.V.G.).
Acknowledgements
The authors acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin (https://tacc.utexas.edu/) for providing computational resources that have contributed to the research results reported within this paper.
References
- 1.Grishin N.V. Fold change in evolution of protein structures. J Struct Biol. 2001;134(2-3):167–185. doi: 10.1006/jsbi.2001.4335. [DOI] [PubMed] [Google Scholar]
- 2.Bashton M., Chothia C. The generation of new protein functions by the combination of domains. Structure. 2007;15(1):85–99. doi: 10.1016/j.str.2006.11.009. [DOI] [PubMed] [Google Scholar]
- 3.Andreeva A., Kulesha E., Gough J., Murzin A.G. The SCOP database in 2020: expanded classification of representative family and superfamily domains of known protein structures. Nucleic Acids Res. 2020;48(D1):D376–D382. doi: 10.1093/nar/gkz1064. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Waman V.P., Bordin N., Alcraft R., Vickerstaff R., Rauer C., Chan Q., et al. CATH 2024: CATH-AlphaFlow doubles the number of structures in CATH and reveals nearly 200 new folds. J Mol Biol. 2024;436(17) doi: 10.1016/j.jmb.2024.168551. [DOI] [PubMed] [Google Scholar]
- 5.Schaeffer R.D., Medvedev K.E., Andreeva A., Chuguransky S.R., Pinto B.L., Zhang J., et al. ECOD: integrating classifications of protein domains from experimental and predicted structures. Nucleic Acids Res. 2025;53(D1):D411–D418. doi: 10.1093/nar/gkae1029. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Schaeffer R.D., Zhang J., Medvedev K.E., Kinch L.N., Cong Q., Grishin N.V. ECOD domain classification of 48 whole proteomes from AlphaFold structure database using DPAM2. PLoS Comput Biol. 2024;20(2) doi: 10.1371/journal.pcbi.1011586. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Abramson J., Adler J., Dunger J., Evans R., Green T., Pritzel A., et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630(8016):493–500. doi: 10.1038/s41586-024-07487-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Schaeffer R.D., Zhang J., Kinch L.N., Pei J., Cong Q., Grishin N.V. Classification of domains in predicted structures of the human proteome. Proc Natl Acad Sci USA. 2023;120(12) doi: 10.1073/pnas.2214069120. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Lau A.M., Bordin N., Kandathil S.M., Sillitoe I., Waman V.P., Wells J., et al. Exploring structural diversity across the protein universe with the encyclopedia of domains. Science. 2024;386(6721) doi: 10.1126/science.adq4946. [DOI] [PubMed] [Google Scholar]
- 10.Varadi M., Bertoni D., Magana P., Paramval U., Pidruchna I., Radhakrishnan M., et al. AlphaFold protein structure database in 2024: providing structure coverage for over 214 million protein sequences. Nucleic Acids Res. 2024;52(D1):D368. doi: 10.1093/nar/gkad1011. D368. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Medvedev K.E., Schaeffer R.D., Chen K.S., Grishin N.V. Pan-cancer structurome reveals overrepresentation of beta sandwiches and underrepresentation of alpha helical domains. Sci Rep. 2023;13(1) doi: 10.1038/s41598-023-39273-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Akdel M., Pires D.E.V., Pardo E.P., Janes J., Zalevsky A.O., Meszaros B., et al. A structural biology community assessment of AlphaFold2 applications. Nat Struct Mol Biol. 2022;29(11):1056–1067. doi: 10.1038/s41594-022-00849-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Medvedev K.E., Schaeffer R.D., Grishin N.V. DrugDomain: the evolutionary context of drugs and small molecules bound to domains. Protein Sci. 2024;33(8) doi: 10.1002/pro.5116. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Medvedev K.E., Schaeffer R.D., Grishin N.V. Leveraging AI to explore structural contexts of post-translational modifications in drug binding. J Chemin. 2025;17(1):67. doi: 10.1186/s13321-025-01019-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Berman H.M., Westbrook J., Feng Z., Gilliland G., Bhat T.N., Weissig H., et al. The protein data bank. Nucleic Acids Res. 2000;28(1):235–242. doi: 10.1093/nar/28.1.235. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Westbrook J.D., Shao C., Feng Z., Zhuravleva M., Velankar S., Young J. The chemical component dictionary: complete descriptions of constituent molecules in experimentally determined 3D macromolecules in the protein data bank. Bioinformatics. 2015;31(8):1274–1278. doi: 10.1093/bioinformatics/btu789. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Piehl D.W., Vallat B., Truong I., Morsy H., Bhatt R., Blaumann S., et al. rcsb-api: python toolkit for streamlining access to RCSB protein data bank APIs. J Mol Biol. 2025;437(15) doi: 10.1016/j.jmb.2025.168970. [DOI] [PubMed] [Google Scholar]
- 18.Knox C., Wilson M., Klinger C.M., Franklin M., Oler E., Wilson A., et al. DrugBank 6.0: the DrugBank knowledgebase for 2024. Nucleic Acids Res. 2024;52(D1):D1265–D1275. doi: 10.1093/nar/gkad976. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Kim S., Chen J., Cheng T., Gindulyte A., He J., He S., et al. PubChem 2025 update. Nucleic Acids Res. 2025;53(D1):D1516–D1525. doi: 10.1093/nar/gkae1059. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Zdrazil B., Felix E., Hunter F., Manners E.J., Blackshaw J., Corbett S., et al. The ChEMBL database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Res. 2024;52(D1):D1180–D1192. doi: 10.1093/nar/gkad1004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Liu T., Hwang L., Burley S.K., Nitsche C.I., Southan C., Walters W.P., et al. BindingDB in 2024: a FAIR knowledgebase of protein-small molecule binding data. Nucleic Acids Res. 2025;53(D1):D1633–D1644. doi: 10.1093/nar/gkae1075. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Djoumbou Feunang Y., Eisner R., Knox C., Chepelev L., Hastings J., Owen G., et al. ClassyFire: automated chemical classification with a comprehensive, computable taxonomy. J Chemin. 2016;8:61. doi: 10.1186/s13321-016-0174-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Laskowski R.A., Swindells M.B. LigPlot+: multiple ligand-protein interaction diagrams for drug discovery. J Chem Inf Model. 2011;51(10):2778–2786. doi: 10.1021/ci200227u.
- 24.Cock P.J., Antao T., Chang J.T., Chapman B.A., Cox C.J., Dalke A., et al. Biopython: freely available python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25(11):1422–1423. doi: 10.1093/bioinformatics/btp163.
- 25.Hekkelman M.L., de Vries I., Joosten R.P., Perrakis A. AlphaFill: enriching AlphaFold models with ligands and cofactors. Nat Methods. 2023;20(2):205–213. doi: 10.1038/s41592-022-01685-y.
- 26.Krishna R., Wang J., Ahern W., Sturmfels P., Venkatesh P., Kalvet I., et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science. 2024;384(6693) doi: 10.1126/science.adl2528.
- 27.Chai Discovery, Boitreaud J., Dent J., McPartlon M., Meier J., Reis V., et al. Chai-1: decoding the molecular interactions of life. bioRxiv. 2024. doi: 10.1101/2024.10.10.615955.
- 28.UniProt Consortium. UniProt: the universal protein knowledgebase in 2023. Nucleic Acids Res. 2023;51(D1):D523–D531. doi: 10.1093/nar/gkac1052.
- 29.The PyMOL Molecular Graphics System, Version 3.0. Schrödinger, LLC. Accessed January 2024.
- 30.Rizzatti G., Lopetuso L.R., Gibiino G., Binda C., Gasbarrini A. Proteobacteria: a common factor in human diseases. Biomed Res Int. 2017;2017 doi: 10.1155/2017/9351507.
- 31.Magne F., Gotteland M., Gauthier L., Zazueta A., Pesoa S., Navarrete P., et al. The Firmicutes/Bacteroidetes ratio: a relevant marker of gut dysbiosis in obese patients? Nutrients. 2020;12(5) doi: 10.3390/nu12051474.
- 32.Ley R.E., Turnbaugh P.J., Klein S., Gordon J.I. Microbial ecology: human gut microbes associated with obesity. Nature. 2006;444(7122):1022–1023. doi: 10.1038/4441022a.
- 33.Procopio R.E., Silva I.R., Martins M.K., Azevedo J.L., Araujo J.M. Antibiotics produced by Streptomyces. Braz J Infect Dis. 2012;16(5):466–471. doi: 10.1016/j.bjid.2012.08.014.
- 34.Luque C., Cepero A., Perazzoli G., Mesas C., Quinonero F., Cabeza L., et al. In vitro efficacy of extracts and isolated bioactive compounds from Ascomycota fungi in the treatment of colorectal cancer: a systematic review. Pharmaceuticals (Basel). 2022;16(1) doi: 10.3390/ph16010022.
- 35.Medvedev K.E., Kinch L.N., Schaeffer R.D., Grishin N.V. Functional analysis of Rossmann-like domains reveals convergent evolution of topology and reaction pathways. PLoS Comput Biol. 2019;15(12) doi: 10.1371/journal.pcbi.1007569.
- 36.Medvedev K.E., Kinch L.N., Dustin Schaeffer R., Pei J., Grishin N.V. A fifth of the protein world: Rossmann-like proteins as an evolutionarily successful structural unit. J Mol Biol. 2021;433(4) doi: 10.1016/j.jmb.2020.166788.
- 37.Anderson B., Rosston P., Ong H.W., Hossain M.A., Davis-Gilbert Z.W., Drewry D.H. How many kinases are druggable? A review of our current understanding. Biochem J. 2023;480(16):1331–1363. doi: 10.1042/BCJ20220217.
- 38.Medvedev K.E., Schaeffer R.D., Pei J., Grishin N.V. Pathogenic mutation hotspots in protein kinase domain structure. Protein Sci. 2023;32(9) doi: 10.1002/pro.4750.
- 39.Zapf C.W., Bloom J.D., McBean J.L., Dushin R.G., Nittoli T., Otteng M., et al. Macrocyclic lactams as potent Hsp90 inhibitors with excellent tumor exposure and extended biomarker activity. Bioorg Med Chem Lett. 2011;21(11):3411–3416. doi: 10.1016/j.bmcl.2011.03.112.
- 40.Grundy S.M., Stone N.J. 2018 American Heart Association/American College of Cardiology multisociety guideline on the management of blood cholesterol: primary prevention. JAMA Cardiol. 2019;4(5):488–489. doi: 10.1001/jamacardio.2019.0777.
- 41.Schlunzen F., Zarivach R., Harms J., Bashan A., Tocilj A., Albrecht R., et al. Structural basis for the interaction of antibiotics with the peptidyl transferase centre in eubacteria. Nature. 2001;413(6858):814–821. doi: 10.1038/35101544.
- 42.Huang Q., Pereira A.C., Velthuis H., Wong N.M.L., Ellis C.L., Ponteduro F.M., et al. GABA(B) receptor modulation of visual sensory processing in adults with and without autism spectrum disorder. Sci Transl Med. 2022;14(626) doi: 10.1126/scitranslmed.abg7859.
- 43.Kruger F.A., Rostom R., Overington J.P. Mapping small molecule binding data to structural domains. BMC Bioinformatics. 2012;13(Suppl 17):S11. doi: 10.1186/1471-2105-13-S17-S11.
- 44.Li M., Li Y., Lu Y., Li J., Lu X., Ren Y., et al. Molecular basis of Mg(2+) permeation through the human mitochondrial Mrs2 channel. Nat Commun. 2023;14(1):4713. doi: 10.1038/s41467-023-40516-2.
- 45.Auwercx J., Rybarczyk P., Kischel P., Dhennin-Duthille I., Chatelain D., Sevestre H., et al. Mg(2+) transporters in digestive cancers. Nutrients. 2021;13(1) doi: 10.3390/nu13010210.
- 46.Li P., Liu S., Wallerstein J., Villones R.L.E., Huang P., Lindkvist-Petersson K., et al. Closed and open structures of the eukaryotic magnesium channel Mrs2 reveal the auto-ligand-gating regulation mechanism. Nat Struct Mol Biol. 2025;32(3):491–501. doi: 10.1038/s41594-024-01432-1.
- 47.Noda N.N., Satoo K., Fujioka Y., Kumeta H., Ogura K., Nakatogawa H., et al. Structural basis of Atg8 activation by a homodimeric E1, Atg7. Mol Cell. 2011;44(3):462–475. doi: 10.1016/j.molcel.2011.08.035.
- 48.Campanello G.C., Ruetz M., Dodge G.J., Gouda H., Gupta A., Twahir U.T., et al. Sacrificial Cobalt-Carbon bond homolysis in coenzyme B(12) as a cofactor conservation strategy. J Am Chem Soc. 2018;140(41):13205–13208. doi: 10.1021/jacs.8b08659.
- 49.Manolaridis I., Jackson S.M., Taylor N.M.I., Kowal J., Stahlberg H., Locher K.P. Cryo-EM structures of a human ABCG2 mutant trapped in ATP-bound and substrate-bound states. Nature. 2018;563(7731):426–430. doi: 10.1038/s41586-018-0680-3.
- 50.Mittelstadt G., Jiao W., Livingstone E.K., Moggre G.J., Nazmi A.R., Parker E.J. A dimeric catalytic core relates the short and long forms of ATP-phosphoribosyltransferase. Biochem J. 2018;475(1):247–260. doi: 10.1042/BCJ20170762.







