Abstract
Over the past few years, artificial intelligence (AI) has emerged as a transformative force in drug discovery and development (DDD), revolutionizing many aspects of the process. This survey provides a comprehensive review of recent advancements in AI applications within early drug discovery and post-market drug assessment. It addresses the identification and prioritization of new therapeutic targets, prediction of drug-target interaction (DTI), design of novel drug-like molecules, and assessment of the clinical efficacy of new medications. By integrating AI technologies, pharmaceutical companies can accelerate the discovery of new treatments, enhance the precision of drug development, and bring more effective therapies to market. This shift represents a significant move towards more efficient and cost-effective methodologies in the DDD landscape.
Index Terms: Artificial intelligence, drug discovery and development, target identification, drug-target interaction, de novo drug design, post-market drug assessment
I. Introduction
Bringing a new medicine to market is a costly and time-consuming endeavor, with expenses of about $2 billion and a timeline of up to 12 years [1]. Despite this substantial investment, success rates remain remarkably low: only about 6.2% of drugs identified during the discovery phase are ultimately approved by the US Food and Drug Administration (FDA) and reach patients [2], [3]. This high failure rate is often attributed to a lack of clinical efficacy, which is frequently traced back to difficulties in identifying and validating appropriate targets. Finding pharmacological agents that effectively perturb these targets while minimizing off-target effects is another significant challenge.
Exponential progress in molecular biology and biotechnology has generated vast datasets, and recent breakthroughs in predictive and generative computational methods capable of handling extensive multidimensional data have dramatically transformed and accelerated all stages of the DDD pipeline. These advances have enabled researchers to detect disrupted biomolecular pathways, identify and prioritize suitable therapeutic targets [4], [5], [6], predict DTI [7], [8], [9], design new molecular structures with specific properties [10], and assess the clinical efficacy of new medications [11]. As a result, pharmaceutical companies can make better-informed decisions throughout the DDD process and develop new medications more efficiently and quickly. One notable example is the discovery of abaucin, a new antibiotic against the multidrug-resistant pathogen Acinetobacter baumannii [12]. In this study, Liu et al. screened a library of 7500 molecules for growth inhibitors of A. baumannii and used the screening results to train a neural network that predicted structurally novel antibacterial compounds, leading to the discovery of abaucin. Another noteworthy instance is the use of generative methods integrated with reinforcement learning (RL) to discover and design the drug INS018–055, intended to treat idiopathic pulmonary fibrosis [13]. Notably, this drug is the first to feature both a novel AI-discovered target and an AI-generated molecular design [14]. Furthermore, a growing number of AI-driven drugs are progressing into clinical trials, reflecting the increasing adoption of these methodologies [14].
Other stages of the DDD pipeline, such as post-market assessment of drug safety and efficacy across diverse populations over time, can also benefit from AI-based approaches, especially where traditional methods struggle with complex data. In particular, AI-enabled personalized dosing and administration regimens have the potential to significantly improve patient care and outcomes. Integrating AI technologies therefore represents a major shift toward more efficient and cost-effective methodologies in drug discovery and paves the way for precision medicine, in which treatments are tailored to individual patient characteristics.
This survey reviews computational approaches for four critical steps spanning early drug discovery and the post-market stage, organized into the following sections: i) drug target identification; ii) DTI prediction and its applications in drug discovery; iii) de novo drug design; and iv) post-market drug assessment (Fig. 1).
Fig. 1.
AI applications along four critical steps of the DDD process discussed in this paper. These critical steps include target identification, predicting interactions between drugs and targets, designing novel compounds, and post-market drug efficacy and toxicity monitoring.
II. Previous Review Studies
In recent years, the significance of AI applications in DDD has led to numerous review articles on the use of computational approaches in different aspects of the field [3], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29]. For instance, Qureshi et al. explored the use of AI for data representation and prediction at various stages of drug design [30] and surveyed open-source software tools and databases that support this process. Moingeon et al. gave a concise overview of AI applications in generating disease models, identifying and validating potential therapeutic targets, developing and optimizing drug candidates (with a focus on deep-learning methods), and evaluating the clinical efficacy of designed drug candidates [3]. Hasan et al. categorized computer-aided drug design (CADD) tools into two main groups, structure-based and ligand-based methods [15], focusing primarily on molecular docking, molecular dynamics (MD), pharmacophore modeling, and quantitative structure-activity relationship (QSAR) modeling. Similarly, Bassani et al. divided CADD techniques into ligand-based and structure-based methods, with a notable emphasis on MD [16]. Sabe et al. reviewed virtual screening methods, covering the available databases and software programs in this area [17]. Bagherian et al. reviewed machine learning (ML)-based techniques for DTI prediction [9]. Sessa et al. performed a systematic review of AI in pharmacoepidemiology, discussing the post-market analyses that can be performed to ensure drug safety and efficacy in the general population [26], [31].
III. Drug Target Identification
Identifying crucial disease-related genes and selecting suitable drug targets remain among the most important decisions and investments that companies make during the DDD process. Experimental techniques such as genome-wide association studies (GWAS), along with the collection and analysis of large-scale omics data, can be used to discern disease-related genes and identify potential targets. However, these high-throughput techniques often produce long lists of candidate genes that require validation experiments, making the overall process time-consuming and costly. To address this, several computational methods have been developed to refine the candidate gene sets, prioritize genes, and identify drug targets. In this paper, we group these methods into two categories: network-based methods and deep learning (DL)-based approaches.
A. Network-Based Methods
Network-based approaches use graph theory to integrate high-throughput biological data into networks, characterizing the biological properties of each entity through its interactions with the other components of the network [32], [33]. Mathematically, every biological network can be represented as a graph $G = (V, E)$, where the nodes (vertices) $v \in V$ represent individual biological entities such as proteins, genes, or metabolites, and the edges $(u, v) \in E$ represent molecular interactions such as physical protein-protein interactions, gene regulation, or biochemical reactions.
Since the suggestion of using biological networks in systems biology in 2007 [34], network-based methods have been widely adopted to gain a comprehensive understanding of the interconnections between various biological entities and disease mechanisms at the subcellular level, facilitate gene ranking, and identify drug targets. These methods encompass a range of techniques, including but not limited to module detection, node centrality, and co-location. These techniques will be briefly elaborated upon in the subsequent sections.
1). Module Detection:
Biological networks exhibit striking clustering of nodes that share biological functions [35]. Module detection algorithms exploit this property to pinpoint clusters of disease-associated genes within complex networks [36], [37], [38], [39], typically by identifying subgraphs with minimum cost or maximum density based on the network's underlying topology.
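To make the "maximum density" criterion concrete, the sketch below implements greedy peeling, Charikar's 2-approximation for the densest-subgraph problem with density defined as edges per node. It is a generic illustration of density-based module extraction, not one of the cited tools; the function name and the toy edge list are hypothetical.

```python
from collections import defaultdict


def greedy_densest_subgraph(edges):
    """Greedy peeling (Charikar's 2-approximation) for the densest subgraph.

    Repeatedly removes the node of minimum degree and returns the node set,
    among all intermediate graphs, with the highest density |E| / |V|.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    nodes = set(adj)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_nodes = set(nodes)
    best_density = n_edges / len(nodes) if nodes else 0.0
    while len(nodes) > 1:
        u = min(nodes, key=lambda x: len(adj[x]))  # current minimum-degree node
        n_edges -= len(adj[u])
        for v in adj[u]:
            adj[v].discard(u)
        adj[u].clear()
        nodes.remove(u)
        density = n_edges / len(nodes)
        if density > best_density:
            best_nodes, best_density = set(nodes), density
    return best_nodes, best_density


# Toy network: a 4-clique (A-D) with a two-node tail D-E-F.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"),
         ("C", "D"), ("D", "E"), ("E", "F")]
module, density = greedy_densest_subgraph(edges)
print(sorted(module), density)  # ['A', 'B', 'C', 'D'] 1.5
```

Peeling the low-degree tail raises the density until only the clique remains; the same sensitivity to small topology changes is one reason module boundaries can shift between slightly different networks.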
Although these algorithms can detect functional modules based on network topology, the identified module may differ when they are applied to the same disease network with a slightly altered topology [40], [41]. Moreover, the disease module can sometimes be difficult to capture. One possible explanation is that disease-related proteins do not form densely interlinked subgraphs but instead tend to aggregate within specific network regions. To address this, community detection methods that analyze network properties can be used [42], [43].
2). Node Centrality:
Node centrality metrics quantify node importance and are commonly used to pinpoint nodes with crucial functions within a biological module. The most fundamental attribute is node degree (degree centrality), the number of edges linking a node to other nodes in the network. Other centrality metrics provide additional insight into node importance. For instance, coreness centrality is a higher-level measure that takes into account both a node's degree and its position within the network. A node with a higher coreness value occupies a more central position and exerts greater influence on network propagation than nodes with a higher degree but lower coreness [44], [45], [46]. In contrast, betweenness centrality (Bc) measures the fraction of shortest paths between pairs of nodes that pass through a given node. It pinpoints the gatekeepers that regulate communication within the network and has proven important in biological network analysis [47], [48], [49]. Another important metric is eigenvector centrality, which assigns each node a score proportional to the sum of its neighbors' scores, so that it reflects the number of edges, the node's position, and the influence of the nodes it connects to [50].
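The four measures above can be sketched in plain Python for an undirected, unweighted graph stored as an adjacency dictionary. This is a minimal illustration under those assumptions (libraries such as NetworkX provide production versions); the function names and the two-triangle example graph are hypothetical.

```python
from collections import deque


def degree_centrality(adj):
    """Degree normalized by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def core_number(adj):
    """Coreness via k-core peeling: remove minimum-degree nodes one at a time."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining, core, k = set(adj), {}, 0
    while remaining:
        v = min(remaining, key=degree.get)
        k = max(k, degree[v])  # the core level never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest s-v paths
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # accumulate pair dependencies in reverse BFS order
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2.0 for v, b in bc.items()}  # each unordered pair was counted twice


def eigenvector_centrality(adj, tol=1e-12, max_iter=1000):
    """Power iteration on (A + I); the shift keeps the leading eigenvector but avoids oscillation."""
    x = dict.fromkeys(adj, 1.0)
    for _ in range(max_iter):
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = sum(val * val for val in new.values()) ** 0.5
        new = {v: val / norm for v, val in new.items()}
        if sum(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x


# Two triangles (A-B-C and D-E-F) bridged by node G (C-G-D).
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E"), ("E", "F"), ("D", "F"),
         ("C", "G"), ("G", "D")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
print(betweenness_centrality(adj)["G"])  # 9.0
```

In this toy graph the bridge node G has only degree 2, yet it lies on all nine shortest paths between the two triangles, which is exactly the "gatekeeper" role that betweenness is meant to expose.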
Node-based centrality often fails to capture effects that propagate over multiple edges in the network. A few studies have defined centrality measures over small subnetworks; for example, [51] defines a centrality measure over subgraphs that effectively identifies genes connected with disease and immune response in yeast protein-protein interaction (PPI) networks.
3). Co-Location:
Because biologically meaningful characteristics are encoded in the network, genes participating in the same disease process can be expected to lie close together in it. Several methods have been proposed to identify such co-located groups, based on different notions of network distance such as shortest path, diffusion distance, and communicability.
The shortest path between two nodes $u$ and $v$ is the path that traverses the fewest edges from $u$ to $v$; its length $d(u, v)$ is the shortest-path distance. This topological measure has been widely used to identify regulatory pathways within biological networks and to uncover key targets [52]. However, it considers only one path (the shortest) between genes and ignores all others, and in complex diseases such as cancer, candidate genes do not always lie on the identified shortest path. Diffusion-based approaches, such as random walks and diffusion kernels, account for all paths through random exploration of the network and have been used to reveal novel disease-associated genes and potential drug-target candidates within the interactome [53]. Similarly, communicability accounts for all walks between two nodes, giving less weight to longer walks, through the exponential of the adjacency matrix, $G = e^{A} = \sum_{k=0}^{\infty} A^{k}/k!$ [54].
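The two multi-path notions of proximity mentioned above can be sketched with NumPy. The random walk with restart shown here is one common diffusion-based scheme (a walker moves to a random neighbor or jumps back to the seed genes with probability `restart`), and communicability is computed as the matrix exponential of a symmetric adjacency matrix via its eigendecomposition. The parameter values, function names, and path-graph example are illustrative assumptions, not taken from [53] or [54].

```python
import numpy as np


def random_walk_with_restart(A, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at the seed genes.

    Iterates p <- (1 - r) W p + r p0, where W is the column-normalized adjacency matrix.
    """
    A = np.asarray(A, dtype=float)
    col_sums = A.sum(axis=0)
    W = A / np.where(col_sums == 0, 1.0, col_sums)  # divide column j by the degree of node j
    p0 = np.zeros(A.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def communicability(A):
    """Communicability G = exp(A) = sum_k A^k / k!, via eigendecomposition of symmetric A."""
    vals, vecs = np.linalg.eigh(np.asarray(A, dtype=float))
    return (vecs * np.exp(vals)) @ vecs.T


# Path graph 0-1-2-3-4 with gene 0 as the only seed.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
p = random_walk_with_restart(A, seeds=[0])
G = communicability(A)
print(np.round(p, 3))     # scores decrease with distance from the seed
print(G[0, 1] > G[0, 4])  # True
```

Unlike the shortest-path distance, both scores change if an alternative route between two genes is added, because every walk contributes.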
4). Integrated Network Analysis:
Network-based approaches are often used to analyze relationships among individual molecular components, using data such as gene expression, protein-protein interactions, or metabolomics. However, understanding the molecular mechanisms of complex disorders such as cancer, metabolic disorders, and autoimmune disorders, and identifying appropriate drug targets, requires integrating multiple layers of information across molecules, cell types, and organs [55], [56], [57], [58]. In a recent study [57], weighted gene co-expression network analysis (WGCNA) was used to integrate several modalities, namely pancreatic tissue imaging, sorted and islet cell transcriptomics, and islet functional analysis, to identify early disease-driving events in type 2 diabetes (T2D). This study identified RFX6 as a key hub transcription factor that was downregulated in T2D cells and linked to diminished glucose-stimulated insulin secretion. Targeted perturbation of RFX6 in primary islet cells altered chromatin structure in regions linked to T2D GWAS signals. Furthermore, large-scale population genetic analyses showed that lower predicted RFX6 expression is causally related to higher T2D risk.
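The core network-construction step of WGCNA can be sketched as follows: gene-gene correlations are raised to a soft-thresholding power $\beta$ to obtain a weighted (unsigned) adjacency, and hub genes are those with the highest connectivity within a module. This minimal sketch omits WGCNA's topological-overlap and module-clustering steps; the function names, $\beta = 6$, and the synthetic expression data are illustrative assumptions.

```python
import numpy as np


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned WGCNA-style adjacency a_ij = |cor(x_i, x_j)| ** beta.

    expr: samples x genes expression matrix; beta is the soft-thresholding power.
    """
    corr = np.corrcoef(expr, rowvar=False)
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)  # no self-connections
    return adj


def intramodular_connectivity(adj, module):
    """Sum of adjacencies of each module gene to the other genes in the module."""
    idx = np.asarray(module)
    return adj[np.ix_(idx, idx)].sum(axis=1)


# Hypothetical data: genes 0-2 share a latent driver, genes 3-4 are independent noise.
rng = np.random.default_rng(0)
driver = rng.normal(size=500)
expr = np.column_stack([
    driver,
    driver + 0.3 * rng.normal(size=500),
    driver + 0.5 * rng.normal(size=500),
    rng.normal(size=500),
    rng.normal(size=500),
])
adj = soft_threshold_adjacency(expr)
k_in = intramodular_connectivity(adj, [0, 1, 2])
hub = [0, 1, 2][int(np.argmax(k_in))]  # candidate hub gene of the module
```

The power $\beta$ suppresses weak, likely spurious correlations (here, those with the noise genes) while keeping strong co-expression, which is why the resulting network emphasizes a few highly connected hubs.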
Integrative networks can also be used to categorize diseases that comprise multiple subtypes with similar pathological or physiological outcomes. Zhou et al. [59] used this approach to integrate molecular networks and phenotypic profiles of various disease entities and reclassify 1797 distinct International Classification of Diseases (ICD) codes. As a result, diseases with substantial molecular diversity were reclassified into multiple disease chapters and subcategories, giving a more accurate representation. This has the potential to improve the accuracy of disease taxonomy and, in turn, enable more targeted therapeutic approaches.
B. Deep Learning-Based Methods
In recent years, ML techniques, particularly advanced DL methods such as recurrent neural networks (RNNs), graph neural networks (GNNs), generative adversarial networks (GANs), variational autoencoders (VAEs), and transfer learning approaches, have attracted significant interest and achieved remarkable success in pharmaceutical research. They have been applied effectively to tasks such as generating novel small molecules and predicting potential drug targets. The architectures of these models are described in more detail in Section V (De Novo Drug Design); here, we focus on the application of DL methods to target identification.
Advanced DL-based methods, particularly graph-based models, have shown remarkable success in identifying promising drug targets related to cancer [6], [60], [61], [62], aging [5], idiopathic pulmonary fibrosis [13], and other complex biological processes. These methods take a biological network as input and process the molecular data attached to its nodes and edges to discover and prioritize potential drug targets. Through graph embedding, they map network features to low-dimensional vector representations that preserve both the network's topology and the content information of individual nodes [63].
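As a minimal sketch of how a graph model turns a network plus node features into embeddings, the code below stacks two graph-convolution layers using the widely used normalized propagation rule $H' = \sigma(\tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2} H W)$. It is a generic example, not the architecture of any cited study; the weights are random and untrained (in practice they are learned, e.g., for target classification), and the toy network and features are hypothetical.

```python
import numpy as np


def gcn_layer(A, H, W, activation=True):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])  # add self-loops so each node keeps its own features
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # symmetric normalization
    out = A_norm @ H @ W
    return np.maximum(out, 0.0) if activation else out


# Hypothetical 5-gene network with 3 omics features per gene.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(5, 3))
W1, W2 = rng.normal(size=(3, 8)), rng.normal(size=(8, 2))
H1 = gcn_layer(A, X, W1)
Z = gcn_layer(A, H1, W2, activation=False)  # 2-D embedding per gene
```

Each layer mixes a gene's features with those of its neighbors, so after two layers an embedding reflects both the gene's own data and its two-hop network context.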
For instance, Zheng et al. [60] used graph attention network analysis to identify disease-associated Piwi-interacting RNAs (piRNAs) as critical determinants in predicting cancer outcomes. In this study, an attention-based GNN computed a hidden representation of each node by attending to its neighbors, with learned attention weights assigning different importance to different neighbors. The method integrates piRNA sequence information with disease semantic information to improve prediction accuracy. The results indicate that piRNAs positioned closer to tumor genes within the network are more likely to be viable therapeutic targets for cancer.
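The attention mechanism described above can be sketched as a single-head graph attention layer: each node scores its neighbors with a learned vector $a$, normalizes the scores with a softmax over its neighborhood, and aggregates the projected neighbor features with those weights. This is a minimal NumPy sketch of the standard graph-attention formulation, not Zheng et al.'s model; the output nonlinearity is omitted, and the weights and 4-node graph are hypothetical.

```python
import numpy as np


def gat_layer(A, H, W, a, negative_slope=0.2):
    """Single-head graph attention: h_i' = sum_j alpha_ij W h_j over j in N(i) plus i."""
    Z = H @ W                          # (n, f') projected node features
    f = Z.shape[1]
    src = Z @ a[:f]                    # part of a^T [z_i || z_j] that depends on node i
    dst = Z @ a[f:]                    # part that depends on neighbor j
    e = src[:, None] + dst[None, :]    # raw scores e_ij
    e = np.where(e > 0, e, negative_slope * e)  # LeakyReLU
    mask = (A + np.eye(A.shape[0])) > 0         # attend only to neighbors and self
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)        # numerical stability
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)   # softmax over each neighborhood
    return alpha @ Z, alpha


# Hypothetical path graph 0-1-2-3 with random features and untrained weights.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 4))
a = rng.normal(size=8)
H_new, alpha = gat_layer(A, H, W, a)
```

The returned `alpha` matrix is what makes such models partly interpretable: row $i$ shows how much each neighbor contributed to node $i$'s new representation.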
Zhang et al. [6] proposed heterophilic graph diffusion convolutional networks (HGDCs) to improve cancer driver gene identification. In this method, graph diffusion is first used to create an auxiliary network that connects structurally similar nodes within the biological network. Message aggregation and propagation frameworks are then applied to capture the heterophilic properties of the network, preventing dissimilar neighboring genes from smoothing out the features of driver genes. Finally, a layer-wise attention classifier predicts the likelihood that a gene is a cancer driver. Experiments confirmed the efficacy of the HGDC model in identifying previously known driver genes across diverse networks. Moreover, the approach identified and prioritized novel patient-specific cancer driver genes that act together with known driver genes to promote tumorigenesis.
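A diffusion-based auxiliary network can be sketched as follows. Here we assume personalized-PageRank diffusion, $S = \alpha (I - (1 - \alpha) T)^{-1}$ with a column-stochastic transition matrix $T$, followed by keeping each node's top-$k$ diffusion partners; HGDC's exact diffusion kernel and sparsification may differ, and the function names and toy graph are hypothetical.

```python
import numpy as np


def ppr_diffusion(A, alpha=0.15):
    """Personalized-PageRank diffusion matrix S = alpha (I - (1 - alpha) T)^-1."""
    n = A.shape[0]
    deg = A.sum(axis=0)
    T = A / np.where(deg == 0, 1.0, deg)  # column-stochastic transition matrix
    return alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * T)


def auxiliary_network(A, k=2, alpha=0.15):
    """Connect each node to its k strongest diffusion partners, then symmetrize."""
    S = ppr_diffusion(A, alpha)
    np.fill_diagonal(S, 0.0)  # ignore self-similarity
    aux = np.zeros_like(S)
    for j in range(S.shape[1]):
        top = np.argsort(S[:, j])[-k:]  # indices of the k largest entries in column j
        aux[top, j] = 1.0
    return np.maximum(aux, aux.T)


# Hypothetical network: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
aux = auxiliary_network(A, k=2)
```

Because diffusion scores reflect multi-step connectivity rather than direct edges only, the auxiliary graph can link nodes that occupy similar positions even when they are not adjacent in the original network.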
In another study, Zhao et al. [61] developed a graph attention network (GAT)-based approach to identify cancer driver genes by integrating multi-omics data with multi-dimensional gene networks. The methodology comprised three steps. First, multiple gene association maps were constructed from various data types, including protein-protein interactions (PPIs), gene co-expression patterns, gene sequence similarity, Gene Ontology annotations, and KEGG pathway co-occurrence. Second, a comprehensive multi-dimensional gene network was constructed, containing approximately 20,000 genes and incorporating five types of gene associations. Finally, a GAT block was applied to obtain dimension-specific gene representations, and a joint learning module dynamically learned the significance of each dimension's representation and combined them into a generalized gene representation.
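One common way to learn the significance of each dimension and combine the representations is semantic-level attention: each network-specific embedding matrix receives a score from a small learned projection, the scores are normalized with a softmax, and the embeddings are summed with those weights. The sketch below assumes this formulation; it is not necessarily the joint learning module of [61], and all names, sizes, and random parameters are hypothetical.

```python
import numpy as np


def fuse_dimension_embeddings(Z_list, W, b, q):
    """Attention-weighted fusion of per-network (dimension-specific) gene embeddings.

    Each dimension d gets a score w_d = mean_i q^T tanh(W^T z_i^d + b); the weights
    beta = softmax(w) combine the embeddings into one representation.
    """
    scores = np.array([np.mean(np.tanh(Z @ W + b) @ q) for Z in Z_list])
    beta = np.exp(scores - scores.max())
    beta /= beta.sum()
    fused = sum(weight * Z for weight, Z in zip(beta, Z_list))
    return fused, beta


# Hypothetical embeddings of 100 genes from 5 association networks (e.g., PPI, co-expression).
rng = np.random.default_rng(0)
n_genes, dim, att_dim = 100, 16, 8
Z_list = [rng.normal(size=(n_genes, dim)) for _ in range(5)]
W = rng.normal(scale=0.1, size=(dim, att_dim))
b = np.zeros(att_dim)
q = rng.normal(size=att_dim)
fused, beta = fuse_dimension_embeddings(Z_list, W, b, q)
```

In a trained model, `beta` indicates how much each association type contributes to the final gene representation.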
Song et al. [62] developed a platform to detect cancer driver mutations at allosteric sites, which are associated with protein dynamics, structure, and energetic communication and affect protein function at a distance from the functional region. The platform distinguishes allosteric driver mutations from passenger mutations using an equivariant multi-head attention-weighted graph neural network (EGNN) designed to handle molecular graph data efficiently. To capture protein topology, the atoms of the amino acids surrounding the variant are used as protein context.
Another noteworthy example is iPANDA [64], an innovative signaling pathway modeling platform that applies advanced AI and bioinformatics techniques to identify drug targets for specific diseases. This platform utilizes a combination of text-based AI scores, omics AI scores, and finance scores to prioritize protein targets based on key properties such as druggability, safety, and commercial tractability.
IV. Predicting Drug-Target Interaction and its Applications in Early DDD
Using computational methods to predict the interaction between drugs and targets is a pivotal step in the DDD process for several important reasons including:
Hit identification and optimization [65].
Early prediction of drug off-target activity and undesired side effects during the initial phases of new drug discovery [65], [66].
Facilitating the exploration of approved compounds for the treatment of diseases beyond their original intended uses, a process known as drug repurposing or repositioning [9], [66], [67].
Identifying compounds that interact with a specific set of therapeutically related targets, known as polypharmacology [66], [68].
Numerous in silico recommender systems have been developed to address different facets of DTI. These systems fall into four primary groups: structure-based methods, which analyze the complementarity between known target and compound structures; ML-based methods, which use target and drug features, together with known DTIs as training data, to build a predictive model; network-based methods, which combine target and drug similarities with the known DTI network and apply graph-theoretic algorithms to detect new DTIs; and hybrid methods, which combine multiple approaches for DTI prediction.
A. Structure-Based Methods
The most prevalent structure-based method for predicting DTI is molecular docking [69]. Docking uses the 3D structures of a drug molecule and its target first to predict the pose (position and orientation) of the ligand within the receptor's binding pocket and then to estimate their binding energy with a scoring function [70], [71]. The role of docking in drug discovery has evolved over the years [69]. Initially developed and used as a standalone method, docking is now combined with other computational approaches in integrated workflows that overcome its primary limitations and improve prediction accuracy. For instance, the lack of an experimental 3D structure can be bypassed by using predicted structures, such as homology models built from structural templates with closely related sequences or models from AlphaFold 2 [72], which can reliably predict the 3D structures of protein targets and significantly improve the performance of small-molecule docking. Furthermore, MD simulations can be used before docking to identify and select receptor conformations and after docking to evaluate the stability of the predicted complexes [73]. This is especially useful for flexible or poorly characterized targets with limited crystallographic data. Post-docking tools, such as binding free energy prediction methods, can also improve pose rescoring, pose refinement, and the assessment of ligand-target complex stability [74], [75].
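The two docking stages, pose search followed by scoring, can be illustrated with a deliberately simplified sketch: a pairwise score made of a 12-6 Lennard-Jones term and a Coulomb term with a distance-dependent dielectric ($4r$), minimized by exhaustive search over rigid translations of the ligand. This is a toy under stated assumptions; it is not the scoring function or search algorithm of any docking program (real tools also sample rotations and torsions and use calibrated terms), and the parameters `r0` and `eps` are illustrative.

```python
import itertools

import numpy as np


def toy_pair_score(lig_xyz, lig_q, rec_xyz, rec_q, r0=3.8, eps=0.2):
    """Toy score (kcal/mol-like units): 12-6 Lennard-Jones plus Coulomb with dielectric 4r.

    Lower is better; the Lennard-Jones term has its minimum of -eps at distance r0.
    """
    d = np.linalg.norm(lig_xyz[:, None, :] - rec_xyz[None, :, :], axis=-1)
    d = np.maximum(d, 0.5)  # guard against overlapping atoms
    vdw = eps * ((r0 / d) ** 12 - 2.0 * (r0 / d) ** 6)
    elec = 332.0 * np.outer(lig_q, rec_q) / (4.0 * d * d)
    return float((vdw + elec).sum())


def grid_search_translation(lig_xyz, lig_q, rec_xyz, rec_q, step=0.5, n=8):
    """Exhaustive search over rigid translations of the ligand on a cubic grid."""
    offsets = np.arange(-n, n + 1) * step
    best_score, best_shift = np.inf, None
    for shift in itertools.product(offsets, repeat=3):
        shift = np.array(shift)
        score = toy_pair_score(lig_xyz + shift, lig_q, rec_xyz, rec_q)
        if score < best_score:
            best_score, best_shift = score, shift
    return best_score, best_shift


# One uncharged "receptor" atom at the origin and one uncharged ligand atom 6 Å away.
rec_xyz, rec_q = np.zeros((1, 3)), np.zeros(1)
lig_xyz, lig_q = np.array([[6.0, 0.0, 0.0]]), np.zeros(1)
score, shift = grid_search_translation(lig_xyz, lig_q, rec_xyz, rec_q)
dist = np.linalg.norm(lig_xyz[0] + shift)  # ends up close to r0 = 3.8 Å
```

Even this toy shows why docking is expensive: the score is re-evaluated for every candidate pose, and the number of poses grows rapidly once rotations and flexibility are included.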
Recently, ML and DL methods have been used to predict docking poses and identify the functional groups responsible for binding [76], [77], [78], [79], [80]. For instance, Zhang et al. [8] developed KarmaDock, a DL-based method that integrates binding pose generation, pose correction, and binding strength estimation in a single framework. In this approach, the ligand and protein encoders learn the characteristics of intramolecular interactions. Subsequently, an equivariant GNN with self-attention updates the ligand pose based on intramolecular interactions. Post-processing then ensures that chemically plausible structures are generated, and a mixture density network scores the binding strength. Another noteworthy example is AlphaFold 3 [81], which shows improved performance in small-molecule docking and can efficiently predict the co-folded structures of complexes that include nucleic acids, proteins, small molecules, and modified residues.
The above-mentioned methods generally treat proteins as static structures. However, molecular dynamics simulations of protein-ligand interactions can reveal important protein conformations essential for understanding protein function and advancing drug discovery. In this context, Dong et al. [7] proposed FlexPose, a geometric neural network designed for the direct and flexible modeling of protein-ligand complex structures in Euclidean space, bypassing traditional sampling and scoring methods. In another approach, Lu et al. [82] developed DynamicBind, a DL-based model that utilizes equivariant geometric diffusion networks to create a smooth energy landscape, thereby enabling efficient transitions between various equilibrium states. These advancements have the potential to accelerate drug discovery by facilitating the development of small molecules for previously undruggable targets.
B. Machine Learning-Based Methods
Although docking simulation is commonly used for DTI prediction during the DDD process, its time-consuming and computationally intensive nature makes it less suitable for large-scale virtual screening [66], [69]. To address this issue, numerous ML-based methods have been proposed to systematically mine the chemical space for potential interactions with the biological space of targets. These techniques generally do not require the 3D structure of the target and can be classified into similarity/distance-based, feature-based, DL-based, and matrix factorization-based algorithms. In each case, data on targets, drugs, and previously confirmed DTIs are converted into features used to train a predictive model, which is then used to predict new DTIs.
1). Similarity/Distance-Based Models:
In this approach, target-target or drug-drug similarity measures are combined through similarity or distance functions to make predictions. These functions can be defined using sequence homology among proteins, pharmacological similarity between drugs, or the topology of the network connecting known protein targets and drugs. Alternatively, a distance-based method, such as a nearest-neighbor algorithm, can be used to determine how close a new drug is to drugs in known drug-target pairs [83], [84]. The main disadvantage of these methods is that their performance suffers from the limited number of known drug-target interactions. Furthermore, some available datasets are binary, even though drug-target binding affinities are inherently continuous.
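A nearest-neighbor predictor of this kind can be sketched in a few lines: drug-drug similarity is measured with the Tanimoto coefficient on binary fingerprints (represented here as sets of on-bit indices), and a new drug inherits the interaction profiles of its $k$ most similar known drugs, weighted by similarity. The fingerprints, drug and target names, and $k$ are hypothetical; real pipelines would compute fingerprints with a cheminformatics toolkit.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between fingerprints given as sets of on-bit indices."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 0.0


def knn_dti_scores(query_fp, known_fps, interactions, k=2):
    """Score targets for a new drug from the interaction profiles of its k nearest known drugs.

    Each target's score is the similarity-weighted fraction of neighbors that interact with it.
    """
    neighbors = sorted(
        ((tanimoto(query_fp, fp), drug) for drug, fp in known_fps.items()),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return {}
    scores = {}
    for sim, drug in neighbors:
        for target in interactions.get(drug, ()):
            scores[target] = scores.get(target, 0.0) + sim / total
    return scores


# Hypothetical fingerprints (sets of on-bits) and known interactions.
known_fps = {"drugA": {1, 2, 3, 4}, "drugB": {1, 2, 9}, "drugC": {7, 8}}
interactions = {"drugA": ["T1"], "drugB": ["T1", "T2"], "drugC": ["T3"]}
scores = knn_dti_scores({1, 2, 3}, known_fps, interactions)
# T1 scores about 1.0 (both neighbors hit it) and T2 about 0.4; T3 is not reached.
```

The sketch also makes the stated weakness visible: a drug with no sufficiently similar, well-annotated neighbors receives little or no signal.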
2). Feature-Based Methods:
This category includes models such as tree-based algorithms and kernel-based support vector machines, which map diverse target-target and drug-drug similarity matrices to DTI labels. These methods represent each drug-target pair as a feature vector, typically labeled with a binary interaction annotation, and an ML algorithm then classifies each vector as a positive or negative interaction [85], [86], [87], [88].
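The pair-as-feature-vector formulation can be sketched by concatenating drug and target descriptors for each pair and training a binary classifier. To keep the example dependency-free, we substitute L2-regularized logistic regression fitted by gradient descent for the tree-based or SVM models named above; the descriptors and the labeling rule are synthetic and purely illustrative.

```python
import numpy as np


def pair_features(drug_feats, target_feats, pairs):
    """Represent each (drug, target) pair by concatenating drug and target descriptors."""
    return np.array([np.concatenate([drug_feats[d], target_feats[t]]) for d, t in pairs])


def train_logistic(X, y, lr=0.1, epochs=2000, l2=1e-3):
    """L2-regularized logistic regression fitted by batch gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


def predict_proba(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))


# Hypothetical data: 20 drugs and 20 targets with 4 descriptors each.
rng = np.random.default_rng(1)
drug_feats = rng.normal(size=(20, 4))
target_feats = rng.normal(size=(20, 4))
pairs = [(d, t) for d in range(20) for t in range(20)]
X = pair_features(drug_feats, target_feats, pairs)
# Synthetic labels: "interaction" when the first drug and target descriptors sum above zero.
y = np.array([float(drug_feats[d, 0] + target_feats[t, 0] > 0) for d, t in pairs])
w, b = train_logistic(X, y)
accuracy = np.mean((predict_proba(X, w, b) > 0.5) == (y == 1))
```

Any classifier that accepts fixed-length vectors can replace the logistic model here; the essential design choice is the pairwise feature construction.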
3). Deep Learning Models:
Several DL models have been proposed to address the challenges posed by noisy, high-dimensional data in DTI prediction [89], [90], [91], [92]. These methods involve two key steps: generating feature vectors and then applying a DL algorithm trained on known DTIs. Typically, three types of properties, namely physicochemical, topological, and biological information about targets and drugs, are used to build the feature vectors or matrices. Over the past few years, convolutional neural networks (CNNs) [93], GNNs [94], [95], graph convolution and attention mechanisms [92], and multilayer perceptrons [96], [97] have been used for DTI prediction. For instance, Luo et al. [94] proposed KDBNet, an algorithm that leverages 3D molecule and protein structure information to predict binding affinities. The model uses GNNs to learn structural representations of drug molecules and protein binding pockets, capturing the spatial characteristics and geometry of DTIs. In a recent study, Koh et al. [95] developed PSICHIC, a GNN-based framework that incorporates physicochemical constraints to derive interaction fingerprints directly from sequence data. PSICHIC's interpretable fingerprints can identify the ligand atoms and protein residues involved in protein-ligand interactions. Although the methods in this section perform strongly, they have limitations, such as requiring substantial training data and significant computational resources to train complex models.
4). Matrix Factorization and Matrix Completion Methods:
These methods factorize the drug-target interaction matrix $Y \in \mathbb{R}^{m \times n}$ ($m$ drugs, $n$ targets) as the product of two low-rank matrices, $Y \approx U V^{\top}$, with $U \in \mathbb{R}^{m \times k}$ and $V \in \mathbb{R}^{n \times k}$, where the latent dimension $k$ is much smaller than the original dimensions $m$ and $n$. These methods assume that targets and drugs reside in the same latent space, so that interaction strength can be measured from the distances between them; drugs and targets are thus embedded in a shared low-dimensional subspace [98], [99]. Although matrix completion and matrix factorization are recognized as highly reliable approaches for DTI prediction [18], [100], they are limited in their ability to incorporate all available target- and drug-related information. In a recent study, Bagherian et al. [9] developed two algorithms, coupled matrix-matrix completion (CMMC) and coupled tensor-matrix completion (CTMC), in which the initial matrix was expanded to include additional structural information available in various databases and to integrate multiple types of target-target and drug-drug similarity scores. An assessment on two benchmark datasets, DrugBank and TTD, showed that CMMC and CTMC outperform GRMF, -GRMF, NRLMF-, and NRLMF in terms of sensitivity, specificity, area under the curve, and F1 score.
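A minimal way to fit such a factorization when only some drug-target entries are observed is masked alternating least squares: with the target factors fixed, each drug's latent vector solves a small ridge-regression problem over that drug's observed entries, and vice versa. Here an $m \times n$ matrix $Y$ is approximated by $U V^{\top}$ with $k$ latent dimensions. This sketch is a standard baseline, not CMMC or CTMC; the regularization weight, iteration count, and synthetic low-rank data are illustrative assumptions.

```python
import numpy as np


def masked_als(Y, mask, k=2, lam=0.1, n_iter=100, seed=0):
    """Fit Y ~ U V^T on observed entries (mask == True) by alternating ridge regressions."""
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    U = rng.normal(scale=0.1, size=(m, k))
    V = rng.normal(scale=0.1, size=(n, k))
    reg = lam * np.eye(k)
    for _ in range(n_iter):
        for i in range(m):  # update drug factors with target factors fixed
            Vo = V[mask[i]]
            U[i] = np.linalg.solve(Vo.T @ Vo + reg, Vo.T @ Y[i, mask[i]])
        for j in range(n):  # update target factors with drug factors fixed
            Uo = U[mask[:, j]]
            V[j] = np.linalg.solve(Uo.T @ Uo + reg, Uo.T @ Y[mask[:, j], j])
    return U, V


# Hypothetical rank-2 "interaction strength" matrix with ~80% of entries observed.
rng = np.random.default_rng(0)
U_true, V_true = rng.normal(size=(8, 2)), rng.normal(size=(6, 2))
Y_full = U_true @ V_true.T
mask = rng.random(Y_full.shape) < 0.8
Y = np.where(mask, Y_full, 0.0)  # unobserved entries are never read by masked_als
U, V = masked_als(Y, mask, k=2, lam=0.01)
Y_hat = U @ V.T                  # predictions for all pairs, including unobserved ones
obs_rmse = np.sqrt(np.mean((Y_hat[mask] - Y_full[mask]) ** 2))
```

Because only masked entries enter the loss, unobserved pairs are treated as missing rather than as negatives, which is the property of matrix completion highlighted at the end of this section.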
C. Network-Based Methods
Traditional network-based approaches for DTI prediction include the bipartite local model (BLM) [101], the network-based inference (NBI) model [102], and the path score model (PSM) [103], [104]. More recently, methods based on network embedding have been proposed [105], [106], [107], [108]. In these methods, nodes are represented by low-dimensional vectors that retain the topology and structure of the networks more accurately than traditional approaches. Nevertheless, these approaches typically show a high false-positive rate, which limits their applicability. The primary cause is that they cannot fully exploit node information from heterogeneous networks. In addition, more advanced fusion methods are needed to improve feature vector extraction.
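The NBI idea can be sketched as two-step resource diffusion on the bipartite drug-target network: each target known to bind a drug spreads one unit of resource equally among its drugs, and each drug then spreads what it received equally among its targets. The final resource on a target not yet linked to the drug serves as its prediction score. This follows the commonly described mass-diffusion formulation; the function name and the 3 × 3 example are hypothetical.

```python
import numpy as np


def nbi_scores(A):
    """Network-based inference (two-step resource diffusion) on a drug-target bipartite graph.

    A: drugs x targets binary matrix. Returns a drugs x targets score matrix.
    """
    k_drug = A.sum(axis=1)
    k_target = A.sum(axis=0)
    inv_kd = np.where(k_drug > 0, 1.0 / np.maximum(k_drug, 1), 0.0)
    inv_kt = np.where(k_target > 0, 1.0 / np.maximum(k_target, 1), 0.0)
    # W[i, j] = sum_l A[l, i] A[l, j] / (k_drug[l] * k_target[j]):
    # the fraction of resource starting on target j that ends on target i.
    W = A.T @ (A * inv_kd[:, None]) * inv_kt[None, :]
    return A @ W.T


# Hypothetical network: drug 0 binds targets 0 and 1, drug 1 binds 1 and 2, drug 2 binds 2.
A = np.array([[1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
F = nbi_scores(A)
candidates = np.where(A == 0, F, -np.inf)  # rank only unobserved drug-target pairs
```

Drug 0 receives a positive score for target 2 because it shares target 1 with drug 1; the method uses only network topology, which is why it cannot score drugs or targets without known links.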
Very recently, more sophisticated approaches combining network-based methods with ML and DL have been developed to overcome these limitations, extract the characteristics of different entity types within heterogeneous biological networks, and predict potential DTIs [92], [93], [109], [110].
D. Hybrid Frameworks
Hybrid methods combine feature-based, DL, matrix factorization, and network-based approaches, enhancing the predictive algorithm's capabilities by incorporating diverse sets of information. For instance, DL-based network analysis can integrate nonlinear biological information from multiple networks to predict new drug targets in the context of drug repurposing [111], [112].
DTI prediction methods can also be integrated with biological networks to improve prediction accuracy. One recent example used two complementary ML strategies to reposition drugs against COVID-19 by targeting SARS-CoV-2 and its cellular processes in the host. The first strategy used a matrix factorization algorithm to prioritize broad-spectrum antivirals. The second, based on network medicine, used graph kernels to prioritize drugs according to their perturbation effects on a specific subnetwork of the human interactome critical for SARS-CoV-2 infection and replication [113].
In summary, a major challenge for current DTI prediction approaches, except matrix factorization and matrix completion methods, which can operate on sparse matrices, is that they treat all unknown drug-target pairs in the training data as negative samples. Distinguishing true negatives from unknown interactions is crucial for improving prediction performance. In addition, most training datasets are imbalanced, with far more negative than positive samples. Finally, increasing data quantity and quality, especially through accurate measurement of target-target and drug-drug similarities, is essential for further improving prediction accuracy.
V. De Novo Drug Design
By de novo drug design, we generally refer to methods that generate new molecules not included in existing databases. In contrast, drug design via virtual screening consists of finding new drug candidates within existing datasets of molecules; issues with this approach include the cost of exploring huge search spaces, the need for effective search algorithms (typically genetic algorithms), and the choice of the search space itself, which can bias the results. De novo drug design instead aims to design a novel molecule that will interact with a specific biological target in the body to produce a therapeutic effect.
A. Molecule Featurization and Representation Learning Architectures
The first step in proposing a model for de novo drug design is to choose how to represent molecular data (e.g., as strings or graphs), a step also known as molecule featurization, and then to choose an ML architecture suited to that featurization. A recent and comprehensive survey on structural representations of molecules for AI-driven drug design is [114]. Three major types of molecule featurization have been employed: linear representations, such as fingerprints (vectors) and the Simplified Molecular Input Line Entry System, or SMILES (strings); graph-based representations; and 3D representations.
Molecular fingerprints are vectors encoding a variety of fixed molecular properties, such as structural, chemical, physical, electrical, and topological properties, as well as the presence or absence of certain molecular substructures. Fingerprints can easily be used as inputs to most ML methods for property prediction, but they are less suitable for de novo drug design, since a fingerprint does not uniquely determine a molecule. Additionally, the power of DL methods lies in their ability to learn representations and features useful for the task at hand automatically, rather than relying on fixed ones.
A SMILES string is obtained by assigning an order to the atoms of a molecule and traversing the molecule accordingly, writing down atoms, bonds, branches, and ring closures as they are encountered. The SMILES string of a molecule is not unique (for example, it depends on the atom ordering and on the traversal algorithm), but it is unambiguous and very compact. In practice, a single SMILES string is usually assigned to each molecule, and such variations are removed by canonicalization techniques. For macromolecules, the SMILES representation can be very long and complex; for this reason, one often employs a sequence representation with amino acids as the basic units.
There has been a shift from SMILES to graph representations of molecules, for several reasons [115]. SMILES strings are not designed to capture molecular similarity, and molecules with similar chemical structures may be encoded into very different SMILES. As a consequence, generative models such as VAEs cannot efficiently learn smooth molecular embeddings. Additionally, essential chemical properties such as molecular validity are easier to express on graphs. Another disadvantage of SMILES is that substrings do not necessarily represent valid molecules or molecular components, whereas a graph can more easily be split into meaningful subgraphs.
Molecules can easily be represented as graphs, with nodes corresponding to atoms and edges corresponding to bonds. Additionally, feature vectors can be associated with nodes and/or edges, providing a rich representation of the molecule. Although a graph is not a 3D representation, it can encode several 3D features, such as bond length as an edge feature or chirality as a node feature. Architectures such as GNNs constitute a very general framework for updating these features based on the graph structure. However, this representation does not scale well with the size of the molecule, and it becomes cumbersome for macromolecules. In this case, one can use a contact map instead, that is, a graph encoding the distances between pairs of amino acid residues. Another issue with graph representations is that they often depend on the chosen ordering of the atoms (for example, in the case of the adjacency matrix), which requires ML methods to be permutation invariant or equivariant in order to extract order-independent features. An additional problem arises with bonds that cannot be explained by classical valence theory, such as bonds involving more than two atoms, or molecules with a stochastic structure whose bonds are constantly formed and broken. Some solutions have been explored, such as hypergraphs (graphs whose edges are sets of possibly more than two atoms) for the former case and weighted, directed graphs representing stochastic bonds for the latter; however, more work is needed to establish the best solutions.
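The permutation issue can be illustrated with a minimal, weight-free message-passing sketch: each atom repeatedly adds the summed features of its bonded neighbours, and a final sum over atoms produces a graph-level vector. Real GNNs add learnable transformations and nonlinearities; the one-hot atom features (carbon, oxygen) below are an assumption for illustration only.

```python
def message_passing_readout(atom_features, bonds, rounds=2):
    """Sum-aggregation message passing followed by a sum readout over atoms.

    atom_features: one feature vector per atom, in any order
    bonds: undirected bonds as (i, j) index pairs into atom_features
    """
    neighbours = {i: [] for i in range(len(atom_features))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    h = [list(f) for f in atom_features]
    for _ in range(rounds):
        # every atom adds the (order-free) sum of its neighbours' current features
        h = [
            [h[i][k] + sum(h[j][k] for j in neighbours[i]) for k in range(len(h[i]))]
            for i in range(len(h))
        ]
    # summing over atoms removes any dependence on the atom ordering
    return [sum(column) for column in zip(*h)]


C, O = [1, 0], [0, 1]  # hypothetical one-hot features: [is_carbon, is_oxygen]

# Ethanol heavy atoms, ordered C(terminal), C(middle), O
order_1 = message_passing_readout([C, C, O], bonds=[(0, 1), (1, 2)])
# Same molecule, ordered C(middle), O, C(terminal)
order_2 = message_passing_readout([C, O, C], bonds=[(2, 0), (0, 1)])
print(order_1, order_2)  # [12, 5] [12, 5]
```

Because both the neighbour aggregation and the final readout are sums, relabelling the atoms leaves the output unchanged, even though the two adjacency matrices differ.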
None of the previous representations can easily capture the 3D structure of a molecule. This can be very relevant, for instance, when one is interested in the binding site of a molecule for a ligand, and because the drug-likeness of a macromolecule can depend on its 3D structure. 3D representations can capture additional properties of the molecule, such as shape, relative orientations, and angles. The most common 3D representations have been point clouds, 3D grids of voxels, 3D surfaces represented by meshes, and 3D graphs. These representations can be processed similarly to images with CNNs, but also with GNNs. We should also point out that for ligands and proteins it is possible to use different representations: the pocket of the protein could be represented by a 3D featurization, and the ligand by a graph or SMILES featurization. Challenges include acquiring the precise structure and designing ML architectures that account for the fact that the same molecule can have multiple 3D structures, related by global rotations as well as by rotations of its components. Finally, self-supervised representation methods have recently attracted attention following their success in Natural Language Processing: these models learn generalizable, meaningful molecular representations that can then be used by downstream ML models. Examples include the use of word2vec on knowledge graphs created from molecular functional groups [116], the use of transformers on SMILES [117], and the use of transformers on graph representations [118] (where an additional node is added to include knowledge such as fingerprints).
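One simple way to obtain features that are invariant to global rotations and translations is to work with interatomic distances rather than raw coordinates. The sketch below, using made-up coordinates, checks that sorted pairwise distances are unchanged after a rotation about the z-axis followed by a translation. Note that distances are also unchanged by reflections, so they cannot distinguish mirror-image (chiral) structures; this is one reason equivariant architectures that operate on coordinates directly are often preferred.

```python
import math


def sorted_distances(coords):
    """All interatomic distances, sorted: unchanged by rotation, translation and atom reordering."""
    return sorted(
        math.dist(coords[a], coords[b])
        for a in range(len(coords))
        for b in range(a + 1, len(coords))
    )


def rotate_z(coords, theta):
    """Rotate every point by angle theta (radians) about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


def translate(coords, shift):
    """Shift every point by the same vector."""
    return [tuple(p + d for p, d in zip(point, shift)) for point in coords]


# Illustrative coordinates (angstrom-like units, not a real molecule).
mol = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0), (2.5, 1.4, 1.2)]
moved = translate(rotate_z(mol, 0.7), (3.0, -1.0, 2.0))

print(mol[1], moved[1])  # raw coordinates differ
print(all(abs(a - b) < 1e-9 for a, b in zip(sorted_distances(mol), sorted_distances(moved))))
```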
It is important to observe that no single representation is better than all others for all datasets and applications; rather, each representation tends to capture different (though often overlapping) properties of molecules. For this reason, it is perhaps not surprising that recent trends have moved towards multimodality, i.e., developing architectures that can combine heterogeneous molecular featurizations and properties [119], [120], [121], [122]. Another recent focus has been interpretability, since understanding how model decisions are formed is paramount to identifying biologically relevant causal connections beyond mere statistical correlations. For example, [123] explores both multimodality and interpretability: the authors propose a multimodal attention-based convolutional encoder to predict anticancer drug sensitivity that combines SMILES sequences, gene expression profiles of tumors, and prior knowledge on intracellular interactions from protein–protein interaction networks. Interpretability is obtained by analyzing the attention weights. More recently, [124] introduces an explainable graph convolutional neural network architecture for small-molecule activity prediction, in which a saliency map highlights molecular substructures relevant to activity.
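Attention-based interpretability rests on the fact that attention weights form a probability distribution over input tokens that can be inspected directly. The following sketch computes scaled dot-product attention weights for one query over SMILES tokens; the 2-dimensional embeddings and the query vector are hypothetical stand-ins for learned parameters, and this is not the architecture of [123].

```python
import math


def attention_weights(query, keys):
    """Scaled dot-product attention weights of a single query over a list of keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 2-d token embeddings for the SMILES of acetic acid, CC(=O)O.
embedding = {"C": [1.0, 0.0], "O": [0.0, 1.0], "(": [0.1, 0.1], ")": [0.1, 0.1], "=": [0.2, 0.3]}
tokens = list("CC(=O)O")
query = [0.0, 2.0]  # hypothetical learned query vector

weights = attention_weights(query, [embedding[t] for t in tokens])
for token, w in zip(tokens, weights):
    print(f"{token}: {w:.3f}")
```

Reading off which tokens receive the largest weights (here the oxygen atoms) is the basic mechanism by which attention maps are used to point at chemically relevant parts of the input.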
B. Deep Learning Frameworks for De Novo Molecule Design
After choosing how to featurize molecules, one has to choose a generative framework. Each framework relies on different deep learning architectures, and none is better than the others for every application. In the following subsections, we summarize the main frameworks and give examples of how they have been used for de novo drug design. The literature is already too vast to cover completely; our goal is to provide an overview of the various methodologies, with references to representative or recent articles.
For interested readers, more extensive surveys specifically focused on this topic are: [125] for a general overview; [126] and [127] for graph neural networks; [128] for small molecules; and [129] for protein design using 3D structure. Before turning to the individual frameworks, we discuss some general problems of molecule generation.
The generative process is built on two conflicting goals: generating realistic, valid molecules (hence somewhat similar to the training dataset) and generating novel molecules (hence somewhat different from the training dataset). Molecules can be generated in two fundamental ways: with one-shot methods, a new molecule is generated at once in a single step; with sequential methods, a molecule is built progressively through a sequence of steps (such as adding atoms in the case of SMILES representations, or adding atoms and edges in the case of graph representations).
Furthermore, one often wants the generated molecules to satisfy specific properties, from drug-likeness to solubility to synthetic feasibility. The generative process is often guided either by optimizing certain properties or by setting them as targets for the generated molecules; this can be done during training or ex post, by exploring the learned distribution once the model has been trained. How the generative process can be guided generally depends on the chosen deep learning architecture; however, there are also architecture-agnostic ways of guiding it, such as Reinforcement Learning, which is discussed in a separate subsection.
We also note that there are exceptions to this classification, such as [130], which generates molecules using a message-passing neural network and masked graph modeling.
1). Recurrent Neural Networks and Transformers:
RNNs have found various applications in de novo drug design because they can model sequential data and capture sequential dependencies, in particular for generating molecules in SMILES format. Their recurrent nature makes them well suited to sequential data such as the SMILES language, proposing the characters of a novel SMILES sequence one at a time. By conditioning the RNN on specific features or properties, it can generate new molecules with desired characteristics, as in [131], [132]. More generally, RNNs are autoregressive generative models that can be used to sequentially generate any representation (although this introduces an arbitrary ordering of the atoms), and applications to graph representations also exist [134]. The more recent trend, following lines of research in natural language processing, is to use transformers [133], whose self-attention mechanisms are better able to capture long-range dependencies between elements of a sequence.
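The character-by-character generation described above can be sketched without a neural network by replacing the RNN's learned conditional distribution with bigram counts, i.e., a model that conditions only on the previous character instead of the whole prefix. This is a deliberate simplification: the training strings are hypothetical, and nothing enforces chemical validity, so some samples may not be valid SMILES.

```python
import random
from collections import Counter, defaultdict

BOS, EOS = "<bos>", "<eos>"  # start and end tokens (not SMILES characters)


def fit_bigram(smiles_list):
    """Count next-character frequencies, a crude stand-in for the p(x_t | x_<t) an RNN learns."""
    counts = defaultdict(Counter)
    for s in smiles_list:
        seq = [BOS] + list(s) + [EOS]
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def sample(counts, rng, max_len=20):
    """Generate a string one character at a time until the end token is drawn."""
    out, prev = [], BOS
    for _ in range(max_len):
        choices, weights = zip(*counts[prev].items())
        nxt = rng.choices(choices, weights=weights)[0]
        if nxt == EOS:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


model = fit_bigram(["CCO", "CCN", "CCCO", "OCC"])
rng = random.Random(0)
samples = [sample(model, rng) for _ in range(5)]
print(samples)
```

An RNN or transformer replaces `fit_bigram` with a network that conditions on the entire prefix (through a hidden state or self-attention), which helps it keep track of open rings and branches; the sampling loop itself stays the same.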
2). Flow-Based Models:
Flow-based models are generative models that learn the underlying data distribution by transforming a simple distribution (e.g., Gaussian) into a more complex one that resembles the data distribution. A flow is an invertible transformation between molecules and a latent space equipped with this simple distribution; by sampling from the latent distribution and applying the inverse transformation, flow-based models can propose new molecular structures as potential drug candidates. However, the training process can be very complex. [135] uses a Glow-based model for bond generation and then employs a graph conditional flow for atom generation. The process concludes with the assembly of a molecular graph, complemented by a post hoc validity correction mechanism. A hierarchical flow model is adopted in [136], which allows control over the changes in the generated graph: modifying the top layer results in global structural changes, whereas modifying deeper layers results in finer changes. An application to designing 3D molecules that bind to given proteins is [137], where atoms are added sequentially to the generated ligand and a flow model is used to generate the atom type and its location at each step.
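The change-of-variables computation at the heart of flow-based models can be illustrated with the simplest possible flow: an elementwise affine map to a standard-normal latent space. Molecular flows such as the Glow-based model in [135] stack many learned, nonlinear invertible layers; here `shift` and `scale` are fixed, hypothetical parameters.

```python
import math
import random


class AffineFlow:
    """Elementwise affine flow z = (x - shift) / scale with a standard-normal base distribution."""

    def __init__(self, shift, scale):
        # per-dimension parameters; fixed here, learned by maximum likelihood in practice
        self.shift, self.scale = shift, scale

    def forward(self, x):
        """Data -> latent."""
        return [(xi - b) / s for xi, b, s in zip(x, self.shift, self.scale)]

    def inverse(self, z):
        """Latent -> data; turns base-distribution samples into new data points."""
        return [zi * s + b for zi, b, s in zip(z, self.shift, self.scale)]

    def log_prob(self, x):
        # change of variables: log p(x) = log N(f(x); 0, I) + log |det df/dx|
        z = self.forward(x)
        log_base = sum(-0.5 * zi * zi - 0.5 * math.log(2.0 * math.pi) for zi in z)
        log_det = -sum(math.log(abs(s)) for s in self.scale)  # df/dx is diagonal with entries 1/scale
        return log_base + log_det


flow = AffineFlow(shift=[1.0, 0.0], scale=[2.0, 0.5])
x = [0.3, -1.2]
print(flow.inverse(flow.forward(x)))  # recovers x up to rounding
rng = random.Random(0)
new_point = flow.inverse([rng.gauss(0.0, 1.0) for _ in range(2)])  # sample a new data point
```

Invertibility is what makes both directions usable: `log_prob` gives the exact likelihood used for training, and `inverse` turns Gaussian samples into new data points.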
3). Variational Autoencoders:
VAEs have been very popular in de novo drug design. These models consist of an Encoder, which compresses input molecules into vectors in a latent space, where the vectors are sampled from a known distribution (typically Gaussian), and a Decoder, which constructs a molecule from a given sampled vector. During training, the Encoder learns to output the mean and variance of the Gaussian distribution for each input so as to minimize the reconstruction error, while a penalty forces this approximate posterior distribution to stay close to the Gaussian prior. A VAE therefore learns the parameters of a latent distribution that approximates the training dataset, and new molecules can be generated by sampling from the learned distribution. VAEs can also be used to explore and visualize the chemical space of molecules: by learning a low-dimensional representation of the molecules, they can help identify regions of chemical space where novel and potentially valuable compounds may exist. In particular, recent work on disentangling the latent space, so that each latent variable corresponds to a specific property, has led to a generalization of VAEs known as β-VAEs [161].
Early work includes [138], which addresses the problem of designing a decoder for graphs through one-shot generation of a probabilistic fully connected graph, using graph matching to compute graph similarity in a permutation-invariant way. SMILES representations are used in [139], where the focus is on understanding the space of continuous latent representations and generating molecules via simple operations such as perturbing known chemical structures or interpolating between molecules. An effective sequential model is presented in [115], where a VAE is used to associate a junction tree with each molecule, whose nodes represent possible fragments; the generative process follows the tree and adds a fragment for each node. [140] focuses on the problem of generating invalid molecular structures. The authors show empirically that this issue arises when sampling latent space points far away from the data on which the VAE has been trained, and formulate a constrained Bayesian optimization problem to improve the validity of the generated molecules. In the framework of β-VAEs, [141] adds self-attention layers to the model, which allows it to develop its own molecular grammar (meaning “rules that define the relationships between atoms and other structural features within a molecule including branches, double bonds, etc.”). The work shows an unavoidable tradeoff between model exploration and validity, but also that the sampling scheme can be used to optimize it. VAEs have also been successfully applied to ligand generation, as in [142], where the target is represented as a point cloud and the ligand as a 3D grid, or in [143], where the target is represented by a 3D grid and the ligand in SMILES form.
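The VAE objective described above (reconstruction error plus a KL penalty toward the Gaussian prior) can be written compactly. This sketch assumes a diagonal-Gaussian encoder output given as `mu` and `logvar` and leaves the reconstruction term abstract; `beta = 1` gives the standard VAE, while `beta > 1` corresponds to the β-VAE weighting. The default `beta=4.0` is an arbitrary illustrative value.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I), keeping z differentiable in mu and logvar."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def beta_vae_loss(reconstruction_error, mu, logvar, beta=4.0):
    """Reconstruction term plus beta-weighted KL penalty (beta = 1 gives the standard VAE)."""
    return reconstruction_error + beta * kl_to_standard_normal(mu, logvar)


# Hypothetical encoder output for one molecule with a 2-dimensional latent space.
mu, logvar = [0.5, -0.2], [0.0, -1.0]
z = reparameterize(mu, logvar, random.Random(0))
print(z, beta_vae_loss(reconstruction_error=1.3, mu=mu, logvar=logvar, beta=1.0))
```

Raising `beta` trades reconstruction quality for a latent space closer to the factorized prior, which is the mechanism β-VAEs use to encourage disentanglement.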
4). Generative Adversarial Networks:
A Generative Adversarial Network (GAN) also consists of two networks: a Generator, whose goal is to produce realistic samples, and a Discriminator, whose goal is to distinguish real training samples from those created by the Generator. During training, the Generator gets better at fooling the Discriminator. GANs are powerful generative models that do not require an explicit probability density function, but their main drawback is that they are difficult to train without falling into pitfalls, such as the Generator focusing on samples of a single type that are able to fool the Discriminator. They have perhaps been the least popular deep learning models for de novo drug design. However, there have been successful applications such as MolGAN [144], which also includes a reward network, and Mol-CycleGAN [145], which adopts the CycleGAN architecture, consisting of two GANs trained jointly on two datasets, where each GAN is trained to generate samples similar to the other GAN’s dataset. Recently, GANs have also been applied to 3D protein design [146].
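The adversarial game can be summarized by two loss functions computed from the Discriminator's output probabilities. The sketch uses the original binary cross-entropy formulation together with the common non-saturating generator loss; it is not the exact objective of MolGAN or Mol-CycleGAN, and the probability values are hypothetical.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: push D(real) toward 1 and D(fake) toward 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: push D(G(z)) toward 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# Hypothetical discriminator probabilities for a batch of real and generated molecules.
d_real, d_fake = [0.9, 0.8, 0.95], [0.2, 0.1, 0.3]
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```

The two losses pull in opposite directions on `d_fake`, which is the source of both the power of GANs and the training instabilities mentioned above.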
5). Diffusion Models:
Diffusion models are perhaps the most recent trend in molecular generative modeling, following their very positive results in image generation. They work by progressively adding noise to data until it follows a chosen distribution, and then reversing the process to generate new samples from that distribution. Most applications represent molecules as graphs or as 3D structures; in the latter case, it is paramount to account for the rotation and translation invariance of the 3D representation by adopting equivariant neural networks. A survey focused on graph diffusion models is [162]; a recent survey focused on diffusion for 3D representations is [163]. There are at least three types of diffusion models. Denoising diffusion probabilistic models add noise in successive steps of a Markov chain until the data become pure noise, and then reverse the chain to transform a sample from the noise distribution into a new data point [147], [153], [155]. Score-based diffusion models are instead trained to estimate the score function, i.e., the gradient of the logarithm of the (noise-perturbed) data density [148], [152]. Finally, diffusion models can be based on stochastic differential equations (SDEs): a forward SDE gradually perturbs the data into noise, and a reverse-time SDE, which depends on the score function, is used to generate samples [150]. These models have been capable of generating novel and interesting molecules; however, one downside is the general lack of interpretability of the noisy latent distribution. Another issue is that the addition of continuous Gaussian noise is not always appropriate for discrete molecular data such as graphs; for this reason, it has been suggested [149] to add noise independently to the nodes and edges of graph data. Notably, recent work aims at combining graph representations and 3D/geometric representations [150], [154].
Applications to ligand generation include [151], where the ligand molecule is represented as a graph and decomposed into two parts (arms and scaffold) and each part is generated separately, thus reducing the size of the search space. The first work on target-aware ligand generation using equivariant 3D diffusion is [155], where both the ligand and the pocket are represented as point sets.
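For denoising diffusion probabilistic models, the forward noising chain has a closed form that lets one jump directly to any step t. The following sketch assumes a linear variance schedule from 1e-4 to 0.02 over 1,000 steps (the setting popularized by the original DDPM work) and treats a molecule as a flat list of continuous values, such as atom coordinates; the learned reverse (denoising) network is not shown.

```python
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances increasing linearly from beta_start to beta_end (T >= 2)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s <= t} (1 - beta_s): how much of the signal survives up to step t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def noisy_sample(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I) in one step."""
    a = alpha_bar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


alpha_bar = cumulative_alphas(linear_beta_schedule(T=1000))
x0 = [1.0, -0.5, 0.25]  # hypothetical continuous features, e.g. coordinates
rng = random.Random(0)
print(noisy_sample(x0, 10, alpha_bar, rng))   # still close to x0
print(noisy_sample(x0, 999, alpha_bar, rng))  # essentially pure Gaussian noise
```

A generative model is then trained to undo these steps; for discrete graph data, approaches such as [149] replace the Gaussian perturbation with noise applied to nodes and edges.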
6). Reinforcement Learning:
Reinforcement Learning (RL) is not a deep learning generative framework per se, but rather a framework that can be combined with generative architectures. RL has shown promise in certain applications related to de novo drug design, particularly in optimizing molecules to achieve specific objectives. RL is a type of ML in which an agent learns to make decisions by interacting with an environment and receiving feedback in the form of rewards or penalties. RL can be used to optimize the chemical structure of a molecule to achieve desired properties, such as improved binding affinity to a target, enhanced selectivity, chemical validity, or better pharmacokinetic properties. The RL agent interacts with the environment (represented by the molecular space) and learns through trial and error to generate molecules that maximize a specific property. RL can also be applied to design novel peptides with specific biological activities or functions, such as antimicrobial peptides, by iteratively optimizing peptide sequences based on desired properties. Despite its promise, RL is not always practical: if the search space of chemical structures is large, RL may require significant computational resources and time to converge on optimal molecules. A notable work implementing RL is [156], which introduced GENTRL, a generative tensorial reinforcement learning framework. Its reward function is designed as an ensemble of three Kohonen self-organizing maps (SOMs): one trained to predict the activity of compounds against kinases, one trained to select compounds located in neurons associated with DDR1 inhibitors within the whole kinase map, and one trained to assess the novelty of chemical structures. [157] considers molecules as composed of substructures, called “rationales”, which are thought to determine each property of interest. The generative process creates a mixture of rationales, and RL is used to fine-tune combinations of rationales.
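A minimal sketch of how RL can steer a generator toward a property is the REINFORCE policy-gradient update. Here the "policy" is just a softmax over a hypothetical set of fragments, and `toy_reward` is a placeholder for a property predictor or docking score; it is not the GENTRL reward of [156], and the learning rate and episode count are arbitrary.

```python
import math
import random

FRAGMENTS = ["C", "N", "O", "c1ccccc1"]  # hypothetical action set


def softmax(logits):
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(fragment):
    """Placeholder property score; a real setup would call a predictor or docking tool."""
    return 1.0 if fragment == "c1ccccc1" else 0.0


def reinforce(episodes=500, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(FRAGMENTS)
    baseline = 0.0
    for _ in range(episodes):
        probs = softmax(logits)
        a = rng.choices(range(len(FRAGMENTS)), weights=probs)[0]
        r = toy_reward(FRAGMENTS[a])
        advantage = r - baseline
        baseline = 0.9 * baseline + 0.1 * r  # running-average baseline reduces variance
        # gradient of log softmax(logits)[a] with respect to logits is one_hot(a) - probs
        for k in range(len(logits)):
            logits[k] += lr * advantage * ((1.0 if k == a else 0.0) - probs[k])
    return softmax(logits)


probs = reinforce()
print({f: round(p, 3) for f, p in zip(FRAGMENTS, probs)})
```

In a full molecular generator, the same update is applied to every token or fragment choice along a generated sequence, with the reward computed on the finished molecule.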
[158] trains a transformer generative model and reduces its complexity through knowledge distillation (a way of transferring knowledge from one model to a smaller one), using both the distilled likelihood and the proposed distilled molecules. Finally, RL is applied to the distilled model for fine-tuning, to generate more diverse molecules while satisfying multiple property constraints. An application to 3D data is given by [160], where an agent builds molecules by sequentially adding atoms in a way that respects 3D symmetries: if the molecule is rotated or translated, the agent’s action is rotated and translated accordingly. Recently, RL has also been used for protein design with an RNN framework on amino acid sequences, using drug-target interaction as a reward function [159]. Table I summarizes the applications discussed in this section.
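The likelihood-matching part of knowledge distillation can be sketched as a KL divergence between the teacher's and the student's temperature-softened next-token distributions, following the classic formulation of Hinton et al. This is an assumption about the general technique, not the exact loss of [158], which also uses distilled molecules; the logits are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """T^2-scaled KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    # multiplying by T^2 keeps gradient magnitudes comparable across temperatures
    return temperature ** 2 * kl


# Hypothetical next-token logits over a 4-token SMILES vocabulary.
teacher = [2.0, 0.5, -1.0, 0.0]
student = [1.0, 1.0, -0.5, 0.0]
print(distillation_loss(teacher, student))
```

A higher temperature flattens both distributions, so the student also learns the relative probabilities the teacher assigns to unlikely tokens rather than only its top choice.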
TABLE I.
Applications of Deep Learning to De Novo Drug Design
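| Deep learning framework | Molecule featurization | References |
|---|---|---|
| RNNs and Transformers | SMILES | [131]–[133] |
| RNNs and Transformers | Graphs | [134] |
| Flow-based models | Graphs | [135], [136] |
| Flow-based models | 3D | [137] |
| Variational Autoencoders | Graphs | [115], [138] |
| Variational Autoencoders | SMILES | [139]–[141] |
| Variational Autoencoders | 3D | [142], [143] |
| Generative Adversarial Networks | Graphs | [144], [145] |
| Generative Adversarial Networks | 3D | [146] |
| Diffusion models | Graphs | [147]–[150] |
| Diffusion models | 3D | [151]–[155] |
| Reinforcement Learning | Graphs | [156], [157] |
| Reinforcement Learning | SMILES | [158], [159] |
| Reinforcement Learning | 3D | [160] |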
VI. Assessment Approaches
In addition to the above applications of ML methods in drug development, there are opportunities for artificial intelligence to monitor, assess, and optimize drugs after they are on the market. Pharmacoepidemiology is the study of the benefits, risks, and uses of a drug when it is utilized by a population in an uncontrolled environment [164]. While clinical trials are an important part of the drug approval process, they are highly controlled by design, are logistically limited in the size and diversity of the populations that are included, and occur over a relatively short duration [165]. Thus, they provide only a narrow understanding of how the drug may perform in the larger population or how it may interact with other drugs or diseases. Once a drug is on the market, pharmacoepidemiologists attempt to measure the drug’s performance in the population and ensure that it aligns with the results of the clinical trial and continues to be safe and effective [164]. Given the complexity and quantity of data required for this task, ML methods are a natural choice.
Sessa et al. performed a systematic review of AI in pharmacoepidemiology in 2020 including 72 original articles and 5 surveys [26]. They found that random forest, artificial neural networks, and support vector machines were the most utilized methods in this field. In this survey, we will discuss important and recent advancements in three main areas of pharmacoepidemiology where ML techniques can be applied: pharmacovigilance, precision dosing, and population pharmacokinetics (Table II).
TABLE II.
Recent Applications of Artificial Intelligence Methods in Pharmacoepidemiology
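| Application | AI Method | References |
|---|---|---|
| Pharmacovigilance | Boosting Algorithms | [166]–[170] |
| Pharmacovigilance | Neural Networks | [171]–[176] |
| Pharmacovigilance | Linear Models | [168], [177] |
| Pharmacovigilance | Support Vector Machine | [178] |
| Pharmacovigilance | Tree-Based Algorithms | [169] |
| Precision Dosing | Boosting Algorithms | [179]–[181] |
| Precision Dosing | Neural Networks | [182], [183] |
| Precision Dosing | Linear Models | [179], [181], [183]–[185] |
| Precision Dosing | Tree-Based Algorithms | [183] |
| Population Pharmacokinetics | Boosting Algorithms | [186]–[192] |
| Population Pharmacokinetics | Neural Networks | [193]–[196] |
| Population Pharmacokinetics | Linear Models | [185], [197], [198] |
| Population Pharmacokinetics | Support Vector Machine | [193] |
| Population Pharmacokinetics | Tree-Based Algorithms | [193], [199]–[201] |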
A. Pharmacovigilance
Pharmacovigilance entails monitoring the usage of a drug for adverse drug reactions (ADRs), defined as any undesirable effect of a drug beyond its anticipated therapeutic effects [202]. There have been numerous recent review articles on ML methods for pharmacovigilance [27], [203], [204], [205], [206], [207]; however, to our knowledge, none has combined these insights with those previously presented in this survey to give a comprehensive understanding of ML utilization at each stage of the drug development life cycle. A scoping review by Kompa et al., covering 393 articles on ML in pharmacovigilance published from 2001 through 2021, identified the Bayesian confidence propagation neural network, RNNs, support vector machines (SVMs), and decision trees as highly utilized methods [27]. In this section, we discuss three specific problems in pharmacovigilance that ML is being used to address: identifying potential ADRs, processing and evaluating reported ADRs, and detecting ADR trends at the population level.
1). Identifying Potential ADRs:
Although ADRs can be formally reported to regulatory agencies, the full extent of ADRs in a population is difficult to monitor, since reported events may be only a fraction of the events that actually occurred. Given the complexity of monitoring ADRs in an uncontrolled population, numerous studies have attempted to use ML and natural language processing (NLP) to gather data from secondary sources to identify potential ADRs in the population. Two such sources are user-generated internet content and electronic health records (EHRs).
Utilizing user-generated internet content to detect ADRs has become a popular area of pharmacovigilance. Data sources include social media [173], [176], [208], [209], [210], patient forums [178], and PubMed [171], [211].
An alternative and more natural data source for identifying potential ADRs is EHRs. By harnessing NLP and ML, researchers and healthcare professionals can automate the real-time identification of potential ADRs, enhancing the efficiency and accuracy of pharmacovigilance practices. Kaas-Hansen et al. performed a scoping review of how EHRs can be utilized for pharmacovigilance [212]. They reviewed seven recent publications that used ML and large EHR databases to assess risks associated with certain medications prescribed to patients in health systems. These articles primarily used random forest, logistic regression, and SVM to model this relationship, but some also used neural networks, Naïve Bayes, and RL. In addition, Murphy et al. reviewed 29 articles on supervised learning methods for NLP-based ADR detection in inpatients from EHRs, finding LSTM and Conditional Random Fields to be the most commonly used [213]. Notably, a recent article by Jeon et al. applied reinforcement learning to nursing notes to predict ADRs, outperforming other deep learning methods including LSTM [214]. There has also been success in using discharge summaries to detect ADRs [174], [177]. McMaster et al. achieved an ROC–AUC of 0.955 when using a pre-trained BERT-based model to identify discharge summaries containing ADR mentions [174]. Meanwhile, Tan et al. implemented a two-step approach that first extracted ADR and drug information from discharge summaries and then applied ML to quantify the relationship between the two [177]. In their pipeline, they applied Naïve Bayes, k-Nearest Neighbour, Stochastic Gradient Descent, Decision Trees, Random Forest, eXtreme Gradient Boosting, and LSTM, and then combined the predictions of these algorithms in a Logistic Regression model to detect ADRs. Their hybrid approach achieved the best results, with F1 scores of 0.73 and 0.61 on temporally and geographically separated data, respectively. Recent work has also sought to extend this line of research to language-agnostic settings using neural networks [175] and to specific drugs using XGBoost [166].
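Tan et al.'s final step, combining base-model predictions with a logistic regression, is in effect a form of stacking. The sketch below trains such a meta-learner by gradient descent on hypothetical base-model probabilities and evaluates it with the F1 score. For brevity it trains and evaluates on the same toy rows; in practice the meta-learner should be fit on held-out (e.g., cross-validated) base-model predictions, and none of the numbers relate to the cited study.

```python
import math


def train_stacker(base_probs, labels, lr=0.5, epochs=2000):
    """Logistic-regression meta-learner over base-model probabilities (stacking).

    base_probs: one row per document, each row = [p_model1, p_model2, ...]
    labels:     1 if the document mentions an ADR, else 0
    """
    n_features = len(base_probs[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(base_probs, labels):
            p = 1.0 / (1.0 + math.exp(-(b + sum(wi * xi for wi, xi in zip(w, x)))))
            for k in range(n_features):
                grad_w[k] += (p - y) * x[k]  # gradient of the log loss
            grad_b += p - y
        n = len(labels)
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    return 1.0 / (1.0 + math.exp(-(b + sum(wi * xi for wi, xi in zip(w, x)))))


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Hypothetical probabilities from three base models for six discharge summaries.
base_probs = [
    [0.9, 0.8, 0.7], [0.8, 0.6, 0.9], [0.7, 0.9, 0.6],
    [0.2, 0.3, 0.1], [0.1, 0.4, 0.2], [0.3, 0.2, 0.3],
]
labels = [1, 1, 1, 0, 0, 0]
w, b = train_stacker(base_probs, labels)
preds = [1 if predict(w, b, x) >= 0.5 else 0 for x in base_probs]
print(preds, f1_score(labels, preds))
```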
2). Processing Reported ADRs:
The FDA hosts a database of post-market ADRs for drugs and biologic products known as the FDA Adverse Event Reporting System (FAERS). These ADRs are reported as Individual Case Safety Reports (ICSRs), which require a significant amount of manual effort to process, evaluate, and assess for drug-event causality. NLP and ML methods can be used to streamline each stage of this process and to improve the quality of both the data in individual ICSRs and the resulting analyses [215], [216].
Globally, researchers have applied NLP methods and Light Gradient Boosting Machine (LGBM) to identify ADRs and assess their severity [167], and XGBoost and Linear Regression to prioritize ICSRs for clinical review [168]. Subsequently, the causal relationship between the drug and the reported event can be assessed. Researchers at Johns Hopkins University and the FDA tested multiple models, including decision trees, kNN, logistic regression, and SVM, to assess the probability of a causal relationship in FAERS reports [217]. Their model for identifying reports with enough information to make an informed causality assessment performed well, with recall above 0.80, while their best model for identifying reports with at least possible causality had recall above 0.75. All the models tested performed comparably on a second dataset, suggesting possible generalizability across different ICSR reports. Similarly, a research team at Janssen used a Bayesian network model to predict the likelihood of causality on an internal dataset of ICSRs, achieving a recall of 0.90 [218]. Together, these results suggest that ML can improve ICSR processing, leading to more timely and accurate safety decisions.
3). Detecting ADR Trends:
Another area of pharmacovigilance where ML has great potential is signal detection and validation at the population level, after ADRs have been reported and the ICSRs have been processed. Signal validation is the process of evaluating evidence of a drug-ADR relationship and determining whether further analysis is necessary. Imran et al. sought to sort drug-ADR pairs for 6 medicinal products into 6 signal validation categories with XGBoost, achieving an accuracy above 83% [169]. They also prospectively tested their model and gathered user feedback on its impact on the overall workflow. Lee et al. and Alroobaea et al. took a slightly different approach, training Gradient Boosting Machines, Random Forest, and Bayesian Confidence Propagation Neural Networks to detect specific ADRs for specific drugs in multiple ADR reporting systems [170], [172]. When Gradient Boosting Machines were applied to the FAERS database to detect ADRs related to infliximab, an immunosuppressant, the resulting AUROC was 0.73 [170]. Although there has been less focus on this area to date, these studies highlight a natural overlap between this pharmacovigilance problem and the strengths of ML techniques.
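Disproportionality measures such as the information component (IC) underlying the Bayesian Confidence Propagation Neural Network compare how often a drug-ADR pair is reported with how often it would be expected if drug and ADR were reported independently. The sketch uses the simplified shrinkage form IC = log2((O + 0.5) / (E + 0.5)); the full BCPNN also provides credibility intervals, which are omitted here, and the report counts are hypothetical.

```python
import math


def information_component(n_drug_adr, n_drug, n_adr, n_total):
    """Shrunk observed-to-expected ratio, on a log2 scale, for one drug-ADR pair.

    n_drug_adr: reports mentioning both the drug and the ADR (observed, O)
    n_drug:     reports mentioning the drug
    n_adr:      reports mentioning the ADR
    n_total:    all reports in the database
    """
    expected = n_drug * n_adr / n_total  # E: count expected under independence
    return math.log2((n_drug_adr + 0.5) / (expected + 0.5))


# Hypothetical counts: 40 co-reports where about 5 would be expected by chance.
print(round(information_component(40, 1000, 500, 100_000), 2))
```

A positive IC flags a pair that is reported more often than chance would suggest; ML approaches such as those above aim to go beyond such pairwise statistics by using richer report features.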
B. Precision Dosing
Model-Informed Precision Dosing (MIPD) is a quantitative approach to individualized dosage optimization that combines patient characteristics with information on drugs and diseases, and it is traditionally performed using Bayesian forecasting [219]. Numerous surveys [220], [221], [222], [223], [224] and recent studies have detailed applications of Bayesian-forecasting-based MIPD to renal diseases [225], [226], tuberculosis [227], [228], and antibiotics [229], [230], [231], [232], [233]. In addition, Dong et al. published a pilot study using Bayesian-forecasting MIPD of alemtuzumab in a pediatric population [234], and Ewoldt et al. even performed a randomized controlled trial of MIPD of beta-lactam antibiotics and ciprofloxacin in critically ill patients across 8 hospitals in the Netherlands [235]. Given the complexity and volume of data needed to account for patient variability in MIPD, AI-based methods have been suggested as an alternative to Bayesian forecasting [236], [237], [238]. Multiple surveys have summarized ML approaches to MIPD in recent years [28], [239], [240], including RL methods [241].
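Bayesian forecasting for MIPD typically combines a population prior with a patient's measured drug levels to obtain a maximum a posteriori (MAP) estimate of individual parameters, then chooses a dose to reach a target exposure. The sketch assumes a one-compartment intravenous-bolus model with known volume, a log-normal prior on clearance, proportional residual error, and a grid search in place of a numerical optimizer; all parameter values and "measurements" are hypothetical.

```python
import math


def predicted_conc(dose, cl, v, t):
    """One-compartment IV bolus model: C(t) = dose / V * exp(-(CL / V) * t)."""
    return dose / v * math.exp(-(cl / v) * t)


def map_clearance(obs, dose, v, cl_pop, omega, sigma):
    """MAP estimate of an individual's clearance from (time, concentration) observations.

    Prior:      log CL ~ N(log cl_pop, omega^2)     (population model)
    Likelihood: log C_obs ~ N(log C_pred, sigma^2)  (proportional residual error)
    The objective below is -2 * log posterior, up to an additive constant.
    """
    best_cl, best_obj = None, float("inf")
    for i in range(1, 2001):  # grid over CL from 0.01 to 20.00 (same units as cl_pop)
        cl = i * 0.01
        obj = ((math.log(cl) - math.log(cl_pop)) / omega) ** 2
        for t, c in obs:
            obj += ((math.log(c) - math.log(predicted_conc(dose, cl, v, t))) / sigma) ** 2
        if obj < best_obj:
            best_cl, best_obj = cl, obj
    return best_cl


def dose_for_target_auc(target_auc, cl):
    """For a linear one-compartment model, AUC(0-inf) = dose / CL."""
    return target_auc * cl


dose, v = 1000.0, 50.0  # mg and L (hypothetical)
true_cl = 3.0           # L/h, used only to simulate noise-free measurements
obs = [(t, predicted_conc(dose, true_cl, v, t)) for t in (2.0, 8.0)]
cl_hat = map_clearance(obs, dose, v, cl_pop=5.0, omega=0.3, sigma=0.1)
next_dose = dose_for_target_auc(400.0, cl_hat)  # target AUC in mg*h/L (hypothetical)
print(cl_hat, next_dose)
```

With only two samples, the estimate lands between the population value and the value used to simulate the measurements, which is the intended shrinkage toward the prior; ML-based MIPD replaces or augments this prior-plus-likelihood machinery with models learned from data.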
Recent studies have built on the approaches identified in these surveys to implement ML-based MIPD. The antibiotic vancomycin has been the subject of multiple studies that used XGBoost and regression analyses for dose optimization [179], [180], [181]. Specifically, Huang et al.'s XGBoost algorithm explained 67.5% of the variation in a retrospective validation cohort of patients who received vancomycin under the hospital-approved dosing strategy [180]. Studies of precision dosing for specific drugs in pediatric epilepsy [184], inflammatory bowel diseases [242], and liver and kidney transplant patients [185] have used linear regression, nonlinear mixed-effects models, and multivariate adaptive regression splines, respectively. Barrio et al. applied regression, random forest, and neural network methods to achieve the target thyrotropin level in thyroidectomy patients receiving levothyroxine, outperforming the standard of care [183]. Finally, Chi et al. used a feed-forward neural network with sigmoid hidden layers and a softmax output to perform dose optimization of statins, which are highly effective in lowering cholesterol, in elderly adults [182].
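The architecture attributed to Chi et al., sigmoid hidden units feeding a softmax output, can be sketched as a forward pass that turns patient features into probabilities over discrete dose levels. The inputs, layer sizes, weights, and dose classes below are hypothetical and untrained, and a single hidden layer is used for brevity; the cited study's actual features and output encoding are not reproduced here.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def softmax(z):
    m = max(z)                               # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One sigmoid hidden layer followed by a softmax output layer."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w_out, b_out)]
    return softmax(logits)

# Hypothetical, untrained weights: 3 standardized inputs, 2 hidden units, 3 dose classes.
w_hidden = [[0.8, -0.5, 0.3], [-0.4, 0.9, 0.1]]
b_hidden = [0.0, 0.1]
w_out = [[1.2, -0.7], [0.2, 0.3], [-1.0, 0.9]]
b_out = [0.0, 0.0, 0.0]

patient = [0.5, -1.2, 0.3]  # hypothetical standardized features (e.g., age, baseline LDL, eGFR)
probs = forward(patient, w_hidden, b_hidden, w_out, b_out)
dose_classes = ["low", "medium", "high"]
print(dose_classes[probs.index(max(probs))], [round(p, 3) for p in probs])
```

Training would fit the weights by minimizing cross-entropy against the doses that achieved the clinical target; the softmax output makes the prediction a distribution over dose levels rather than a single number.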
C. Population Pharmacokinetics
Pharmacokinetics is the study of drug behavior in the body, including absorption, distribution, metabolism, and excretion, and is applicable at numerous stages of drug development. In the post-market stage, population pharmacokinetics (PPK) examines population-level parameters such as drug exposure and efficacy over time, essentially performing reactivity phenotyping. The results of such studies assist MIPD, measure patient outcomes longitudinally, and identify potential covariates in a large, real-world population [239]. PPK modeling traditionally uses mechanistic, mathematical, and statistical models such as nonlinear mixed-effects models or maximum a posteriori Bayesian estimation (MAP-BE). Recently, it has been suggested that ML could enhance PPK methodologies and extend them to larger populations [228], [238], [243]. Multiple surveys have explored this, specifically for predicting drug-blood concentrations and for screening or identifying potential covariates [11], [29], [244], [245], [246].
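The structure of a nonlinear mixed-effects PPK model can be sketched by simulating it forward: each patient's clearance and volume are a typical value, multiplied by a covariate effect, multiplied by exp(η) for between-subject variability, and concentrations then follow from a one-compartment IV bolus model. The parameter values, the weight-based allometric exponents (0.75 for clearance and 1 for volume, a common convention), and the dosing scenario are assumptions for illustration; estimating the typical values and variances from real data requires dedicated NLME software and is not shown.

```python
import math
import random

THETA_CL, THETA_V = 5.0, 50.0   # typical clearance (L/h) and volume (L) at 70 kg (hypothetical)
OMEGA_CL, OMEGA_V = 0.3, 0.2    # SDs of log-normal between-subject variability (hypothetical)

def individual_params(weight_kg, eta_cl, eta_v):
    """Typical value x covariate effect x exp(random effect)."""
    cl = THETA_CL * (weight_kg / 70.0) ** 0.75 * math.exp(eta_cl)  # allometric scaling
    v = THETA_V * (weight_kg / 70.0) * math.exp(eta_v)
    return cl, v

def concentration(dose_mg, cl, v, t_h):
    """One-compartment IV bolus: C(t) = Dose / V * exp(-(CL / V) * t)."""
    return dose_mg / v * math.exp(-(cl / v) * t_h)

rng = random.Random(0)           # seeded so the simulated population is reproducible
for wt in [55.0, 70.0, 90.0, 110.0]:
    cl, v = individual_params(wt, rng.gauss(0.0, OMEGA_CL), rng.gauss(0.0, OMEGA_V))
    c6 = concentration(1000.0, cl, v, 6.0)
    print(f"WT={wt:5.1f} kg  CL={cl:4.2f} L/h  V={v:5.1f} L  C(6 h)={c6:5.2f} mg/L")
```

Population methods invert this picture: given observed concentrations from many patients, they estimate the typical values, covariate effects, and variances, and ML approaches are being explored as faster or more flexible ways to learn the same mappings.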
Drug-blood concentrations are an important factor in determining the clinical response to a drug [247]. Because pharmacokinetic profiles vary between individuals, measuring drug concentrations is important for informing dosing regimens and optimizing patient outcomes. Numerous studies have applied ML models to estimate drug exposure and compared them with traditional PPK approaches, usually using features chosen through the stepwise covariate modeling normally done in PPK. XGBoost is by far the most popular model for predicting drug-blood concentrations and has been applied to tacrolimus [186], [190], valproic acid [189], and vancomycin [192]. Alternatively, recurrent neural networks (RNNs) and their long short-term memory (LSTM) variant have been applied to vancomycin [196] and olanzapine [194], respectively. Other proposed methods include quantile regression gradient boosting trees and multivariate adaptive regression splines [185], [200]. Several studies have also compared concentration estimates across ML methods. Keutzer et al. showed that XGBoost outperformed both linear regression and Gradient Boosting Machines in predicting plasma concentrations [188]. Additionally, Huang et al. performed a PPK analysis with a nonlinear mixed-effects model and then applied 6 ML models, namely XGBoost, Random Forest, Extra-Trees, Gradient Boosting Decision Tree, Adaptive Boosting, and Lasso, to model tacrolimus clearance, with the Lasso model performing best [198]. Hybrid models, in which XGBoost and generalized linear models reduce the mean prediction error of MAP-BE relative to reference clearance values and trough concentrations in simulated data, have also been proposed to combine the benefits of ML and traditional PPK methodologies [187], [197].
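The hybrid idea can be sketched as a two-stage pipeline: MAP-BE produces a clearance estimate, and a learned model corrects its systematic error. The sketch below simulates a hypothetical MAP-BE estimator that underestimates clearance by about 15%, fits an ordinary least-squares correction on training patients as a simple stand-in for the XGBoost or generalized linear models used in the cited work, and compares the mean relative prediction error on held-out patients.

```python
import random

def fit_ols(x, y):
    """Ordinary least squares for y = a + b * x (a simple stand-in for XGBoost or a GLM)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def mean_relative_error(pred, ref):
    """Mean prediction error relative to the reference value, in percent."""
    return 100.0 * sum((p - r) / r for p, r in zip(pred, ref)) / len(ref)

# Simulated data (hypothetical): MAP-BE underestimates clearance by ~15% plus noise.
rng = random.Random(42)
ref_cl = [rng.uniform(2.0, 10.0) for _ in range(300)]
map_cl = [0.85 * r + rng.gauss(0.0, 0.3) for r in ref_cl]

train, test = slice(0, 200), slice(200, 300)
a, b = fit_ols(map_cl[train], ref_cl[train])    # learn the correction on training patients
corrected = [a + b * m for m in map_cl[test]]   # apply it to held-out patients

mpe_raw = mean_relative_error(map_cl[test], ref_cl[test])
mpe_corrected = mean_relative_error(corrected, ref_cl[test])
print(f"MAP-BE alone: {mpe_raw:.1f}%  hybrid: {mpe_corrected:.1f}%")
```

A real hybrid model would use more features (doses, sampling times, covariates) and a nonlinear learner, but the evaluation logic, comparing prediction error before and after correction on held-out data, is the same.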
In addition to making direct predictions, ML can be used to supplement PPK or to screen for covariates. Some prior work suggests using ML methods such as clustering and tree-based methods to explore the factors that could better inform PPK analyses [199], [201], [248]. Interestingly, Nair et al. used a generative model to explore the physiological determinants that could affect drug dosing [249]. It has also been suggested that ML methods such as random forest, neural networks, and support vector regression screen covariates much faster than the stepwise covariate modeling normally done in PPK [193]. In their review, Gill et al. compared ML and traditional PPK methods, noting that ML-based PPK analyses could identify drug-drug interactions as potential covariates [245].
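Tree-based covariate screening can be illustrated with the simplest possible tree: for each candidate covariate, find the single threshold split that most reduces the squared error of individual clearance estimates, then rank covariates by that reduction. This depth-1 regression tree (decision stump) is a minimal stand-in for the random forest and other ML screeners cited above; the patient data, covariate names, and values are hypothetical.

```python
def best_split_gain(covariate, target):
    """Largest drop in sum of squared errors from one threshold split (a depth-1 regression tree)."""
    def sse(values):
        if not values:
            return 0.0
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    total = sse(target)
    best = 0.0
    for threshold in sorted(set(covariate))[:-1]:   # splitting at the max would leave one side empty
        left = [t for c, t in zip(covariate, target) if c <= threshold]
        right = [t for c, t in zip(covariate, target) if c > threshold]
        best = max(best, total - sse(left) - sse(right))
    return best

def rank_covariates(covariates, target):
    """Rank candidate covariates by how much a single split explains the target."""
    gains = {name: best_split_gain(values, target) for name, values in covariates.items()}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical individual clearance estimates (L/h) and candidate covariates for 8 patients.
clearance = [2.1, 2.4, 2.2, 4.8, 5.1, 4.9, 2.3, 5.0]
covariates = {
    "eGFR": [35, 40, 38, 95, 100, 90, 42, 98],
    "age": [60, 45, 70, 52, 66, 48, 58, 61],
    "interacting_drug": [0, 1, 0, 1, 0, 1, 0, 0],
}
for name, gain in rank_covariates(covariates, clearance):
    print(f"{name:17s} SSE reduction = {gain:.2f}")
```

Covariates that rank highly would then be tested formally inside the PPK model; a random forest generalizes this idea by averaging importance over many deeper trees fitted to resampled data.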
VII. Discussion
AI-based techniques are widely integrated into diverse facets of drug discovery and development, rapidly revolutionizing and enhancing various stages of this process. The FDA Modernization Act 2.0, which was recently passed, emphasizes the significance of these methods [250]. This act allows for evaluating the effectiveness and safety of new drugs using models including computational methods as a substitute for animal studies before human clinical trials.
Among the various challenges in the DDD process, target identification and validation remains one of the most critical yet inadequately addressed. As of 2022, fewer than 500 drug targets had been successfully identified [251], a tiny portion of the potential drug targets within the human genome [252], [253]. Furthermore, despite considerable effort and progress in developing computational models for DDD over the past decade, the average clinical trial failure rate was about 84% from 2009 to 2018 [254], with lack of clinical effectiveness emerging as the primary cause of Phase 2 and 3 failures [255]. These figures highlight the importance of developing novel mathematical algorithms and computational methods that can more efficiently analyze massive high-throughput omics data and disease phenotypes to identify the right targets, thereby increasing the likelihood of developing clinically effective therapies.
With network-based methods that assess how biological molecules interact within the interactome, target selection has shifted from focusing on a single target to modeling entire signal transduction pathways and using that understanding to select better drug targets. However, only about 10% of the human genome is currently associated with known diseases [256]. Furthermore, the existing human PPI repository is believed to cover around 25% of all potential interactions [257], leaving the remaining interactions undetected or unexplored. This may hinder the detection of small disease modules, as they are more likely to be fragmented within the existing interactome [258]. Another obstacle is post-translational modifications (PTMs), which occur in nearly all proteins and exert a highly dynamic and ubiquitous influence on PPIs [259]. These modifications can selectively reinforce, weaken, or eliminate interactions of specific protein nodes in a biological network and consequently alter pre-existing PPIs or generate novel ones in a temporally and spatially specific manner. Yet the existing human interactome predominantly captures only one type of PTM, phosphorylation, out of more than 200 known PTM forms [260].
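The fragmentation concern can be made concrete with the largest connected component (LCC) of the disease genes within the interactome, a statistic commonly used in network medicine to judge whether disease genes form a module. In the hypothetical six-gene module below, removing two edges that have simply not yet been detected splits the module into pairs, even though the underlying biology is unchanged. Gene names and edges are invented for illustration.

```python
from collections import defaultdict

def largest_connected_component(nodes, edges):
    """Size of the largest connected component of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    adj = defaultdict(set)
    for u, v in edges:
        if u in nodes and v in nodes:       # keep only edges between disease genes
            adj[u].add(v)
            adj[v].add(u)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:                          # iterative depth-first search
            node = stack.pop()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        best = max(best, size)
    return best

# Hypothetical disease module of six genes (G1..G6) plus one non-disease neighbour (X).
disease_genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
full_interactome = [("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G4", "G5"),
                    ("G5", "G6"), ("G2", "X"), ("X", "G5")]
# The same network with two module edges not yet detected experimentally.
incomplete = [e for e in full_interactome if e not in {("G2", "G3"), ("G4", "G5")}]

print(largest_connected_component(disease_genes, full_interactome))  # 6
print(largest_connected_component(disease_genes, incomplete))        # 2
```

In practice the observed LCC is compared against LCCs of randomly chosen gene sets of the same size, so missing edges weaken that comparison most for small modules.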
In the course of drug development, even when target proteins are identified successfully, designing a high-affinity molecule capable of efficiently and specifically perturbing the network is challenging. This challenge requires computational tools that enable rapid exploration of the vast space of chemical and biological compounds. Moreover, accurately estimating the affinity between a drug and its target requires simulating the electronic states within the binding pocket in the presence of the ligand, which calls for a quantum-based solution. As suggested in [34], integrating powerful combinatorial frameworks that merge network search algorithms with quantum computing can create novel opportunities in DDD. This approach enables researchers to effectively screen, analyze, and optimize treatment compounds, identifying those with the most promising biological properties for modulating disease networks while minimizing unacceptable side effects and toxicity.
Recently, AI-based de novo drug design has attracted a great deal of attention and a plethora of models have been proposed; however, several challenges remain at both the algorithmic and implementation levels. The first is how to evaluate these models, since a significant gap remains between AI-generated molecules and their in vitro synthesis. The problem of generating valid and synthesizable molecules is not fully solved. Therefore, it is often difficult to evaluate the quality of the generated molecules, and one has to rely on imperfect scores that serve as surrogate measures for properties such as drug-likeness, toxicity, novelty, and, most importantly, synthesizability [261]. At the methodological level, most models have limited interpretability, which results in a poor understanding of the generation process and low confidence in the generated molecules. For this reason, considerably more effort has recently gone toward interpretability, which would also set clearer expectations for in vitro synthesis. Another limitation of most models is that they focus on one type of molecular representation and therefore use only part of the available data and a subset of the known molecular properties [123]. The trend toward multimodality aims to address this issue [120], [121]. Additionally, despite the availability of large amounts of data, the chemical space is so large and complex that guiding the generative process toward specific targets or objectives remains challenging. Frameworks such as RL and active learning have proven very useful in making the search more efficient and in leading to drugs that are both synthesizable and effective [156].
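Two of the surrogate scores commonly reported for generative models, uniqueness and novelty, are easy to compute, which partly explains their popularity; the sketch below shows how little they say about whether a molecule can actually be made. It assumes the SMILES strings are already valid and canonicalized (for example with a cheminformatics toolkit such as RDKit, which is not used here), since comparing non-canonical strings would undercount duplicates. The molecules are hypothetical.

```python
def uniqueness(generated):
    """Fraction of generated molecules that are distinct."""
    return len(set(generated)) / len(generated) if generated else 0.0

def novelty(generated, training_set):
    """Fraction of distinct generated molecules that do not appear in the training data."""
    unique = set(generated)
    return len(unique - set(training_set)) / len(unique) if unique else 0.0

# Hypothetical SMILES, assumed to be valid and already canonicalized.
training = ["CCO", "c1ccccc1", "CC(=O)O"]
generated = ["CCO", "CCN", "CCN", "c1ccncc1", "CC(=O)OC"]

print(f"uniqueness = {uniqueness(generated):.2f}")          # 4 distinct of 5 -> 0.80
print(f"novelty    = {novelty(generated, training):.2f}")  # 3 of 4 distinct unseen -> 0.75
```

A generator can score perfectly on both metrics while producing molecules that are unstable or unsynthesizable, which is why synthesizability estimates and, ultimately, in vitro synthesis remain necessary.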
De novo drug design generally requires substantial amounts of training data in the form of known drug-target interactions and relies on the analogy between chemical similarity and interaction similarity. However, the complexity of chemical space and biological interactions means that this analogy can break down. One weakness of current de novo methods is that they do not consider the interactions in an end-to-end way, i.e., there is no consideration of direct drug-target affinity in the generation process. This is understandable, given the complexity of the DTI problem on its own, but this can lead to the generation of ineffective drugs that fail in later investigations.
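The analogy between chemical similarity and interaction similarity can be sketched as nearest-neighbour target inference: represent molecules as binary fingerprints, measure Tanimoto similarity, and assign a query compound the targets of its most similar known ligands. The fingerprints below are hypothetical bit sets rather than real ECFP or MACCS keys, and no affinity is modeled, which is exactly the limitation the paragraph describes.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def predict_targets(query, library, k=1):
    """Nearest-neighbour inference: borrow targets from the k most similar known ligands."""
    ranked = sorted(library, key=lambda entry: tanimoto(query, entry[0]), reverse=True)
    return [(tanimoto(query, fp), target) for fp, target in ranked[:k]]

# Hypothetical fingerprints (sets of on-bit indices), each with one known target.
library = [
    ({1, 4, 7, 9, 12}, "kinase A"),
    ({2, 3, 8, 15}, "GPCR B"),
    ({1, 4, 7, 10, 13}, "kinase A"),
]
query = {1, 4, 7, 9, 13}
print(predict_targets(query, library, k=2))
```

When structurally similar compounds bind differently (activity cliffs), this similarity-based inference fails, which is one reason using drug-target affinity directly during generation is attractive.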
Another significant challenge in using AI in the DDD process is acquiring and curating data for post-market drug assessment. Various data sources, such as user-generated internet content, EHRs, ICSRs, and small-scale case studies, are currently used for these evaluations, but each has inherent limitations. While user-generated content and ICSRs draw from large populations, it can be difficult to ensure their integrity and comprehensiveness. Conversely, EHRs and case studies are comprehensive but draw from relatively small populations, resulting in models that cannot generalize sufficiently to the entire population using a specific drug. To address this, models that combine EHR data from multiple demographically or geographically diverse institutions would increase generalizability and better reflect how the drug performs in the population as a whole. To this end, new approaches for processing EHR data have been designed to support ML-based post-market drug assessment, making it easier for institutions to participate in such studies. In particular, Choi et al. developed a standardized system to prepare EHR datasets for analysis, including data extraction from relational databases, natural language processing of clinical notes, and standard preprocessing of data for ML tasks [262].
VIII. Conclusion
In conclusion, it is crucial to acknowledge the complementarity of generative and predictive computational models in the DDD process. A comprehensive drug design workflow entails: first, identifying the appropriate target(s); second, using generative models for the de novo design of new molecules, or DTI prediction models to select among previously known chemical compounds; and third, employing predictive models to filter compounds based on their desired molecular properties. Furthermore, AI-based computational models play important roles in predicting drugs' pharmacokinetic properties, clinical trial design, and pharmacovigilance assessment.
These models are not yet fully mature, but they are somewhat effective within their individual domains. For example, AI methods can identify and propose reasonable targets and suggest drugs that might be active against these targets. However, we are still some way from an automated process that can go from disease to drug with high probability and without extensive human intervention and ingenuity.
AI and, in particular, ML models for these problems are under current and rapid development. Three factors will be key to this development in the future. First, the development of computational resources dedicated to these problems will be required. For example, the recent developments in large language models have been fuelled by models that are staggering in their scale. Extremely large models will likely be required for drug development problems. Second, large amounts of data will be needed to train these models, and these kinds of datasets are not currently available. Recent advances in computer vision and language processing relied on the ready availability of images and text on the internet. Finally, it is likely that a more end-to-end approach will be needed, which incorporates these separate processes of target identification, drug design, and drug-target interaction in a single model.
Acknowledgments
This work was supported by the National Institutes of Health (NIH) under Grant P30ES017885-11-S1 and Grant U24CA271037.
Biographies
Flora Rajaei received the MSc degree in data science, and the PhD degree in molecular and developmental biology. She is currently a postdoctoral research fellow with the Department of Computational Medicine and Bioinformatics, University of Michigan. Her main research interests include AI applications in early drug discovery and development, as well as the development of clinical decision support systems.
Cristian Minoccheri received the PhD degree in mathematics. He is currently a research investigator with the Department of Computational Medicine and Bioinformatics, University of Michigan. His research interests are in tensor methods for machine learning and deep learning, interpretable machine learning, and generative models.
Emily Wittrup received the MSc degree in data science and is currently working toward the PhD degree in bioinformatics with the University of Michigan. She is currently a senior computational biologist with the Department of Computational Medicine and Bioinformatics, University of Michigan. Her research interests include developing clinical decision support systems, computational biology, and machine learning.
Richard C. Wilson received the PhD degree. He is a professor of pattern analysis with the Department of Computer Science, University of York. His research interests are machine learning, computer vision, and pattern recognition with graphs and networks.
Brian D. Athey received the PhD degree. He is the Michael A. Savageau Collegiate professor and founding chair with the Department of Computational Medicine and Bioinformatics (DCMB), University of Michigan Medical School. In addition, he served as co-founder and co-director of the campus-wide Michigan Institute for Data Science (MIDAS).
Gilbert S. Omenn received the MD and PhD degrees. He is the Harold T. Shapiro Distinguished University Professor of Computational Medicine & Bioinformatics, Internal Medicine, Human Genetics, and Environmental Health with the University of Michigan. He is currently the director of the university-wide Center for Computational Medicine & Bioinformatics. He serves on the boards of the Weizmann Institute of Science, Foundation for the NIH, and Hastings Center for Bioethics.
Kayvan Najarian received the PhD degree. He is a professor with the Departments of Computational Medicine and Bioinformatics (DCMB), Emergency Medicine, and Electrical Engineering and Computer Science, University of Michigan. He is the director of the Center for Data-Driven Drug Development and Treatment Assessment (DATA) which is an NSF IUCRC with the University of Michigan. He is the director of the Biomedical and Clinical Informatics Laboratory, an associate director for the Weil Institute for Critical Care Research and Innovation, and an associate director for the Michigan Institute for Data Science (MIDAS), serving as the point person for data science collaboration in Biological Sciences and Health Sciences.
Contributor Information
Flora Rajaei, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA.
Cristian Minoccheri, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA.
Emily Wittrup, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA.
Richard C. Wilson, Computer Science Department, University of York, YO10 5GH York, U.K.
Brian D. Athey, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA.
Gilbert S. Omenn, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA.
Kayvan Najarian, Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA.
References
[1] DiMasi JA, Grabowski HG, and Hansen RW, “Innovation in the pharmaceutical industry: New estimates of R&D costs,” J. Health Econ, vol. 47, pp. 20–33, 2016.
[2] Waring MJ et al., “An analysis of the attrition of drug candidates from four major pharmaceutical companies,” Nature Rev. Drug Discov, vol. 14, no. 7, pp. 475–486, 2015.
[3] Moingeon P, Kuenemann M, and Guedj M, “Artificial intelligence-enhanced drug design and development: Toward a computational precision medicine,” Drug Discov. Today, vol. 27, no. 1, pp. 215–222, 2022.
[4] Pun FW, Ozerov IV, and Zhavoronkov A, “AI-powered therapeutic target discovery,” Trends Pharmacological Sci., vol. 44, pp. 561–572, 2023.
[5] Pun FW et al., “Hallmarks of aging-based dual-purpose disease and age-associated targets predicted using pandaomics AI-powered discovery engine,” Aging (Albany NY), vol. 14, no. 6, pp. 2475–2506, 2022.
[6] Zhang T, Zhang S-W, Xie M-Y, and Li Y, “A novel heterophilic graph diffusion convolutional network for identifying cancer driver genes,” Brief. Bioinf, vol. 24, no. 3, 2023, Art. no. bbad137.
[7] Dong T, Yang Z, Zhou J, and Chen CY-C, “Equivariant flexible modeling of the protein–ligand binding pose with geometric deep learning,” J. Chem. Theory Comput, vol. 19, no. 22, pp. 8446–8459, 2023.
[8] Zhang X. et al., “Efficient and accurate large library ligand docking with karmadock,” Nature Comput. Sci, vol. 3, no. 9, pp. 789–804, 2023.
[9] Bagherian M, Kim RB, Jiang C, Sartor MA, Derksen H, and Najarian K, “Coupled matrix–matrix and coupled tensor–matrix completion methods for predicting drug–target interactions,” Brief. Bioinf, vol. 22, no. 2, pp. 2161–2171, 2021.
[10] Ivanenkov YA et al., “Chemistry42: An AI-driven platform for molecular design and optimization,” J. Chem. Inf. Model, vol. 63, no. 3, pp. 695–701, 2023.
[11] Ota R. and Yamashita F, “Application of machine learning techniques to the analysis and prediction of drug pharmacokinetics,” J. Controlled Release, vol. 352, pp. 961–969, 2022.
[12] Liu G. et al., “Deep learning-guided discovery of an antibiotic targeting acinetobacter baumannii,” Nature Chem. Biol, vol. 19, pp. 1342–1350, 2023.
[13] G. Healthcare, “First drug created by AI enters clinical trials,” 2023. [Online]. Available: https://www.clinicaltrialsarena.com/comment/first-drug-created-ai-enters-trials/#::text=Hong%20Kong%2Dbased%20biotech%20InSilico,causing%20scarring%20within%20the%20lungs
[14] Wills T. et al., “AI drug discovery: Assessing the first AI-designed drug candidates to go into human clinical trials,” 2022. [Online]. Available: https://www.cas.org/resources/cas-insights/drug-discovery/ai-designed-drug-candidates
[15] Hasan MR et al., “Application of mathematical modeling and computational tools in the modern drug design and development process,” Molecules, vol. 27, no. 13, pp. 4169, 2022.
[16] Bassani D. and Moro S, “Past, present, and future perspectives on computer-aided drug design methodologies,” Molecules, vol. 28, no. 9, pp. 3906, 2023.
[17] Sabe VT et al., “Current trends in computer aided drug design and a highlight of drugs discovered via computational techniques: A review,” Eur. J. Med. Chem, vol. 224, 2021, Art. no. 113705.
[18] Bagherian M, Sabeti E, Wang K, Sartor MA, Nikolovska-Coleska Z, and Najarian K, “Machine learning approaches and databases for prediction of drug–target interaction: A survey paper,” Brief. Bioinf, vol. 22, no. 1, pp. 247–269, 2021.
[19] Gns HS, Saraswathy G, Murahari M, and Krishnamurthy M, “An update on drug repurposing: Re-written saga of the drug’s fate,” Biomed. Pharmacother, vol. 110, pp. 700–716, 2019.
[20] Nishimura Y. and Hara H, “Drug repositioning: Current advances and future perspectives,” Front. Pharmacol, vol. 9, pp. 1068, 2018.
[21] Sadeghi SS and Keyvanpour MR, “An analytical review of computational drug repurposing,” IEEE/ACM Trans. Comput. Biol. Bioinf, vol. 18, no. 2, pp. 472–488, Mar./Apr. 2021.
[22] Vamathevan J. et al., “Applications of machine learning in drug discovery and development,” Nature Rev. Drug Discov, vol. 18, no. 6, pp. 463–477, 2019.
[23] Zhang S, Bamakan SMH, Qu Q, and Li S, “Learning for personalized medicine: A comprehensive review from a deep learning perspective,” IEEE Rev. Biomed. Eng, vol. 12, pp. 194–208, 2018.
[24] Singh M. et al., “Current understanding of biological interactions and processing of dna origami nanostructures: Role of machine learning and implications in drug delivery,” Biotechnol. Adv, vol. 61, 2022, Art. no. 108052.
[25] Cheng F. et al., “Network-based approach to prediction and population-based validation of in silico drug repurposing,” Nature Commun., vol. 9, no. 1, 2018, Art. no. 2691.
[26] Sessa M, Khan AR, Liang D, Andersen M, and Kulahci M, “Artificial intelligence in pharmacoepidemiology: A systematic review. Part 1—Overview of knowledge discovery techniques in artificial intelligence,” Front. Pharmacol, vol. 11, pp. 1028, 2020.
[27] Kompa B. et al., “Artificial intelligence based on machine learning in pharmacovigilance: A scoping review,” Drug Saf., vol. 45, no. 5, pp. 477–491, 2022.
[28] Terranova N, Venkatakrishnan K, and Benincosa LJ, “Application of machine learning in translational medicine: Current status and future opportunities,” AAPS J., vol. 23, no. 4, 2021, Art. no. 74.
[29] McComb M, Bies R, and Ramanathan M, “Machine learning in pharmacometrics: Opportunities and challenges,” Brit. J. Clin. Pharmacol, vol. 88, no. 4, pp. 1482–1499, 2022.
[30] Qureshi R. et al., “AI in drug discovery and its clinical relevance,” Heliyon, vol. 9, 2023, Art. no. e17575.
[31] Sessa M, Liang D, Khan AR, Kulahci M, and Andersen M, “Artificial intelligence in pharmacoepidemiology: A systematic review. Part 2–Comparison of the performance of artificial intelligence and traditional pharmacoepidemiological techniques,” Front. Pharmacol, vol. 11, 2021, Art. no. 568659.
[32] Barabási A-L, Gulbahce N, and Loscalzo J, “Network medicine: A network-based approach to human disease,” Nature Rev. Genet, vol. 12, no. 1, pp. 56–68, 2011.
[33] Lee LY-H and Loscalzo J, “Network medicine in pathobiology,” Amer. J. Pathol, vol. 189, no. 7, pp. 1311–1326, 2019.
[34] Maniscalco S. et al., “Quantum network medicine: Rethinking medicine with network science and quantum algorithms,” 2022, arXiv:2206.12405.
[35] Ravasz E, Somera AL, Mongru DA, Oltvai ZN, and Barabási A-L, “Hierarchical organization of modularity in metabolic networks,” Science, vol. 297, no. 5586, pp. 1551–1555, 2002.
[36] Bader GD and Hogue CW, “An automated method for finding molecular complexes in large protein interaction networks,” BMC Bioinf., vol. 4, no. 1, pp. 1–27, 2003.
[37] Cumbo F, Paci P, Santoni D, Di Paola L, and Giuliani A, “GIANT: A cytoscape plugin for modular networks,” PLoS One, vol. 9, no. 10, 2014, Art. no. e105001.
[38] Rivera CG, Vakil R, and Bader JS, “NeMo: Network module identification in cytoscape,” BMC Bioinf., vol. 11, pp. 1–9, 2010.
[39] Tomasoni M. et al., “MONET: A toolbox integrating top-performing methods for network modularization,” Bioinformatics, vol. 36, no. 12, pp. 3920–3921, 2020.
[40] Cowen L, Ideker T, Raphael BJ, and Sharan R, “Network propagation: A universal amplifier of genetic associations,” Nature Rev. Genet, vol. 18, no. 9, pp. 551–562, 2017.
[41] Lü L, Chen D, Ren X-L, Zhang Q-M, Zhang Y-C, and Zhou T, “Vital nodes identification in complex networks,” Phys. Rep, vol. 650, pp. 1–63, 2016.
[42] Indulekha T, Aswathy G, and Sudhakaran P, “A graph based algorithm for clustering and ranking proteins for identifying disease causing genes,” in Proc. 2018 Int. Conf. Adv. Comput. Commun. Inform, 2018, pp. 1022–1026.
[43] Tripathi B, Parthasarathy S, Sinha H, Raman K, and Ravindran B, “Adapting community detection algorithms for disease module identification in heterogeneous biological networks,” Front. Genet, vol. 10, pp. 164, 2019.
[44] Kitsak M. et al., “Identification of influential spreaders in complex networks,” Nature Phys., vol. 6, no. 11, pp. 888–893, 2010.
[45] Ahajjam S. and Badir H, “Identification of influential spreaders in complex networks using hybridrank algorithm,” Sci. Rep, vol. 8, no. 1, 2018, Art. no. 11932.
[46] Li H. et al., “Deciphering the mechanism of indirubin and its derivatives in the inhibition of imatinib resistance using a ‘drug target prediction-gene microarray analysis-protein network construction’ strategy,” BMC Complement. Altern. Med, vol. 19, no. 1, pp. 1–13, 2019.
[47] Yu H, Kim PM, Sprecher E, Trifonov V, and Gerstein M, “The importance of bottlenecks in protein networks: Correlation with gene essentiality and expression dynamics,” PLoS Comput. Biol, vol. 3, no. 4, 2007, Art. no. e59.
[48] Hwang W-C, Zhang A, and Ramanathan M, “Identification of information flow-modulating drug targets: A novel bridging paradigm for drug discovery,” Clin. Pharmacol. Therapeutics, vol. 84, no. 5, pp. 563–572, 2008.
[49] Jalili M. et al., “Evolution of centrality measurements for the detection of essential proteins in biological networks,” Front. Physiol, vol. 7, pp. 375, 2016.
[50] Mallik S. and Maulik U, “MiRNA-TF-gene network analysis through ranking of biomolecules for multi-informative uterine leiomyoma dataset,” J. Biomed. Inform, vol. 57, pp. 308–319, 2015.
[51] Giscard P-L and Wilson RC, “Cycle-centrality in economic and biological networks,” in Proc. Int. Conf. Complex Netw. Appl, Cherifi C, Cherifi H, Karsai M, and Musolesi M, Eds., Cham: Springer International Publishing, 2018, pp. 14–28.
[52] Chen L, Huang T, Zhang Y-H, Jiang Y, Zheng M, and Cai Y-D, “Identification of novel candidate drivers connecting different dysfunctional levels for lung adenocarcinoma using protein-protein interactions and a shortest path approach,” Sci. Rep, vol. 6, no. 1, 2016, Art. no. 29849.
[53] Kaushal P. and Singh S, “Network-based disease gene prioritization based on protein–protein interaction networks,” Netw. Model. Anal. Health Inform. Bioinf, vol. 9, pp. 1–16, 2020.
[54] Campbell IM, James RA, Chen ES, and Shaw CA, “NetComm: A network analysis tool based on communicability,” Bioinformatics, vol. 30, no. 23, pp. 3387–3389, Aug. 2014, doi: 10.1093/bioinformatics/btu536.
[55] Thomas JP, Modos D, Korcsmaros T, and Brooks-Warburton J, “Network biology approaches to achieve precision medicine in inflammatory bowel disease,” Front. Genet, vol. 12, 2021, Art. no. 760501.
[56] Lin W. et al., “Metabolomics and correlation network analyses of core biomarkers in type 2 diabetes,” Amino Acids, vol. 52, pp. 1307–1317, 2020.
[57] Walker JT et al., “Genetic risk converges on regulatory networks mediating early type 2 diabetes,” Nature, vol. 624, pp. 621–629, 2023.
[58] Chen Y, Zhang X-F, and Ou-Yang L, “Inferring cancer common and specific gene networks via multi-layer joint graphical model,” Comput. Struct. Biotechnol. J, vol. 21, pp. 974–990, 2023.
[59] Zhou X. et al., “A systems approach to refine disease taxonomy by integrating phenotypic and molecular networks,” EBioMedicine, vol. 31, pp. 79–91, 2018.
[60] Zheng K, You Z-H, Wang L, Wong L, and Chen Z-H, “Inferring disease-associated piwi-interacting RNAs via graph attention networks,” in Proc. 16th Int. Conf. Intell. Comput. Theories Appl, Bari, Italy, Springer, 2020, pp. 239–250.
[61] Zhao W, Gu X, Chen S, Wu J, and Zhou Z, “MODIG: Integrating multi-omics and multi-dimensional gene network for cancer driver gene identification based on graph attention network model,” Bioinformatics, vol. 38, no. 21, pp. 4901–4907, 2022.
[62] Song Q. et al., “DeepAlloDriver: A deep learning-based strategy to predict cancer driver mutations,” Nucleic Acids Res., vol. 51, no. W1, pp. W129–W133, 2023.
[63] Goyal P. and Ferrara E, “Graph embedding techniques, applications, and performance: A survey,” Knowl.-Based Syst, vol. 151, pp. 78–94, 2018.
[64] Kamya P. et al., “PandaOmics: An AI-driven platform for therapeutic target and biomarker discovery,” J. Chem. Inf. Model, vol. 64, no. 10, pp. 3961–3969, 2024.
[65] Xu X, Huang M, and Zou X, “Docking-based inverse virtual screening: Methods, applications, and challenges,” Biophys. Rep, vol. 4, pp. 1–16, 2018.
[66] Galati S, Di Stefano M, Martinelli E, Poli G, and Tuccinardi T, “Recent advances in in silico target fishing,” Molecules, vol. 26, no. 17, pp. 5124, 2021.
[67] Dakshanamurthy S. et al., “Predicting new indications for approved drugs using a proteochemometric method,” J. Med. Chem, vol. 55, no. 15, pp. 6832–6848, 2012.
[68] Ramsay RR, Popovic-Nikolic MR, Nikolic K, Uliassi E, and Bolognesi ML, “A perspective on multi-target drug discovery and design for complex diseases,” Clin. Transl. Med, vol. 7, no. 1, pp. 1–14, 2018.
- [69].Pinzi L. and Rastelli G, “Molecular docking: Shifting paradigms in drug discovery,” Int. J. Mol. Sci, vol. 20, no. 18, pp. 4331, 2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [70].Kitchen DB, Decornez H, Furr JR, and Bajorath J, “Docking and scoring in virtual screening for drug discovery: Methods and applications,” Nature Rev. Drug Discov, vol. 3, no. 11, pp. 935–949, 2004. [DOI] [PubMed] [Google Scholar]
- [71].Stanzione F, Giangreco I, and Cole JC, “Use of molecular docking computational tools in drug discovery,” Prog. Med. Chem, vol. 60, pp. 273–343, 2021. [DOI] [PubMed] [Google Scholar]
- [72] Jumper J et al., “Highly accurate protein structure prediction with AlphaFold,” Nature, vol. 596, no. 7873, pp. 583–589, 2021.
- [73] Salmaso V and Moro S, “Bridging molecular docking to molecular dynamics in exploring ligand-protein recognition process: An overview,” Front. Pharmacol, vol. 9, pp. 923, 2018.
- [74] Genheden S and Ryde U, “The MM/PBSA and MM/GBSA methods to estimate ligand-binding affinities,” Expert Opin. Drug Discov, vol. 10, no. 5, pp. 449–461, 2015.
- [75] Rastelli G and Pinzi L, “Refinement and rescoring of virtual screening results,” Front. Chem, vol. 7, pp. 498, 2019.
- [76] Ain QU, Aleksandrova A, Roessler FD, and Ballester PJ, “Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening,” Wiley Interdiscipl. Rev.: Comput. Mol. Sci, vol. 5, no. 6, pp. 405–424, 2015.
- [77] Wójcikowski M, Ballester PJ, and Siedlecki P, “Performance of machine-learning scoring functions in structure-based virtual screening,” Sci. Rep, vol. 7, no. 1, 2017, Art. no. 46710.
- [78] Pereira JC, Caffarena ER, and Dos Santos CN, “Boosting docking-based virtual screening with deep learning,” J. Chem. Inf. Model, vol. 56, no. 12, pp. 2495–2506, 2016.
- [79] Zhang X et al., “TB-IECS: An accurate machine learning-based scoring function for virtual screening,” J. Cheminformatics, vol. 15, no. 1, 2023, Art. no. 63.
- [80] Hadfield TE, Scantlebury J, and Deane CM, “Exploring the ability of machine learning-based virtual screening models to identify the functional groups responsible for binding,” J. Cheminformatics, vol. 15, 2023, Art. no. 84.
- [81] Abramson J et al., “Accurate structure prediction of biomolecular interactions with AlphaFold 3,” Nature, vol. 630, pp. 493–500, 2024.
- [82] Lu W et al., “DynamicBind: Predicting ligand-specific protein-ligand complex structure with a deep equivariant generative model,” Nature Commun., vol. 15, no. 1, 2024, Art. no. 1071.
- [83] Shi J-Y and Yiu S-M, “SRP: A concise non-parametric similarity-rank-based model for predicting drug-target interactions,” in Proc. 2015 IEEE Int. Conf. Bioinf. Biomed, 2015, pp. 1636–1641.
- [84] Buza K, “Drug-target interaction prediction with hubness-aware machine learning,” in Proc. IEEE 11th Int. Symp. Appl. Comput. Intell. Informat, 2016, pp. 437–440.
- [85] Pahikkala T et al., “Toward more realistic drug–target interaction predictions,” Brief. Bioinf, vol. 16, no. 2, pp. 325–337, 2015.
- [86] Lavecchia A, “Machine-learning approaches in drug discovery: Methods and applications,” Drug Discov. Today, vol. 20, no. 3, pp. 318–331, 2015.
- [87] Van Laarhoven T and Marchiori E, “Predicting drug-target interactions for new drug compounds using a weighted nearest neighbor profile,” PLoS One, vol. 8, no. 6, 2013, Art. no. e66952.
- [88] Niu YQ, “Supervised prediction of drug–target interactions by ensemble learning,” J. Chem. Pharm. Res, vol. 6, no. 7, pp. 1991–1999, 2014.
- [89] Wang L et al., “A computational-based method for predicting drug–target interactions by using stacked autoencoder deep neural network,” J. Comput. Biol, vol. 25, no. 3, pp. 361–373, 2018.
- [90] Shan W, Li X, Yao H, and Lin K, “Convolutional neural network-based virtual screening,” Curr. Med. Chem, vol. 28, no. 10, pp. 2033–2047, 2021.
- [91] Berrar D and Dubitzky W, “Deep learning in bioinformatics and biomedicine,” Brief. Bioinf, vol. 22, pp. 1513–1514, 2021.
- [92] Shao K, Zhang Y, Wen Y, Zhang Z, He S, and Bo X, “DTI-HETA: Prediction of drug–target interactions based on GCN and GAT on heterogeneous graph,” Brief. Bioinf, vol. 23, no. 3, 2022, Art. no. bbac109.
- [93] Wang W et al., “GCHN-DTI: Predicting drug-target interactions by graph convolution on heterogeneous networks,” Methods, vol. 206, pp. 101–107, 2022.
- [94] Luo Y, Liu Y, and Peng J, “Calibrated geometric deep learning improves kinase–drug binding predictions,” Nature Mach. Intell, vol. 5, no. 12, pp. 1390–1401, 2023.
- [95] Koh HY, Nguyen AT, Pan S, May LT, and Webb GI, “Physicochemical graph neural network for learning protein–ligand interaction fingerprints from sequence data,” Nature Mach. Intell, vol. 6, pp. 673–687, 2024.
- [96] Lee I, Keum J, and Nam H, “DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences,” PLoS Comput. Biol, vol. 15, no. 6, 2019, Art. no. e1007129.
- [97] You J, McLeod RD, and Hu P, “Predicting drug-target interaction network using deep learning model,” Comput. Biol. Chem, vol. 80, pp. 90–101, 2019.
- [98] Li L and Cai M, “Drug target prediction by multi-view low rank embedding,” IEEE/ACM Trans. Comput. Biol. Bioinf, vol. 16, no. 5, pp. 1712–1721, Sep./Oct. 2019.
- [99] Wang M et al., “Drug-target interaction prediction via dual Laplacian graph regularized matrix completion,” BioMed Res. Int, vol. 2018, 2018, Art. no. 1425608.
- [100] Ezzat A, Wu M, Li X-L, and Kwoh C-K, “Computational prediction of drug–target interactions using chemogenomic approaches: An empirical survey,” Brief. Bioinf, vol. 20, no. 4, pp. 1337–1357, 2019.
- [101] Bleakley K and Yamanishi Y, “Supervised prediction of drug–target interactions using bipartite local models,” Bioinformatics, vol. 25, no. 18, pp. 2397–2403, 2009.
- [102] Cheng F, Zhou Y, Li W, Liu G, and Tang Y, “Prediction of chemical-protein interactions network with weighted network-based inference method,” PLoS One, vol. 7, no. 7, 2012, Art. no. e41064.
- [103] Olayan RS, Ashoor H, and Bajic VB, “DDR: Efficient computational method to predict drug–target interactions using graph mining and machine learning approaches,” Bioinformatics, vol. 34, no. 7, pp. 1164–1173, 2018.
- [104] Thafar MA et al., “DTiGEMS: Drug-target interaction prediction using graph embedding, graph mining, and similarity-based techniques,” J. Cheminformatics, vol. 12, no. 1, pp. 1–17, 2020.
- [105] Luo Y et al., “A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information,” Nature Commun., vol. 8, no. 1, 2017, Art. no. 573.
- [106] Mohamed SK, Nováček V, and Nounu A, “Discovering protein drug targets using knowledge graph embeddings,” Bioinformatics, vol. 36, no. 2, pp. 603–610, 2020.
- [107] Zeng X et al., “Network-based prediction of drug–target interactions using an arbitrary-order proximity embedded deep forest,” Bioinformatics, vol. 36, no. 9, pp. 2805–2812, 2020.
- [108] Alshahrani M, Thafar MA, and Essack M, “Application and evaluation of knowledge graph embeddings in biomedical data,” PeerJ Comput. Sci, vol. 7, 2021, Art. no. e341.
- [109] Yue Y and He S, “DTI-HeNE: A novel method for drug-target interaction prediction based on heterogeneous network embedding,” BMC Bioinf., vol. 22, no. 1, pp. 1–20, 2021.
- [110] Li J, Wang Y, Li Z, Lin H, and Wu B, “LM-DTI: A tool of predicting drug-target interactions using the node2vec and network path score methods,” Front. Genet, vol. 14, 2023, Art. no. 1181592.
- [111] Zeng X et al., “Target identification among known drugs by deep learning from heterogeneous networks,” Chem. Sci, vol. 11, no. 7, pp. 1775–1797, 2020.
- [112] Yu J-L, Dai Q-Q, and Li G-B, “Deep learning in target prediction and drug repositioning: Recent advances and challenges,” Drug Discov. Today, vol. 27, no. 7, pp. 1796–1814, 2022.
- [113] de Siqueira Santos S et al., “Machine learning and network medicine approaches for drug repositioning for COVID-19,” Patterns, vol. 3, no. 1, 2022, Art. no. 100396.
- [114] David L, Thakkar A, Mercado R, and Engkvist O, “Molecular representations in AI-driven drug discovery: A review and practical guide,” J. Cheminform, vol. 12, 2020, Art. no. 56.
- [115] Jin W, Barzilay R, and Jaakkola T, “Junction tree variational autoencoder for molecular graph generation,” in Proc. Int. Conf. Mach. Learn, PMLR, 2018, pp. 2323–2332.
- [116] Fang Y et al., “Knowledge graph-enhanced molecular contrastive learning with functional prompt,” Nat. Mach. Intell, vol. 5, pp. 542–553, 2023.
- [117] Ross J, Belgodere B, Chenthamarakshan V, Padhi I, Mroueh Y, and Das P, “Large-scale chemical language representations capture molecular structure and properties,” Nat. Mach. Intell, vol. 4, pp. 1256–1264, 2022.
- [118] Li H, Zhao D, and Zeng J, “KPGT: Knowledge-guided pre-training of graph transformer for molecular property prediction,” in Proc. 28th ACM SIGKDD Conf. Knowl. Discov. Data Mining, New York, NY, USA, 2022, pp. 857–867, doi: 10.1145/3534678.3539426.
- [119] Cadow J, Born J, Manica M, Oskooei A, and Rodríguez Martínez M, “PaccMann: A web service for interpretable anticancer compound sensitivity prediction,” Nucleic Acids Res., vol. 48, no. W1, pp. W502–W508, May 2020, doi: 10.1093/nar/gkaa327.
- [120] Punnachaiya K, Vateekul P, and Wichadakul D, “Multimodal modules and self-attention for graph neural network molecular properties prediction model,” in Proc. 11th Int. Conf. Bioinf. Comput. Biol, 2023, pp. 141–146.
- [121] Wen J et al., “Multimodal representation learning for predicting molecule–disease relations,” Bioinformatics, vol. 39, no. 2, Feb. 2023, Art. no. btad085.
- [122] Wu J, Su Y, Yang A, Ren J, and Xiang Y, “An improved multi-modal representation-learning model based on fusion networks for property prediction in drug discovery,” Comput. Biol. Med, vol. 165, 2023, Art. no. 107452. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0010482523009174
- [123] Manica M, Oskooei A, Born J, Subramanian V, Sáez-Rodríguez J, and Rodríguez Martínez M, “Toward explainable anticancer compound sensitivity prediction via multimodal attention-based convolutional encoders,” Mol. Pharm, vol. 16, pp. 4797–4806, 2019.
- [124] Weber JK et al., “Simplified, interpretable graph convolutional neural networks for small molecule activity prediction,” J. Comput. Aided Mol. Des, vol. 36, pp. 391–404, 2022.
- [125] Zeng X et al., “Deep generative molecular design reshapes drug discovery,” Cell Rep. Med, vol. 3, no. 12, 2022, Art. no. 100794.
- [126] Abate C, Decherchi S, and Cavalli A, “Graph neural networks for conditional de novo drug design,” WIREs Comput. Mol. Sci, vol. 13, no. 4, 2023, Art. no. e1651, doi: 10.1002/wcms.1651.
- [127] Gaudelet T et al., “Utilizing graph machine learning within drug discovery and development,” Brief. Bioinf, vol. 22, no. 6, May 2021, Art. no. bbab159, doi: 10.1093/bib/bbab159.
- [128] Hu W et al., “Deep learning methods for small molecule drug discovery: A survey,” IEEE Trans. Artif. Intell, vol. 5, no. 2, pp. 459–479, Feb. 2024.
- [129] Zhang Z, Yan J, Liu Q, and Che E, “A systematic survey in geometric deep learning for structure-based drug design,” 2023, arXiv:2306.11768.
- [130] Mahmood O, Mansimov E, Bonneau R, and Cho K, “Masked graph modeling for molecule generation,” Nat. Commun, vol. 12, 2021, Art. no. 3156.
- [131] Segler MHS, Kogej T, Tyrchan C, and Waller MP, “Generating focused molecule libraries for drug discovery with recurrent neural networks,” ACS Central Sci., vol. 4, no. 1, pp. 120–131, 2018, doi: 10.1021/acscentsci.7b00512.
- [132] Maragakis P, Nisonoff H, Cole B, and Shaw DE, “A deep-learning view of chemical space designed to facilitate drug discovery,” J. Chem. Inf. Model, vol. 60, no. 10, pp. 4487–4496, 2020, doi: 10.1021/acs.jcim.0c00321.
- [133] Grechishnikova D, “Transformer neural network for protein-specific de novo drug generation as a machine translation problem,” Sci. Rep, vol. 11, 2021, Art. no. 321.
- [134] Li Y, Zhang L, and Liu Z, “Multi-objective de novo drug design with conditional graph generative model,” J. Cheminform, vol. 10, 2018, Art. no. 33.
- [135] Zang C and Wang F, “MoFlow: An invertible flow model for generating molecular graphs,” in Proc. 26th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2020, pp. 617–626.
- [136] Kuznetsov M and Polykovskiy D, “MolGrow: A graph normalizing flow for hierarchical molecular generation,” in Proc. AAAI Conf. Artif. Intell, 2021, pp. 8226–8234.
- [137] Liu M, Luo Y, Uchino K, Maruhashi K, and Ji S, “Generating 3D molecules for target protein binding,” in Proc. 39th Int. Conf. Mach. Learn, Baltimore, MD, USA, 2022, pp. 13912–13924.
- [138] Simonovsky M and Komodakis N, “GraphVAE: Towards generation of small graphs using variational autoencoders,” in Proc. Int. Conf. Artif. Neural Netw. Mach. Learn, Kůrková V, Manolopoulos Y, Hammer B, Iliadis L, and Maglogiannis I, Eds., Cham: Springer International Publishing, 2018, pp. 412–422.
- [139] Gómez-Bombarelli R et al., “Automatic chemical design using a data-driven continuous representation of molecules,” ACS Central Sci., vol. 4, no. 2, pp. 268–276, 2018.
- [140] Griffiths R-R and Hernández-Lobato JM, “Constrained Bayesian optimization for automatic chemical design using variational autoencoders,” Chem. Sci, vol. 11, pp. 577–586, 2020, doi: 10.1039/C9SC04026A.
- [141] Dollar O, Joshi N, Beck DAC, and Pfaendtner J, “Attention-based generative models for de novo molecular design,” Chem. Sci, vol. 12, pp. 8362–8372, 2021.
- [142] Adams K and Coley CW, “Equivariant shape-conditioned generation of 3D molecules for ligand-based drug design,” in Proc. Int. Conf. Learn. Representations, 2022.
- [143] Wang M et al., “RELATION: A deep generative model for structure-based de novo drug design,” J. Med. Chem, vol. 65, no. 13, pp. 9478–9492, 2022, doi: 10.1021/acs.jmedchem.2c00732.
- [144] De Cao N and Kipf T, “MolGAN: An implicit generative model for small molecular graphs,” in Proc. ICML 2018 Workshop Theor. Found. Appl. Deep Generative Models, 2018.
- [145] Maziarka Ł, Pocha A, Kaczmarczyk J, Rataj K, Danel T, and Warchoł M, “Mol-CycleGAN: A generative model for molecular optimization,” J. Cheminformatics, vol. 12, 2020, Art. no. 2, doi: 10.1186/s13321-019-0404-1.
- [146] Bai Q, Tan S, Xu T, Liu H, Huang J, and Yao X, “MolAICal: A soft tool for 3D drug design of protein targets by artificial intelligence and classical algorithm,” Brief. Bioinf, vol. 22, no. 3, Aug. 2020, Art. no. bbaa161, doi: 10.1093/bib/bbaa161.
- [147] Xu M, Yu L, Song Y, Shi C, Ermon S, and Tang J, “GeoDiff: A geometric diffusion model for molecular conformation generation,” in Proc. Int. Conf. Learn. Representations, 2022. [Online]. Available: https://openreview.net/forum?id=PzcvxEMzvQC
- [148] Jo J, Lee S, and Hwang SJ, “Score-based generative modeling of graphs via the system of stochastic differential equations,” in Proc. 39th Int. Conf. Mach. Learn, 2022, pp. 10362–10383. [Online]. Available: https://arxiv.org/abs/2202.02514
- [149] Vignac C, Krawczuk I, Siraudin A, Wang B, Cevher V, and Frossard P, “DiGress: Discrete denoising diffusion for graph generation,” 2022, arXiv:2209.14734. [Online]. Available: https://api.semanticscholar.org/CorpusID:252595881
- [150] Huang H, Sun L, Du B, and Lv W, “Learning joint 2-D and 3-D graph diffusion models for complete molecule generation,” IEEE Trans. Neural Netw. Learn. Syst, vol. 35, no. 9, pp. 11857–11871, Sep. 2024.
- [151] Guan J et al., “DecompDiff: Diffusion models with decomposed priors for structure-based drug design,” in Proc. 40th Int. Conf. Mach. Learn, Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, and Scarlett J, Eds., PMLR, 2023, pp. 11827–11846. [Online]. Available: https://proceedings.mlr.press/v202/guan23a.html
- [152] Huang L, Zhang H, Xu T, and Wong K-C, “MDM: Molecular diffusion model for 3D molecule generation,” in Proc. 37th AAAI Conf. Artif. Intell. 35th Conf. Innov. Appl. Artif. Intell. 13th Symp. Educ. Adv. Artif. Intell, AAAI Press, 2023, Art. no. 570.
- [153] Hoogeboom E, Satorras VG, Vignac C, and Welling M, “Equivariant diffusion for molecule generation in 3D,” in Proc. 39th Int. Conf. Mach. Learn, PMLR, 2022, pp. 8867–8887.
- [154] Morehead A and Cheng J, “Geometry-complete diffusion for 3D molecule generation and optimization,” Commun. Chem, vol. 7, 2024, Art. no. 150.
- [155] Guan J, Qian WW, Peng X, Su Y, Peng J, and Ma J, “3D equivariant diffusion for target-aware molecule generation and affinity prediction,” in Proc. Int. Conf. Learn. Representations, 2023.
- [156] Zhavoronkov A et al., “Deep learning enables rapid identification of potent DDR1 kinase inhibitors,” Nat. Biotechnol, vol. 37, pp. 1038–1040, 2019.
- [157] Jin W, Barzilay R, and Jaakkola T, “Multi-objective molecule generation using interpretable substructures,” in Proc. Int. Conf. Mach. Learn, 2020, pp. 4849–4859.
- [158] Wang J et al., “Multi-constraint molecular generation based on conditional transformer, knowledge distillation and reinforcement learning,” Nat. Mach. Intell, vol. 3, pp. 914–922, 2021.
- [159] Zhang Y, Li S, Xing M, Yuan Q, He H, and Sun S, “Universal approach to de novo drug design for target proteins using deep reinforcement learning,” ACS Omega, vol. 8, no. 6, pp. 5464–5474, 2023, doi: 10.1021/acsomega.2c06653.
- [160] Simm GNC, Pinsler R, Csányi G, and Hernández-Lobato JM, “Symmetry-aware actor-critic for 3D molecular design,” in Proc. Int. Conf. Learn. Representations, 2021. [Online]. Available: https://openreview.net/forum?id=jEYKjPE1xYN
- [161] Higgins I et al., “beta-VAE: Learning basic visual concepts with a constrained variational framework,” in Proc. 5th Int. Conf. Learn. Representations, Toulon, France, 2017.
- [162] Zhang M et al., “A survey on graph diffusion models: Generative AI in science for molecule, protein and material,” 2023, arXiv:2304.01565, doi: 10.13140/RG.2.2.26493.64480.
- [163] Alakhdar A, Poczos B, and Washburn N, “Diffusion models in de novo drug design,” 2024. [Online]. Available: https://arxiv.org/abs/2406.08511
- [164] Montastruc J-L et al., “What is pharmacoepidemiology? Definition, methods, interest and clinical applications,” Therapies, vol. 74, no. 2, pp. 169–174, 2019.
- [165] Kostis JB and Dobrzynski JM, “Limitations of randomized clinical trials,” Amer. J. Cardiol, vol. 129, pp. 109–115, 2020.
- [166] Zhu X, Hu J, Xiao T, Huang S, Shang D, and Wen Y, “Integrating machine learning with electronic health record data to facilitate detection of prolactin level and pharmacovigilance signals in olanzapine-treated patients,” Front. Endocrinol, vol. 13, 2022, Art. no. 1011492.
- [167] Martin GL et al., “Validation of artificial intelligence to support the automatic coding of patient adverse drug reaction reports, using nationwide pharmacovigilance data,” Drug Saf., vol. 45, no. 5, pp. 535–548, 2022.
- [168] Gosselt HR, Bazelmans EA, Lieber T, van Hunsel FP, and Härmark L, “Development of a multivariate prediction model to identify individual case safety reports which require clinical review,” Pharmacoepidemiol. Drug Saf, vol. 31, no. 12, pp. 1300–1307, 2022.
- [169] Imran M et al., “Supervised machine learning-based decision support for signal validation classification,” Drug Saf., vol. 45, no. 5, pp. 583–596, 2022.
- [170] Lee J-E, Kim JH, Bae J-H, Song I, and Shin J-Y, “Detecting early safety signals of infliximab using machine learning algorithms in the Korea Adverse Event Reporting System,” Sci. Rep, vol. 12, no. 1, 2022, Art. no. 14869.
- [171] Nghiem TT and Bousquet C, “Named entity recognition in PubMed abstracts for pharmacovigilance using deep learning,” Stud. Health Technol. Inform, vol. 294, pp. 878–879, 2022.
- [172] Alroobaea R et al., “IL-4/13 blockade and sleep-related adverse drug reactions in over 37,000 dupilumab reports from the World Health Organization individual case safety reporting pharmacovigilance database (VigiBase™): A Big Data and machine learning analysis,” Eur. Rev. Med. Pharmacological Sci, vol. 26, no. 11, pp. 4074–4081, 2022.
- [173] Huang J-Y, Lee W-P, and Lee K-D, “Predicting adverse drug reactions from social media posts: Data balance, feature selection and deep learning,” Healthcare, vol. 10, no. 4, pp. 618, 2022.
- [174] McMaster C et al., “Developing a deep learning natural language processing algorithm for automated reporting of adverse drug reactions,” J. Biomed. Inform, vol. 137, 2023, Art. no. 104265.
- [175] Kaas-Hansen BS et al., “Language-agnostic pharmacovigilant text mining to elicit side effects from clinical notes and hospital medication records,” Basic Clin. Pharmacol. Toxicol, vol. 131, no. 4, pp. 282–293, 2022.
- [176] Yu D and Vydiswaran VV, “An assessment of mentions of adverse drug events on social media with natural language processing: Model development and analysis,” JMIR Med. Inform, vol. 10, no. 9, 2022, Art. no. e38140.
- [177] Tan HX et al., “Combining machine learning with a rule-based algorithm to detect and identify related entities of documented adverse drug reactions on hospital discharge summaries,” Drug Saf., vol. 45, no. 8, pp. 853–862, 2022.
- [178] Dirkson A, Verberne S, Kraaij W, van Oortmerssen G, and Gelderblom H, “Automated gathering of real-world data from online patient forums can complement pharmacovigilance for rare cancers,” Sci. Rep, vol. 12, no. 1, 2022, Art. no. 10317.
- [179] Ponthier L et al., “Optimization of vancomycin initial dose in term and preterm neonates by machine learning,” Pharmaceut. Res, vol. 39, no. 10, pp. 2497–2506, 2022.
- [180] Huang X et al., “Prediction of vancomycin dose on high-dimensional data using machine learning techniques,” Expert Rev. Clin. Pharmacol, vol. 14, no. 6, pp. 761–771, 2021.
- [181] Hughes JH and Keizer RJ, “A hybrid machine learning/pharmacokinetic approach outperforms maximum a posteriori Bayesian estimation by selectively flattening model priors,” CPT: Pharmacometrics Syst. Pharmacol, vol. 10, no. 10, pp. 1150–1160, 2021.
- [182] Chi C-L et al., “Producing personalized statin treatment plans to optimize clinical outcomes using Big Data and machine learning,” J. Biomed. Inform, vol. 128, 2022, Art. no. 104029.
- [183] Barrio M, Raeburn CD, McIntyre R, Albuja-Cruz M, Haugen BR, and Pozdeyev N, “Computer-assisted levothyroxine dose selection for the treatment of postoperative hypothyroidism,” Thyroid, vol. 33, no. 5, pp. 547–555, 2023.
- [184] Mizuno K et al., “Model-informed precision dosing guidance of ethosuximide developed from a randomized controlled clinical trial of childhood absence epilepsy,” Clin. Pharmacol. Therapeutics, vol. 114, pp. 459–469, 2023.
- [185] Ponthier L et al., “Application of machine learning to predict tacrolimus exposure in liver and kidney transplant patients given the MeltDose formulation,” Eur. J. Clin. Pharmacol, vol. 79, no. 2, pp. 311–319, 2023.
- [186] Woillard J-B, Labriffe M, Prémaud A, and Marquet P, “Estimation of drug exposure by machine learning based on simulations from published pharmacokinetic models: The example of tacrolimus,” Pharmacological Res., vol. 167, 2021, Art. no. 105578.
- [187] Destere A, Marquet P, Labriffe M, Drici M-D, and Woillard J-B, “A hybrid algorithm combining population pharmacokinetic and machine learning for isavuconazole exposure prediction,” Pharmaceut. Res, vol. 40, no. 4, pp. 951–959, 2023.
- [188] Keutzer L et al., “Machine learning and pharmacometrics for prediction of pharmacokinetic data: Differences, similarities and challenges illustrated with rifampicin,” Pharmaceutics, vol. 14, no. 8, pp. 1530, 2022.
- [189] Zhu X, Zhang M, Wen Y, and Shang D, “Machine learning advances the integration of covariates in population pharmacokinetic models: Valproic acid as an example,” Front. Pharmacol, vol. 13, 2022, Art. no. 994665.
- [190] Li Z-R et al., “Population pharmacokinetic modeling combined with machine learning approach improved tacrolimus trough concentration prediction in Chinese adult liver transplant recipients,” J. Clin. Pharmacol, vol. 63, no. 3, pp. 314–325, 2023.
- [191] Lee S, Song M, Han J, Lee D, and Kim B-H, “Application of machine learning classification to improve the performance of vancomycin therapeutic drug monitoring,” Pharmaceutics, vol. 14, no. 5, pp. 1023, 2022.
- [192] Bououda M et al., “A machine learning approach to predict interdose vancomycin exposure,” Pharmaceut. Res, vol. 39, no. 4, pp. 721–731, 2022.
- [193] Sibieude E, Khandelwal A, Hesthaven JS, Girard P, and Terranova N, “Fast screening of covariates in population models empowered by machine learning,” J. Pharmacokinetics Pharmacodynamics, vol. 48, no. 4, pp. 597–609, 2021.
- [194] Khusial R, Bies RR, and Akil A, “Deep learning methods applied to drug concentration prediction of olanzapine,” Pharmaceutics, vol. 15, no. 4, pp. 1139, 2023.
- [195] Jaber MM, Yaman B, Sarafoglou K, and Brundage RC, “Application of deep neural networks as a prescreening tool to assign individualized absorption models in pharmacokinetic analysis,” Pharmaceutics, vol. 13, no. 6, pp. 797, 2021.
- [196] Nigo M et al., “PK-RNN-V E: A deep learning model approach to vancomycin therapeutic drug monitoring using electronic health record data,” J. Biomed. Inform, vol. 133, 2022, Art. no. 104166.
- [197] Destere A et al., “A hybrid model associating population pharmacokinetics with machine learning: A case study with iohexol clearance estimation,” Clin. Pharmacokinetics, vol. 61, no. 8, pp. 1157–1165, 2022.
- [198] Huang Q et al., “Tacrolimus pharmacokinetics in pediatric nephrotic syndrome: A combination of population pharmacokinetic modelling and machine learning approaches to improve individual prediction,” Front. Pharmacol, vol. 13, 2022, Art. no. 942129.
- [199].Damnjanović I, Tsyplakova N, Stefanović N, Tošić T, Catić-Đorđević A, and Karalis V, “Joint use of population pharmacokinetics and machine learning for optimizing antiepileptic treatment in pediatric population,” Therapeutic Adv. Drug Saf., vol. 14, 2023, Art. no. 20420986231181337.
- [200].Verhaeghe J et al., “Development and evaluation of uncertainty quantifying machine learning models to predict piperacillin plasma concentrations in critically ill patients,” BMC Med. Inform. Decis. Mak., vol. 22, no. 1, 2022, Art. no. 224.
- [201].Takahashi S et al., “Classification tree analysis based on machine learning for predicting linezolid-induced thrombocytopenia,” J. Pharmaceut. Sci., vol. 110, no. 5, pp. 2295–2300, 2021.
- [202].Pirmohamed M, Breckenridge AM, Kitteringham NR, and Park BK, “Adverse drug reactions,” BMJ, vol. 316, no. 7140, pp. 1295–1298, 1998.
- [203].Bate A and Stegmann J-U, “Artificial intelligence and pharmacovigilance: What is happening, what could happen and what should happen,” Health Policy Technol., vol. 12, no. 2, 2023, Art. no. 100743.
- [204].Lee CY and Chen Y-PP, “Machine learning on adverse drug reactions for pharmacovigilance,” Drug Discov. Today, vol. 24, no. 7, pp. 1332–1343, 2019.
- [205].Salas M et al., “The use of artificial intelligence in pharmacovigilance: A systematic review of the literature,” Pharmaceut. Med., vol. 36, no. 5, pp. 295–306, 2022.
- [206].Saleh HA, “Machine learning applications in pharmacovigilance: Scoping review,” Pharmacovigilance, vol. 2, 2023.
- [207].Kim HR et al., “Analyzing adverse drug reaction using statistical and machine learning methods: A systematic review,” Medicine, vol. 101, no. 25, 2022, Art. no. e29387.
- [208].Pilipiec P, Liwicki M, and Bota A, “Using machine learning for pharmacovigilance: A systematic review,” Pharmaceutics, vol. 14, no. 2, 2022, Art. no. 266.
- [209].Lian AT, Du J, and Tang L, “Using a machine learning approach to monitor COVID-19 vaccine adverse events (VAE) from Twitter data,” Vaccines, vol. 10, no. 1, 2022, Art. no. 103.
- [210].Portelli B, Scaboro S, Tonino R, Chersoni E, Santus E, and Serra G, “Monitoring user opinions and side effects on COVID-19 vaccines in the Twittersphere: Infodemiology study of tweets,” J. Med. Internet Res., vol. 24, no. 5, 2022, Art. no. e35115.
- [211].Martenot V et al., “LiSA: An assisted literature search pipeline for detecting serious adverse drug events with deep learning,” BMC Med. Inform. Decis. Mak., vol. 22, no. 1, pp. 1–16, 2022.
- [212].Kaas-Hansen BS, Gentile S, Caioli A, and Andersen SE, “Exploratory pharmacovigilance with machine learning in big patient data: A focused scoping review,” Basic Clin. Pharmacol. Toxicol., vol. 132, no. 3, pp. 233–241, 2023.
- [213].Murphy RM et al., “Adverse drug event detection using natural language processing: A scoping review of supervised learning methods,” PLoS One, vol. 18, no. 1, 2023, Art. no. e0279842.
- [214].Jeon E, Kim Y, Park H, Park RW, Shin H, and Park H-A, “Analysis of adverse drug reactions identified in nursing notes using reinforcement learning,” Healthcare Inform. Res., vol. 26, no. 2, pp. 104–111, 2020.
- [215].Kreimeyer K et al., “Increased confidence in deduplication of drug safety reports with natural language processing of narratives at the US Food and Drug Administration,” Front. Drug Saf. Regulation, vol. 2, 2022, Art. no. 918897.
- [216].Kreimeyer K et al., “Using probabilistic record linkage of structured and unstructured data to identify duplicate cases in spontaneous adverse event reporting systems,” Drug Saf., vol. 40, pp. 571–582, 2017.
- [217].Kreimeyer K et al., “Feature engineering and machine learning for causality assessment in pharmacovigilance: Lessons learned from application to the FDA Adverse Event Reporting System,” Comput. Biol. Med., vol. 135, 2021, Art. no. 104517.
- [218].Cherkas Y, Ide J, and van Stekelenborg J, “Leveraging machine learning to facilitate individual case causality assessment of adverse drug reactions,” Drug Saf., vol. 45, no. 5, pp. 571–582, 2022.
- [219].Pérez-Blanco JS and Lanao JM, “Model-informed precision dosing (MIPD),” Pharmaceutics, vol. 14, 2022, Art. no. 2731.
- [220].Heus A et al., “Model-informed precision dosing of vancomycin via continuous infusion: A clinical fit-for-purpose evaluation of published PK models,” Int. J. Antimicrobial Agents, vol. 59, no. 5, 2022, Art. no. 106579.
- [221].Darwich AS et al., “Model-informed precision dosing: Background, requirements, validation, implementation, and forward trajectory of individualizing drug therapy,” Annu. Rev. Pharmacol. Toxicol., vol. 61, pp. 225–245, 2021.
- [222].Del Valle-Moreno P et al., “Model-informed precision dosing software tools for dosage regimen individualization: A scoping review,” Pharmaceutics, vol. 15, no. 7, 2023, Art. no. 1859.
- [223].Mizuno T, Dong M, Taylor ZL, Ramsey LB, and Vinks AA, “Clinical implementation of pharmacogenetics and model-informed precision dosing to improve patient care,” Brit. J. Clin. Pharmacol., vol. 88, no. 4, pp. 1418–1426, 2022.
- [224].Shen G et al., “Precision sirolimus dosing in children: The potential for model-informed dosing and novel drug monitoring,” Front. Pharmacol., vol. 14, 2023, Art. no. 1126981.
- [225].Zwart TC et al., “Model-informed precision dosing to optimise immunosuppressive therapy in renal transplantation,” Drug Discov. Today, vol. 26, no. 11, pp. 2527–2546, 2021.
- [226].Faelens R, Luyckx N, Kuypers D, Bouillon T, and Annaert P, “Predicting model-informed precision dosing: A test-case in tacrolimus dose adaptation for kidney transplant recipients,” CPT: Pharmacometrics Syst. Pharmacol., vol. 11, no. 3, pp. 348–361, 2022.
- [227].Mockeliunas L et al., “Model-informed precision dosing of linezolid in patients with drug-resistant tuberculosis,” Pharmaceutics, vol. 14, no. 4, 2022, Art. no. 753.
- [228].Wilkins JJ, Svensson EM, Ernest JP, Savic RM, Simonsson US, and McIlleron H, “Pharmacometrics in tuberculosis: Progress and opportunities,” Int. J. Antimicrobial Agents, vol. 60, no. 3, 2022, Art. no. 106620.
- [229].Velarde-Salcedo R et al., “Model-informed precision dosing of antimicrobial drugs in pediatrics: Experiences from a pilot scale program,” Eur. J. Pediatrics, vol. 182, pp. 4143–4152, 2023.
- [230].Oda K, Jono H, and Saito H, “Model-informed precision dosing of vancomycin in adult patients undergoing hemodialysis,” Antimicrobial Agents Chemotherapy, vol. 67, no. 6, 2023, Art. no. e00089.
- [231].Wicha SG et al., “From therapeutic drug monitoring to model-informed precision dosing for antibiotics,” Clin. Pharmacol. Therapeutics, vol. 109, no. 4, pp. 928–941, 2021.
- [232].Abdulla A et al., “Model-informed precision dosing of antibiotics in pediatric patients: A narrative review,” Front. Pediatrics, vol. 9, 2021, Art. no. 624639.
- [233].Liu L, Wang J, Zhang H, Chen M, and Cai Y, “Model-informed precision dosing of antibiotics in osteoarticular infections,” Infection Drug Resistance, vol. 15, pp. 99–110, 2022.
- [234].Dong M et al., “Model-informed precision dosing for alemtuzumab in paediatric and young adult patients undergoing allogeneic haematopoietic cell transplantation,” Brit. J. Clin. Pharmacol., vol. 88, no. 1, pp. 248–259, 2022.
- [235].Ewoldt TM et al., “Model-informed precision dosing of beta-lactam antibiotics and ciprofloxacin in critically ill patients: A multicentre randomised clinical trial,” Intensive Care Med., vol. 48, no. 12, pp. 1760–1771, 2022.
- [236].Nwanosike EM, Sunter W, Merchant HA, Conway BR, Ansari MA, and Hasan SS, “Challenges and possible solutions to direct-acting oral anticoagulants (DOACs) dosing in patients with extreme bodyweight and renal impairment,” Amer. J. Cardiovasc. Drugs, vol. 23, no. 1, pp. 9–17, 2023.
- [237].Oni-Orisan A et al., “Leveraging innovative technology to generate drug response phenotypes for the advancement of biomarker-driven precision dosing,” Clin. Transl. Sci., vol. 14, no. 3, pp. 784–790, 2021.
- [238].van Gelder T and Vinks AA, “Machine learning as a novel method to support therapeutic drug management and precision dosing,” Clin. Pharmacol. Ther., vol. 110, pp. 273–276, 2021.
- [239].Poweleit EA, Vinks AA, and Mizuno T, “Artificial intelligence and machine learning approaches to facilitate therapeutic drug management and model-informed precision dosing,” Therapeutic Drug Monit., vol. 45, no. 2, pp. 143–150, 2023.
- [240].Ramesh S et al., “Applications of artificial intelligence in pediatric oncology: A systematic review,” JCO Clin. Cancer Inform., vol. 5, pp. 1208–1219, 2021.
- [241].Ribba B, Dudal S, Lavé T, and Peck RW, “Model-informed artificial intelligence: Reinforcement learning for precision dosing,” Clin. Pharmacol. Therapeutics, vol. 107, no. 4, pp. 853–857, 2020.
- [242].Primas C, Reinisch W, Panetta JC, Eser A, Mould DR, and Dervieux T, “Model informed precision dosing tool forecasts trough infliximab and associates with disease status and tumor necrosis factor-alpha levels of inflammatory bowel diseases,” J. Clin. Med., vol. 11, no. 12, 2022, Art. no. 3316.
- [243].Bonate PL et al., “Training the next generation of pharmacometric modelers: A multisector perspective,” J. Pharmacokinetics Pharmacodynamics, vol. 51, pp. 5–31, 2024.
- [244].Balch JA et al., “Machine learning applications in solid organ transplantation and related complications,” Front. Immunol., vol. 12, 2021, Art. no. 739728.
- [245].Gill J et al., “Comparing the applications of machine learning, PBPK, and population pharmacokinetic models in pharmacokinetic drug–drug interaction prediction,” CPT: Pharmacometrics Syst. Pharmacol., vol. 11, no. 12, pp. 1560–1568, 2022.
- [246].Zheng P et al., “Pharmaceutical care model in precision medicine in China,” Farmacia Hospitalaria, vol. 47, pp. 218–223, 2023.
- [247].Triggle DJ and Taylor JB, Comprehensive Medicinal Chemistry II, vol. 8. Amsterdam, Netherlands: Elsevier, 2006.
- [248].Kapralos I and Dokoumetzidis A, “Population pharmacokinetic modelling of the complex release kinetics of octreotide LAR: Defining sub-populations by cluster analysis,” Pharmaceutics, vol. 13, no. 10, 2021, Art. no. 1578.
- [249].Nair R, Mohan DD, Setlur S, Govindaraju V, and Ramanathan M, “Generative models for age, race/ethnicity, and disease state dependence of physiological determinants of drug dosing,” J. Pharmacokinetics Pharmacodynamics, vol. 50, no. 2, pp. 111–122, 2023.
- [250].Sen. Paul R, “S.5002 - FDA Modernization Act 2.0,” 2021. [Online]. Available: https://www.congress.gov/bill/117th-congress/senate-bill/5002/summary/00
- [251].Zhou Y et al., “Therapeutic Target Database update 2022: Facilitating drug discovery with enriched comparative data of targeted agents,” Nucleic Acids Res., vol. 50, no. D1, pp. D1398–D1407, 2022.
- [252].Kana O and Brylinski M, “Elucidating the druggability of the human proteome with eFindSite,” J. Comput.-Aided Mol. Des., vol. 33, pp. 509–519, 2019.
- [253].Finan C et al., “The druggable genome and support for target identification and validation in drug development,” Sci. Transl. Med., vol. 9, no. 383, 2017, Art. no. eaag1166.
- [254].Lim S, “The process and costs of drug development (2022),” 2018. [Online]. Available: https://ftloscience.com/process-costs-drug-development/
- [255].Sun D, Gao W, Hu H, and Zhou S, “Why 90% of clinical drug development fails and how to improve it,” Acta Pharm. Sinica B, vol. 12, no. 7, pp. 3049–3062, 2022.
- [256].Johns Hopkins University, “OMIM: An online catalog of human genes and genetic disorders,” 1966. [Online]. Available: https://www.omim.org/
- [257].Rolland T et al., “A proteome-scale map of the human interactome network,” Cell, vol. 159, no. 5, pp. 1212–1226, 2014.
- [258].Menche J et al., “Uncovering disease-disease relationships through the incomplete interactome,” Science, vol. 347, no. 6224, 2015, Art. no. 1257601.
- [259].Woodsmith J and Stelzl U, “Studying post-translational modifications with protein interaction networks,” Curr. Opin. Struct. Biol., vol. 24, pp. 34–44, 2014.
- [260].Loscalzo J, Network Medicine. Cambridge, MA, USA: Harvard Univ. Press, 2017.
- [261].Gao W and Coley CW, “The synthesizability of molecules proposed by generative models,” J. Chem. Inf. Model., vol. 60, no. 12, pp. 5714–5723, 2020, doi: 10.1021/acs.jcim.0c00174.
- [262].Choi L et al., “Development of a system for postmarketing population pharmacokinetic and pharmacodynamic studies using real-world data from electronic health records,” Clin. Pharmacol. Therapeutics, vol. 107, no. 4, pp. 934–943, 2020.

