AlvaBuilder: A Software for De Novo Molecular Design

Andrea Mauri; Matteo Bertola

doi:10.1021/acs.jcim.3c00610

2023 Jul 3;64(7):2136–2142. doi: 10.1021/acs.jcim.3c00610

PMCID: PMC11005826 PMID: 37399048

Abstract

AlvaBuilder is a software tool for de novo molecular design and can be used to generate novel molecules having desirable characteristics. Such characteristics can be defined using a simple step-by-step graphical interface, and they can be based on molecular descriptors, on predictions of QSAR/QSPR models, on matching molecular fragments, or on similarity to a given compound. The molecules generated are always syntactically valid since they are composed by combining fragments of molecules taken from a training data set chosen by the user. In this paper, we demonstrate how the software can be used to design new compounds for a defined case study. AlvaBuilder is available at https://www.alvascience.com/alvabuilder/

Introduction

Drug discovery and materials science rely significantly on the identification of novel molecules that possess desirable characteristics. However, the vast number of possible molecules makes this a daunting task. For example, the number of potential drug-like compounds has been estimated to be between 10²³ and 10⁶⁰,^1,2 whereas the number of synthesized compounds is on the order of 10⁸.³ The sheer size of the chemical space and the discrete nature of molecular properties make it challenging to identify new molecules with application-driven goals.

Despite the challenges, identifying new molecules is crucial for the development of novel medicines and materials. In the drug discovery world, the process of identifying new drugs has become slower and more expensive over time. This phenomenon has been described by Eroom’s law,⁴ which is essentially Moore’s law spelled backward.⁵ Moore’s law states that the number of transistors in a circuit doubles every two years, making computers faster over time. In contrast, Eroom’s law describes the observation that the cost for new drugs has doubled every nine years since the 1980s.

To accelerate the process of identifying new molecules, significant advances have been made in exploring different generative models for molecule generation.^6,7 This field, known as de novo molecular design, is becoming more standardized by identifying the key aspects that the generative systems and designed molecules should have, such as validity, uniqueness, and novelty.⁸ A molecule is considered valid if it is syntactically correct and its atoms have appropriate valence. Uniqueness refers to the ability of a system to avoid generating the same molecule more than once. Novelty is the ability of a de novo molecular design system to come up with unseen molecules, for example, molecules that are not part of the training set.
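As an illustration of the last two criteria, the following minimal Python sketch computes uniqueness and novelty as simple fractions. It assumes that every molecule is already available as a canonical SMILES string, so that string equality stands in for structural identity; the function names are hypothetical and are not part of alvaBuilder or of any benchmark library.

```python
def uniqueness(generated):
    """Fraction of generated molecules that are distinct."""
    if not generated:
        return 0.0
    return len(set(generated)) / len(generated)


def novelty(generated, training):
    """Fraction of distinct generated molecules absent from the training set."""
    distinct = set(generated)
    if not distinct:
        return 0.0
    return len(distinct - set(training)) / len(distinct)


# Toy data: canonical SMILES are assumed, so equal strings mean equal molecules.
generated = ["CCO", "CCN", "CCO", "c1ccccc1"]
training = ["CCO", "CCC"]

print(uniqueness(generated))         # 3 distinct out of 4 -> 0.75
print(novelty(generated, training))  # 2 of the 3 distinct molecules are new
```

Validity is deliberately left out of the sketch because it requires a chemistry toolkit to parse the structure and check valences.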

To tackle this challenging task, Alvascience developed alvaBuilder, a desktop software tool for de novo molecular design. AlvaBuilder integrates the generation of new molecules with the ability to calculate molecular descriptors^9,10 and to make predictions with quantitative structure–activity relationship (QSAR) and quantitative structure–property relationship (QSPR) models.¹¹ In fact, alvaBuilder can constrain the chemical space explored during the design process by defining which properties the new molecules should have. The properties can be defined through a user-friendly graphical interface, for example, by specifying which physicochemical properties to use from a wide range of available molecular descriptors. AlvaBuilder is part of the Alvascience software suite, which comprises other tools (e.g., alvaDesc), but it can also be used as stand-alone software.

Overview of AlvaBuilder Functionalities

AlvaBuilder is a software tool for de novo molecular design. By applying Genetic Algorithms to molecular graphs, it can design new molecules with desired characteristics. The first version of alvaBuilder was released in 2020, and the software is now used by universities and commercial companies.

The basic functioning of alvaBuilder is shown in Figure 1. AlvaBuilder was designed from the ground up to take advantage of all the possibilities of de novo molecular design while maintaining a simple workflow and an easy-to-use graphical user interface (GUI). In fact, its input has been limited to only two mandatory components: a training data set and a set of rules. The training data set contains the molecular compounds that will be used as a reference to design the resulting molecules. The set of rules describes the characteristics that the new molecules need to have. Such rules can either be defined using a simple step-by-step graphical user interface (Figure 2) or be loaded from a previously saved file, called the score file.

Figure 1. Workflow of alvaBuilder showing the Genetic Algorithms phases: Initialization, Reproduction, Mutation, and Evaluation. On the left, the two input elements: a training set of molecules and a score based on a set of rules. The four possible rule types are identified by a letter: D for rules based on descriptors, Q for rules based on QSAR/QSPR models, S for rules based on molecular similarity, and M for rules based on fragment matching.

Figure 2. Graphical user interface of alvaBuilder showing the step-by-step procedure to define the score rules.

The idea behind alvaBuilder is to simplify the management of the conflicting needs that underlie de novo molecular design: finding new interesting molecules in a huge chemical space while limiting the time spent on this search. With the approach chosen, alvaBuilder identifies and improves, over a set of iterations, a population of molecules rather than a single molecule. When the iterations are complete, an expert, e.g., a chemist, can analyze the population and decide which molecules to keep for further investigation. With this approach,¹² the software is in charge of coming up with ideas by exploring the molecular space, while the human expert first constrains the exploration by defining a list of wanted characteristics and providing the building blocks (training data set) that the software can use and then evaluates the results to decide how to move forward.

Molecular Representation

The generative process comprises several steps where molecules, either from the training data set or belonging to the population, are manipulated. This is done by performing transformations on a molecule such as splitting it in two, selecting a portion of it, substituting a fragment with another, and so on. Therefore, to apply all the needed operations, a specific flexible molecular representation was developed. Each molecule is represented internally as a tree structure whose nodes are fragments of the molecule (Figure 3). Each molecule selected from the training data set is initially split into fragments following the rules described by Bemis and Murcko¹³ to obtain three fragment types: Ring systems, Linkers, and Side chains. A ring system is a single cycle or a set of cycles sharing one or more edges. Linkers include those atoms that are on the direct path connecting two ring systems, and side chains include any nonring and nonlinker atoms. Since the fragments used as the basic ingredients of the molecules of the population are taken from the training data set, there is no risk that the final molecules may contain unwanted elements such as unknown ring systems.

Figure 3. A representation of valsartan as a molecular tree structure composed of the fragments of the molecule.
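To make the idea of a tree of fragments concrete, the sketch below models one possible in-memory layout in Python. It is only an illustration under stated assumptions: the class name `FragmentNode`, its fields, and the toy molecule are hypothetical and do not describe alvaBuilder's internal data structures, and the fragment strings are placeholders rather than the output of an actual Bemis and Murcko decomposition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FragmentNode:
    """One node of the tree: a fragment plus the fragments attached to it."""
    smiles: str   # fragment structure (illustrative placeholder strings)
    kind: str     # "ring", "linker", or "side_chain"
    children: List["FragmentNode"] = field(default_factory=list)

    def add(self, child: "FragmentNode") -> "FragmentNode":
        """Attach a child fragment and return it for chaining."""
        self.children.append(child)
        return child

    def walk(self):
        """Yield every node of the subtree, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


# Toy molecule: a benzene ring carrying a methyl side chain and a
# methylene linker that leads to a pyridine ring.
root = FragmentNode("c1ccccc1", "ring")
root.add(FragmentNode("C", "side_chain"))
linker = root.add(FragmentNode("C", "linker"))
linker.add(FragmentNode("c1ccncc1", "ring"))

kinds = [node.kind for node in root.walk()]
print(kinds)  # ['ring', 'side_chain', 'linker', 'ring']
```

With such a layout, operations like "split at a random point" or "substitute a fragment" reduce to detaching or replacing a subtree.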

Genetic Algorithms

This molecular representation is the foundation on which the alvaBuilder Genetic Algorithms (GA) are built. The GA is a technique that, taking inspiration from Darwinian theory, improves the members of a population by mutating and recombining their genes.^14,15 In the context of alvaBuilder, the population is composed of molecules, each represented as a tree of fragments. The phases of the GA are Initialization, Reproduction, Mutation, and Evaluation. At the beginning (Initialization), a number of molecules equal to the Population size is selected at random from the training data set to compose the starting population. In the Reproduction phase, a new population is created by combining pairs of members of the existing population. In particular, the population is randomly grouped into pairs, and each molecule of a pair is split in two at a random point of its tree of fragments. A genetic operation called crossover then generates a new offspring from the pair of molecular parents. The new population then goes through the Mutation phase, where a percentage of its molecules is altered. When a molecule is randomly selected to be mutated, one of its fragments is selected, and a mutation operation is applied to it. The mutation operation can, for example, replace the fragment with another selected at random from the training data set or remove it from the fragment tree by turning it into a hydrogen atom. In the final Evaluation phase, each member of the new population is evaluated using the defined score function. The molecules of the two populations are sorted together by score, and only the best molecules are preserved. The resulting population is used in the next iteration of the GA. The GA can be terminated manually by the user, or it finishes on its own either when the Maximum number of iterations is reached or when the Goal score is achieved by at least one of the molecules. Increasing the number of iterations increases the chances that the GA finds molecules that satisfy the score function but also increases the execution time. The GA parameters are optional and can be left at their default values or changed using the alvaBuilder GUI.
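The four phases can be sketched as a short Python loop. This is a toy model under strong simplifying assumptions, not alvaBuilder's implementation: molecules are flat lists of fragment labels instead of trees, the score function is invented for the example, and all names and default values (`run_ga`, `toy_score`, `mutation_rate`, the population size of 6) are hypothetical.

```python
import random


def run_ga(training, score, pop_size=6, max_iter=50, goal=1.0,
           mutation_rate=0.3, seed=0):
    """Toy GA over molecules modelled as non-empty lists of fragment labels."""
    rng = random.Random(seed)
    pool = [frag for mol in training for frag in mol]  # building blocks
    # Initialization: random members of the training set.
    population = [list(rng.choice(training)) for _ in range(pop_size)]
    for _ in range(max_iter):
        # Reproduction: shuffle, pair up, cross over at random cut points.
        rng.shuffle(population)
        offspring = []
        for a, b in zip(population[::2], population[1::2]):
            cut_a = rng.randint(1, len(a))  # keep at least one fragment of a
            cut_b = rng.randint(0, len(b))
            offspring.append(a[:cut_a] + b[cut_b:])
        # Mutation: replace or delete one fragment in some offspring.
        for child in offspring:
            if rng.random() < mutation_rate:
                i = rng.randrange(len(child))
                if len(child) > 1 and rng.random() < 0.5:
                    del child[i]
                else:
                    child[i] = rng.choice(pool)
        # Evaluation: rank parents and offspring together, keep the best.
        population = sorted(population + offspring, key=score,
                            reverse=True)[:pop_size]
        if score(population[0]) >= goal:
            break  # Goal score reached by at least one molecule
    return population


def toy_score(mol):
    """Invented score in [0, 1]: three fragments, with ringA and side1."""
    size_term = 1.0 / (1 + abs(len(mol) - 3))
    content_term = 1.0 if "ringA" in mol and "side1" in mol else 0.0
    return (size_term + content_term) / 2


training = [["ringA", "link1", "ringB"], ["ringB", "side1"],
            ["ringC", "link2", "side2", "ringB"], ["ringD", "side1"]]
final = run_ga(training, toy_score)
for mol in final:
    print(round(toy_score(mol), 3), mol)
```

Because parents and offspring are ranked together, the best score in the population never decreases from one iteration to the next, which mirrors the selection step described above.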

Score Function

A key element of the GA is the ability to compare two molecules and determine which one is better. This ability is based on the score function, which the software composes automatically from the rules defined by the user. Each score rule can assume a value between 0 and 1, where 1 means that the rule is completely satisfied. The result of the score function is then calculated as either the arithmetic or the geometric mean of the results of the individual rules. Each score rule has one of four possible types: Descriptor, Compare, Match, and Model.
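The two ways of combining rule results can be illustrated with a few lines of Python. The function name `aggregate` and its `method` argument are hypothetical; only the use of an arithmetic or a geometric mean over rule values in [0, 1] comes from the text.

```python
import math


def aggregate(rule_scores, method="arithmetic"):
    """Combine per-rule scores in [0, 1] into one overall score."""
    if not rule_scores:
        raise ValueError("at least one rule score is required")
    if method == "arithmetic":
        return sum(rule_scores) / len(rule_scores)
    if method == "geometric":
        return math.prod(rule_scores) ** (1.0 / len(rule_scores))
    raise ValueError(f"unknown method: {method}")


scores = [1.0, 1.0, 0.25, 1.0]
print(aggregate(scores, "arithmetic"))          # 0.8125
print(aggregate(scores, "geometric"))           # 0.25 ** 0.25, about 0.707
print(aggregate([1.0, 0.0, 1.0], "geometric"))  # 0.0
```

The last call shows the practical difference between the two choices: with the geometric mean, a single rule with a value of 0 drives the overall score to 0, whereas the arithmetic mean lets satisfied rules compensate for an unsatisfied one.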

The Descriptor rule can be used to find molecules that have a certain target value for a specific molecular descriptor. AlvaBuilder makes available nearly 100 descriptors by using the alvaDesc engine.¹⁶ With this rule, the user can limit the size or complexity of the molecules, for example, by setting a maximum value of the molecular weight (MW), the number of atoms (nAT), or the number of bonds (nBO). Similarly, molecular space exploration can be constrained to find only molecules with a certain degree of flexibility by setting a rule using the number of rotatable bonds (RBN). AlvaBuilder also allows users to set rules on more sophisticated descriptors, for example, to control the lipophilicity by constraining the octanol–water partition coefficient (LOGPcons)¹⁷⁻¹⁹ or to favor the synthesizability (SAscore)²⁰ and the drug-likeness (QED)²¹ of the candidate molecules.
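The text does not state how a Descriptor rule is turned into a value between 0 and 1 when the target is only partly met. The sketch below shows one common possibility, a desirability function that returns 1 inside the allowed range and decays linearly outside it. The function `range_score`, the linear decay, and the `tolerance` parameter are assumptions made for illustration and are not alvaBuilder's documented behavior.

```python
def range_score(value, low=None, high=None, tolerance=1.0):
    """Score in [0, 1]: 1 inside [low, high], linear decay outside.

    `tolerance` (hypothetical) is the distance from the nearest bound at
    which the score reaches 0. A bound set to None is not enforced.
    """
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    if low is not None and value < low:
        distance = low - value
    elif high is not None and value > high:
        distance = value - high
    else:
        return 1.0
    return max(0.0, 1.0 - distance / tolerance)


# Example: maximum molecular weight (MW) of 500 with a tolerance of 100.
print(range_score(450.0, high=500.0, tolerance=100.0))  # 1.0
print(range_score(550.0, high=500.0, tolerance=100.0))  # 0.5
print(range_score(700.0, high=500.0, tolerance=100.0))  # 0.0
```

A graded score of this kind gives the genetic algorithm a direction to follow, since a molecule that narrowly misses a threshold ranks above one that misses it by a wide margin.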

The Compare rule can be used to find compounds that are similar or dissimilar to a target molecule. This rule makes use of the Tanimoto distance²² of the ECFP fingerprints²³ of the target molecule and a population member.
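For reference, the Tanimoto coefficient of two binary fingerprints is the number of bits set in both divided by the number of bits set in either, and the corresponding distance is one minus that value. The sketch below assumes that each fingerprint is given as the set of its on-bit indices; computing actual ECFP fingerprints requires a cheminformatics toolkit and is outside the scope of this example. The function names are hypothetical.

```python
def tanimoto_similarity(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def tanimoto_distance(bits_a, bits_b):
    """Distance in [0, 1]: 0 for identical fingerprints, 1 for disjoint ones."""
    return 1.0 - tanimoto_similarity(bits_a, bits_b)


target = {1, 5, 9, 12}
candidate = {1, 5, 7, 12, 20}
print(tanimoto_similarity(target, candidate))  # 3 shared / 6 total = 0.5
print(tanimoto_distance(target, candidate))    # 0.5
```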

Analogously, the user can define Match rules, which reward compounds that contain, or do not contain, a given molecular pattern. The pattern can be set using the SMARTS format. For example, the Match rule can be used to exclude known toxic, mutagenic, or carcinogenic functional groups from the final data set (Goal → Not match) or to prefer compounds that include a specific fragment known to act as a ligand binding to a specific target molecule (Goal → Match).

Finally, the Model rule can be used to find molecules that yield a specific value for an existing QSAR/QSPR model. This rule allows users to tackle the so-called inverse-QSAR (or inverse-QSPR)^24,25 problem, which aims to identify chemical compounds having desirable predicted values for a given regression or classification model. For binary classification models, the rule is defined by selecting one of the two class labels. The rule will reward the molecules for which the model predicts the selected class. For regression models, the rule is defined similarly to the Descriptor rule; e.g., it is possible to boost molecules having a predicted value for a toxicity model less than or equal to a certain threshold. The models used in alvaBuilder must be contained in an alvaRunner project, which can be prepared using the Alvascience software suite,²⁶ in particular alvaDesc and alvaModel.

The GUI also allows the user to define specific molecular fragments, called Fixed fragments, that the designed molecules should contain. The fragments can be defined using the SMILES²⁷ format. Each fixed fragment needs at least one defined atom to be used as a connection point to the other molecular fragments; this atom can either be defined manually using the [R] symbol or be chosen automatically by letting alvaBuilder consider all possible connection points.

Training Data Set

A key feature of the generative process of alvaBuilder is the use of the training data set as the source of the building blocks. In fact, the generated molecules will be the result of the composition of the fragments found in the training data set. Therefore, the final population of molecules will not contain, for example, ring systems that are not included in the training data set. For this reason, the user should pay attention to the selection and curation of the training data set. In fact, if the training data set contains molecules with undesired fragments or structural anomalies, these could be found in the molecules generated by alvaBuilder. To limit potential issues, alvaBuilder performs a standardization of the molecules read from the training data set. Such standardization includes: aromatization, normalization of the nitro group, removal of stereochemical information, and retaining the biggest structure in case of disconnected structures (e.g., salts).

Another aspect related to the training data set is the relation between its size and the chosen population size. As a general rule, the user should choose a training data set considerably bigger than the population. This allows alvaBuilder to find new fragments to compose candidates to be evaluated for being part of the next population of the GA. The variety of the fragments contained in the training data set is also relevant. In fact, alvaBuilder can generate good results even with a smaller training data set containing many different fragments.

Case Study

A case study (Figure 4) was defined to demonstrate the ability of alvaBuilder to identify a population of molecules of interest. In the case study, the goal was to find molecules containing a fragment present in valsartan, a medication for hypertension, heart failure, and diabetic nephropathy, but with certain physicochemical properties of the perindopril molecule. In particular, the LOGPcons was set to be in the range between 2 and 2.5, while that of valsartan is 4.15, and the TPSA(Tot), which represents the topological polar surface area of the molecule,²⁸ was set to be greater than or equal to 90 and less than or equal to 100, whereas the TPSA(Tot) of valsartan is 112.07. To control the synthetic accessibility of the target molecules, i.e., to increase the possibility that the molecules are synthesizable, the SAscore was set to a maximum of 4. In fact, the SAscore values are in the range between 1 and 10, and according to its model, molecules with high values should be difficult to synthesize, while molecules with low values should be easy to synthesize. The QED was set with a minimum value of 0.7 since this descriptor assumes a value of 1 when all its properties are favorable to the drug-likeness of the molecule under examination. Finally, a QSAR model to predict the blood–brain barrier (BBB) permeability was used²⁶ to make sure that the target molecules were also able to access the central nervous system.²⁹ To limit the predictive error of the model, only those molecules that fall within its applicability domain (AD) were used. The BBB model used in this case study was a consensus model, and its AD was automatically calculated as the conjunction of the contained models’ ADs.
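The descriptor thresholds of the case study can be written down as a small rule table and checked against the two valsartan values quoted above. The sketch is only a restatement of those numbers in Python: the dictionary layout and the function `satisfied` are hypothetical and do not reflect the format of the alvaBuilder score file, and the Fixed fragment and BBB Model rules are not represented.

```python
# Descriptor ranges stated in the case study; None means "no bound".
CASE_STUDY_RULES = {
    "LOGPcons": (2.0, 2.5),
    "TPSA(Tot)": (90.0, 100.0),
    "SAscore": (None, 4.0),
    "QED": (0.7, None),
}


def satisfied(rules, descriptors):
    """Map each rule whose descriptor value is available to True or False."""
    result = {}
    for name, (low, high) in rules.items():
        if name not in descriptors:
            continue  # value not available, so the rule is not checked
        value = descriptors[name]
        result[name] = ((low is None or value >= low)
                        and (high is None or value <= high))
    return result


# Valsartan values quoted in the text (its SAscore and QED are not reported).
valsartan = {"LOGPcons": 4.15, "TPSA(Tot)": 112.07}
print(satisfied(CASE_STUDY_RULES, valsartan))
# {'LOGPcons': False, 'TPSA(Tot)': False}
```

Valsartan itself fails both of the rules that can be checked, which is why the design task requires modifying the molecule around the retained fragment rather than returning the starting compound.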

The training molecules used for this case study are a subset of the ChEMBL22 database that includes 271,914 compounds.³⁰ AlvaBuilder was run using its default GA parameters, i.e., a population size of 150 molecules for 100 iterations. Therefore, the final result was composed of a population of 150 molecules. Of these 150 molecules, 20 reached a final score of 1, and none of them had a score of less than 0.975. A final score of 1 means that all the defined rules were satisfied.

The validity, uniqueness, and novelty of the population were later verified using alvaMolecule.^26,31 In particular, all 150 molecules were syntactically valid, and the alvaMolecule checkers did not identify any issue. The alvaMolecule checkers were used to check for the presence of atoms having unusual valences and rings with erroneous aromaticity. The 150 molecules had no duplicates either; therefore, the uniqueness requirement was also satisfied. Additionally, all the molecules showed novelty since the Duplicate analysis of alvaMolecule found that none of them were contained in the training data set. Furthermore, by using alvaBuilder, it was verified that none of the 150 molecules were present in PubChem.³²

Additionally, an exploratory analysis of the final population was performed using alvaMolecule. In particular, we evaluated the scaffold diversity and the distribution of two parameters, molecular weight and number of rings (i.e., the cyclomatic number), both related to molecular complexity. Scaffold analysis showed that the final population includes 36 different scaffolds which can be considered a good variability since each generated molecule must include the user-defined fixed fragment. The evaluation of the molecular weight and the number of rings showed that the newly designed molecules have a molecular weight between 312.35 and 457.65 and a number of rings between 2 and 4, while the training molecules have a molecular weight between 189.11 and 973.64 and a number of rings between 0 and 11. These results confirm the ability of the genetic algorithms to design new molecules with limited complexity even if no specific rules have been defined to constrain molecular size or complexity. This also confirms the ability of the SAscore and QED descriptors to account for these molecular properties.

It took alvaBuilder less than 12 min to run the generative process on a laptop computer with an Intel i7 processor (CPU frequency of 1.8 GHz) and 16 GB of memory (RAM). The RAM used by alvaBuilder was always less than 800 MB. In general, the RAM used is not affected by the size of the training data set since its molecules are sampled and not all kept in memory.

To evaluate how the training data set affects the final result, another test was performed using a second training data set consisting of 150 random samples of molecules from the original training data set. Even in this case, none of the molecules generated by alvaBuilder had a score lower than 0.975 (Figure 5).

Figure 5. Histogram (10 bins) of the score values for the population of molecules generated by alvaBuilder in the second test (blue) and for the related training data set of 150 molecules (orange).

Conclusions

AlvaBuilder is the Alvascience software solution for de novo molecular design. Its simple user-friendly graphic interface allows users to define desired characteristics for the molecules to be generated. Such characteristics can be defined using a set of rules which, for example, can make use of the alvaDesc engine to calculate molecular descriptors or can generate molecules having desired prediction values for specific QSAR/QSPR models contained in an alvaRunner project. The molecules generated are always syntactically valid since they are composed by combining fragments of molecules taken from a molecular data set chosen by the user. In this paper, we applied alvaBuilder to a case study showing how it can be used to find a population of molecules having a set of desirable physicochemical properties, while containing a given molecular fragment and having a specific predicted value by a QSAR model. The resulting molecules were exported from alvaBuilder and are available in the Supporting Information.

Data Availability Statement

AlvaBuilder is available for Windows, Linux, and macOS, and it can be downloaded from the Alvascience website (https://www.alvascience.com). The licensing scheme has options for academic and commercial customers and for single computers or research sites.

Supporting Information Available

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.3c00610.

Score file containing the rule presented in the case study (score_file.xml) and molecules designed by alvaBuilder for the case study (designed_molecules.xlsx) (ZIP)

The authors declare the following competing financial interest(s): A.M. and M.B. are cofounders of Alvascience Srl, Lecco, Italy.

Supplementary Material

ci3c00610_si_001.zip (336.9 KB, zip)

References

(1) Polishchuk P. G.; Madzhidov T. I.; Varnek A. Estimation of the Size of Drug-like Chemical Space Based on GDB-17 Data. J. Comput. Aided. Mol. Des. 2013, 27 (8), 675–679. 10.1007/s10822-013-9672-4.
(2) Walters W. P. Virtual Chemical Libraries. J. Med. Chem. 2019, 62 (3), 1116–1124. 10.1021/acs.jmedchem.8b01048.
(3) Gómez-Bombarelli R.; Wei J. N.; Duvenaud D.; Hernández-Lobato J. M.; Sánchez-Lengeling B.; Sheberla D.; Aguilera-Iparraguirre J.; Hirzel T. D.; Adams R. P.; Aspuru-Guzik A. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. ACS Cent. Sci. 2018, 4 (2), 268–276. 10.1021/acscentsci.7b00572.
(4) Scannell J. W.; Blanckley A.; Boldon H.; Warrington B. Diagnosing the Decline in Pharmaceutical R&D Efficiency. Nat. Rev. Drug Discov. 2012, 11 (3), 191–200. 10.1038/nrd3681.
(5) Hall J.; Matos S.; Gold S.; Severino L. S. The Paradox of Sustainable Innovation: The ‘Eroom’ Effect (Moore’s Law Backwards). J. Clean. Prod. 2018, 172, 3487–3497. 10.1016/j.jclepro.2017.07.162.
(6) Meyers J.; Fabian B.; Brown N. De Novo Molecular Design and Generative Models. Drug Discov. Today 2021, 26 (11), 2707–2715. 10.1016/j.drudis.2021.05.019.
(7) Mouchlis V. D.; Afantitis A.; Serra A.; Fratello M.; Papadiamantis A. G.; Aidinis V.; Lynch I.; Greco D.; Melagraki G. Advances in De Novo Drug Design: From Conventional to Machine Learning Methods. Int. J. Mol. Sci. 2021, 22 (4), 1676. 10.3390/ijms22041676.
(8) Brown N.; Fiscato M.; Segler M. H. S.; Vaucher A. C. GuacaMol: Benchmarking Models for De Novo Molecular Design. J. Chem. Inf. Model. 2019, 59, 1096–1108. 10.1021/acs.jcim.8b00839.
(9) Mauri A.; Consonni V.; Todeschini R. Molecular Descriptors. In Handbook of Computational Chemistry; Leszczyński J., Kaczmarek-Kedziera A., Puzyn T., Papadopoulos M. G., Reis H., Shukla M. K., Eds.; Springer International Publishing: Cham, 2017; pp 2065–2093. 10.1007/978-3-319-27282-5_51.
(10) Todeschini R.; Consonni V. Molecular Descriptors for Chemoinformatics; Methods and Principles in Medicinal Chemistry series; Wiley, 2009; Vol. 41. 10.1002/9783527628766.
(11) Hansch C. Quantitative Approach to Biochemical Structure-Activity Relationships. Acc. Chem. Res. 1969, 2 (8), 232–239. 10.1021/ar50020a002.
(12) Goldman B.; Kearnes S.; Kramer T.; Riley P.; Walters W. P. Defining Levels of Automated Chemical Design. J. Med. Chem. 2022, 65 (10), 7073–7087. 10.1021/acs.jmedchem.2c00334.
(13) Bemis G. W.; Murcko M. A. The Properties of Known Drugs. 1. Molecular Frameworks. J. Med. Chem. 1996, 39 (15), 2887–2893. 10.1021/jm9602928.
(14) Leardi R.; Boggia R.; Terrile M. Genetic Algorithms as a Strategy for Feature Selection. J. Chemom. 1992, 6 (5), 267–281. 10.1002/cem.1180060506.
(15) Leardi R. Genetic Algorithms in Chemometrics and Chemistry: A Review. J. Chemom. 2001, 15 (7), 559–569. 10.1002/cem.651.
(16) Mauri A. AlvaDesc: A Tool to Calculate and Analyze Molecular Descriptors and Fingerprints. In Ecotoxicological QSARs; Springer Nature, 2020; pp 801–820. 10.1007/978-1-0716-0150-1_32.
(17) Ghose A. K.; Viswanadhan V. N.; Wendoloski J. J. Prediction of Hydrophobic (Lipophilic) Properties of Small Organic Molecules Using Fragmental Methods: An Analysis of ALOGP and CLOGP Methods. J. Phys. Chem. A 1998, 102 (21), 3762–3772. 10.1021/jp980230o.
(18) Moriguchi I.; Hirono S.; Liu Q.; Nakagome I.; Matsushita Y. Simple Method of Calculating Octanol/Water Partition Coefficient. Chem. Pharm. Bull. 1992, 40 (1), 127–130. 10.1248/cpb.40.127.
(19) Wildman S. A.; Crippen G. M. Prediction of Physicochemical Parameters by Atomic Contributions. J. Chem. Inf. Comput. Sci. 1999, 39 (5), 868–873. 10.1021/ci990307l.
(20) Ertl P.; Schuffenhauer A. Estimation of Synthetic Accessibility Score of Drug-like Molecules Based on Molecular Complexity and Fragment Contributions. J. Cheminform. 2009, 1 (1), 1–11. 10.1186/1758-2946-1-8.
(21) Bickerton G. R.; Paolini G. V.; Besnard J.; Muresan S.; Hopkins A. L. Quantifying the Chemical Beauty of Drugs. Nat. Chem. 2012, 4 (2), 90–98. 10.1038/nchem.1243.
(22) Tanimoto T. T. An Elementary Mathematical Theory of Classification and Prediction; IBM Internal Report, International Business Machines Corp., 1958.
(23) Rogers D.; Hahn M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010, 50 (5), 742–754. 10.1021/ci100050t.
(24) Gordeeva E. V.; Molchanova M. S.; Zefirov N. S. General Methodology and Computer Program for the Exhaustive Restoring of Chemical Structures by Molecular Connectivity Indexes. Solution of the Inverse Problem in QSAR/QSPR. Tetrahedron Comput. Methodol. 1990, 3 (6), 389–415. 10.1016/0898-5529(90)90066-H.
(25) Skvortsova M. I.; Baskin I. I.; Slovokhotova O. L.; Palyulin V. A.; Zefirov N. S. Inverse Problem in QSAR/QSPR Studies for the Case of Topological Indexes Characterizing Molecular Shape (Kier Indices). J. Chem. Inf. Comput. Sci. 1993, 33 (4), 630–634. 10.1021/ci00014a017.
(26) Mauri A.; Bertola M. Alvascience: A New Software Suite for the QSAR Workflow Applied to the Blood-Brain Barrier Permeability. Int. J. Mol. Sci. 2022, 23 (21), 12882. 10.3390/ijms232112882.
(27) Weininger D. SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Model. 1988, 28 (1), 31–36. 10.1021/ci00057a005.
(28) Ertl P.; Rohde B.; Selzer P. Fast Calculation of Molecular Polar Surface Area as a Sum of Fragment-Based Contributions and Its Application to the Prediction of Drug Transport Properties. J. Med. Chem. 2000, 43 (20), 3714–3717. 10.1021/jm000942e.
(29) Daneman R.; Prat A. The Blood-Brain Barrier. Cold Spring Harb. Perspect. Biol. 2015, 7 (1), a020412. 10.1101/cshperspect.a020412.
(30) Grisoni F.; Moret M.; Lingwood R.; Schneider G. Bidirectional Molecule Generation with Recurrent Neural Networks. J. Chem. Inf. Model. 2020, 60 (3), 1175–1183. 10.1021/acs.jcim.9b00943.
(31) AlvaMolecule (Software to View and Prepare Chemical Datasets), Version 2.0.4, 2023. Alvascience. https://www.alvascience.com.
(32) Hähnke V. D.; Kim S.; Bolton E. E. PubChem Chemical Structure Standardization. J. Cheminform. 2018, 10 (1), 36. 10.1186/s13321-018-0293-8.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

ci3c00610_si_001.zip^{(336.9KB, zip)}

Data Availability Statement

[ref1] Polishchuk P. G.; Madzhidov T. I.; Varnek A. Estimation of the Size of Drug-like Chemical Space Based on GDB-17 Data. J. Comput. Aided. Mol. Des. 2013, 27 (8), 675–679. 10.1007/s10822-013-9672-4. [DOI] [PubMed] [Google Scholar]

[ref2] Walters W. P. Virtual Chemical Libraries. J. Med. Chem. 2019, 62 (3), 1116–1124. 10.1021/acs.jmedchem.8b01048. [DOI] [PubMed] [Google Scholar]

[ref3] Gómez-Bombarelli R.; Wei J. N.; Duvenaud D.; Hernández-Lobato J. M.; Sánchez-Lengeling B.; Sheberla D.; Aguilera-Iparraguirre J.; Hirzel T. D.; Adams R. P.; Aspuru-Guzik A. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. ACS Cent. Sci. 2018, 4 (2), 268–276. 10.1021/acscentsci.7b00572. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref4] Scannell J. W.; Blanckley A.; Boldon H.; Warrington B. Diagnosing the Decline in Pharmaceutical R&D Efficiency. Nat. Rev. Drug Discov. 2012, 11 (3), 191–200. 10.1038/nrd3681. [DOI] [PubMed] [Google Scholar]

[ref5] Hall J.; Matos S.; Gold S.; Severino L. S. The Paradox of Sustainable Innovation: The ‘Eroom’ Effect (Moore’s Law Backwards). J. Clean. Prod. 2018, 172, 3487–3497. 10.1016/j.jclepro.2017.07.162. [DOI] [Google Scholar]

[ref6] Meyers J.; Fabian B.; Brown N. De Novo Molecular Design and Generative Models. Drug Discov. Today 2021, 26 (11), 2707–2715. 10.1016/j.drudis.2021.05.019. [DOI] [PubMed] [Google Scholar]

[ref7] Mouchlis V. D.; Afantitis A.; Serra A.; Fratello M.; Papadiamantis A. G.; Aidinis V.; Lynch I.; Greco D.; Melagraki G. Advances in De Novo Drug Design: From Conventional to Machine Learning Methods. Int. J. Mol. Sci. 2021, 22 (4), 1676. 10.3390/ijms22041676. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref8] Brown N.; Fiscato M.; Segler M. H. S.; Vaucher A. C. GuacaMol : Benchmarking Models for De Novo Molecular Design. J. Chem. Inf. Model. 2019, 59, 1096–1108. 10.1021/acs.jcim.8b00839. [DOI] [PubMed] [Google Scholar]

[ref9] Mauri A.; Consonni V.; Todeschini R.. Molecular Descriptors. In Handbook of Computational Chemistry; Leszczyński J., Kaczmarek-Kedziera A., Puzyn T., Papadopoulos M. G., Reis H., Shukla M. K., Eds.; Springer International Publishing: Cham, 2017; pp 2065–2093. 10.1007/978-3-319-27282-5_51. [DOI] [Google Scholar]

[ref10] Todeschini R.; Consonni V.. Molecular Descriptors for Chemoinformatics; Methods and Principles in Medicinal Chemistry series; Wiley, 2009; Vol. 41. 10.1002/9783527628766. [DOI] [Google Scholar]

[ref11] Hansch C. Quantitative Approach to Biochemical Structure-Activity Relationships. Acc. Chem. Res. 1969, 2 (8), 232–239. 10.1021/ar50020a002. [DOI] [Google Scholar]

[ref12] Goldman B.; Kearnes S.; Kramer T.; Riley P.; Walters W. P. Defining Levels of Automated Chemical Design. J. Med. Chem. 2022, 65 (10), 7073–7087. 10.1021/acs.jmedchem.2c00334. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref13] Bemis G. W.; Murcko M. A. The Properties of Known Drugs. 1. Molecular Frameworks. J. Med. Chem. 1996, 39 (15), 2887–2893. 10.1021/jm9602928. [DOI] [PubMed] [Google Scholar]

[ref14] Leardi R.; Boggia R.; Terrile M. Genetic Algorithms as a Strategy for Feature Selection. J. Chemom. 1992, 6 (5), 267–281. 10.1002/cem.1180060506.

[ref15] Leardi R. Genetic Algorithms in Chemometrics and Chemistry: A Review. J. Chemom. 2001, 15 (7), 559–569. 10.1002/cem.651.

[ref16] Mauri A. AlvaDesc: A Tool to Calculate and Analyze Molecular Descriptors and Fingerprints. In Ecotoxicological QSARs; Springer Nature, 2020; pp 801–820. 10.1007/978-1-0716-0150-1_32.

[ref17] Ghose A. K.; Viswanadhan V. N.; Wendoloski J. J. Prediction of Hydrophobic (Lipophilic) Properties of Small Organic Molecules Using Fragmental Methods: An Analysis of ALOGP and CLOGP Methods. J. Phys. Chem. A 1998, 102 (21), 3762–3772. 10.1021/jp980230o.

[ref18] Moriguchi I.; Hirono S.; Liu Q.; Nakagome I.; Matsushita Y. Simple Method of Calculating Octanol/Water Partition Coefficient. Chem. Pharm. Bull. 1992, 40 (1), 127–130. 10.1248/cpb.40.127.

[ref19] Wildman S. A.; Crippen G. M. Prediction of Physicochemical Parameters by Atomic Contributions. J. Chem. Inf. Comput. Sci. 1999, 39 (5), 868–873. 10.1021/ci990307l.

[ref20] Ertl P.; Schuffenhauer A. Estimation of Synthetic Accessibility Score of Drug-like Molecules Based on Molecular Complexity and Fragment Contributions. J. Cheminform. 2009, 1 (1), 1–11. 10.1186/1758-2946-1-8.

[ref21] Bickerton G. R.; Paolini G. V.; Besnard J.; Muresan S.; Hopkins A. L. Quantifying the Chemical Beauty of Drugs. Nat. Chem. 2012, 4 (2), 90–98. 10.1038/nchem.1243.

[ref22] Tanimoto T. T. An Elementary Mathematical Theory of Classification and Prediction; IBM Internal Report, International Business Machines Corp., 1958.

[ref23] Rogers D.; Hahn M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010, 50 (5), 742–754. 10.1021/ci100050t.

[ref24] Gordeeva E. V.; Molchanova M. S.; Zefirov N. S. General Methodology and Computer Program for the Exhaustive Restoring of Chemical Structures by Molecular Connectivity Indexes. Solution of the Inverse Problem in QSAR/QSPR. Tetrahedron Comput. Methodol. 1990, 3 (6), 389–415. 10.1016/0898-5529(90)90066-H.

[ref25] Skvortsova M. I.; Baskin I. I.; Slovokhotova O. L.; Palyulin V. A.; Zefirov N. S. Inverse Problem in QSAR/QSPR Studies for the Case of Topological Indexes Characterizing Molecular Shape (Kier Indices). J. Chem. Inf. Comput. Sci. 1993, 33 (4), 630–634. 10.1021/ci00014a017.

[ref26] Mauri A.; Bertola M. Alvascience: A New Software Suite for the QSAR Workflow Applied to the Blood-Brain Barrier Permeability. Int. J. Mol. Sci. 2022, 23 (21), 12882. 10.3390/ijms232112882.

[ref27] Weininger D. SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Model. 1988, 28 (1), 31–36. 10.1021/ci00057a005.

[ref28] Ertl P.; Rohde B.; Selzer P. Fast Calculation of Molecular Polar Surface Area as a Sum of Fragment-Based Contributions and Its Application to the Prediction of Drug Transport Properties. J. Med. Chem. 2000, 43 (20), 3714–3717. 10.1021/jm000942e.

[ref29] Daneman R.; Prat A. The Blood-Brain Barrier. Cold Spring Harb. Perspect. Biol. 2015, 7 (1), a020412. 10.1101/cshperspect.a020412.

[ref30] Grisoni F.; Moret M.; Lingwood R.; Schneider G. Bidirectional Molecule Generation with Recurrent Neural Networks. J. Chem. Inf. Model. 2020, 60 (3), 1175–1183. 10.1021/acs.jcim.9b00943.

[ref31] AlvaMolecule (Software to View and Prepare Chemical Datasets), Version 2.0.4, 2023. Alvascience. https://www.alvascience.com.

[ref32] Hähnke V. D.; Kim S.; Bolton E. E. PubChem Chemical Structure Standardization. J. Cheminform. 2018, 10 (1), 36. 10.1186/s13321-018-0293-8.
