Open Source Code Contributions to Global Health: The Case of Antimalarial Drug Discovery

Gemma Turon; Edwin Tse; Xin Qiu; Matthew Todd; Miquel Duran-Frigola

ACS Med. Chem. Lett. 2024 Aug 1;15(9):1645–1650. doi: 10.1021/acsmedchemlett.4c00131

PMCID: PMC11403727 PMID: 39291016

Abstract

The discovery of treatments for infectious diseases that affect the poorest countries has been stagnant for decades. As long as expected returns on investment remain low, pharmaceutical companies’ lack of interest in this disease area must be compensated for with collaborative efforts from the public sector. New approaches to drug discovery, inspired by the “open source” philosophy prevalent in software development, offer a platform for experts from diverse backgrounds to contribute their skills, enhancing reproducibility, progress tracking, and public discussion. Here, we present the first efforts of Ersilia, an initiative focused on attracting data scientists into contributing to global health, toward meeting the goals of Open Source Malaria, a consortium of medicinal chemists investigating antimalarial compounds using a purely open science approach. We showcase the chemical space exploration of a set of triazolopyrazine compounds with potent antiplasmodial activity and discuss how open source practices can serve as a common ground to make drug discovery more inclusive and participative.

Keywords: Open Source, Drug Discovery, Malaria, Artificial Intelligence, Machine Learning

Drug discovery is often described as a prohibitively costly, time-consuming, and risky enterprise, attainable only by large pharmaceutical companies that incur billions of dollars in investment.¹ As a result, priority is given to disease areas with attractive economic incentives, leading to the neglect of pathologies that are rare or predominantly affect the poorest countries. The WHO list of “neglected tropical diseases” includes 25 communicable conditions that predominantly affect the Global South and for which no effective treatment exists. Other diseases with a higher burden, such as malaria and tuberculosis, may have treatment options available but are also heavily under-resourced, with the investment from the global community estimated at 50% of what is actually needed to meet the WHO Elimination Goals.²,³ The field of new antibiotic discovery has been alarmingly unproductive, with most of the existing options dating back to the 1970s or earlier, hampering preparedness against critical events such as the emergence of resistance.⁴

A necessary procedure in profit-driven pharmaceutical research is the protection of intellectual property, which hinders the immediate communication of scientific results. A long-standing question has been whether it is possible, if at all, to discover a new drug from the public sector by following the ideal practices of academia, which are inherently more collaborative and involve the immediate communication of scientific results. An analogy can be drawn with the technological sector, where a clear distinction exists between “open source” and “closed source” solutions. In the latter, the source code of the software or the engineering diagrams is protected and valued as intellectual property (IP). Software development has shown that open source technology can evolve to a level of sophistication, proficiency, and quality that matches and often surpasses its closed source counterparts. Most importantly, open source often has the freedom to address coding challenges that may not be attractive from a business perspective. The Internet is widely recognized as being built on open source software, and amidst the artificial intelligence (AI) revolution, a movement of open source advocates is contributing exceptional tools, on par with IP-protected software.⁵

In 2011, the term “open source drug discovery” was introduced, adapting the open source philosophy to community-driven efforts to find new antibiotics.⁶ The inaugural campaign, Open Source Malaria (OSM),⁷ aimed to discover new drugs with antiplasmodial activity. Subsequent initiatives targeted tuberculosis (Open Source Tuberculosis; OSTB), fungal mycetoma (Open Source Mycetoma; MycetOS⁸), and, more broadly, antibiotics (Open Source Antibiotics; OSA) to discover inhibitors, for instance, of the methicillin-resistant Staphylococcus aureus, one of the high-priority pathogens according to the WHO.⁹ A defining feature of these open source drug discovery initiatives was their use of GitHub (https://github.com), a cornerstone collaborative tool in the tech sector that hosts millions of software projects annually. GitHub provides contributors with the necessary tools for committing and version-controlling code and data, reporting issues, and facilitating discussions about progress and findings. OSM and sister initiatives primarily use GitHub repositories to disclose experimental data and the issue-tracking feature to encourage discussion on experimental results, share meeting minutes, and seek support. Notably, most contributors to these projects are not code developers or data scientists, meaning that the primary features of GitHub related to code development are not used to their full extent.

Inspired by these initiatives, in 2020 we created the Ersilia Open Source Initiative (https://ersilia.io) with the specific goal of attracting code developers and engineers into contributing to the discovery of new antibiotics. Ersilia was conceived as an in silico drug discovery endeavor, with a focus on leveraging AI and machine learning (ML) techniques to support experimentation in the Global South, notably in Sub-Saharan Africa, where many infectious diseases are endemic. Ersilia is entirely built on GitHub and consists of a growing collection of AI/ML drug discovery models developed in the public arena.¹⁰ The platform includes code contributions and continuous integration and continuous delivery (CI/CD) pipelines, along with all the software development terms (“commit”, “branch”, “fork”, “pull request”, etc.) that will sound familiar to any software developer. In our experience, framing drug discovery tasks as software engineering challenges does indeed facilitate the involvement of computer scientists, data engineers, and full-stack developers who might not otherwise engage in such projects due to a lack of exposure or steep learning curves. By 2024, Ersilia has become a thriving community of developers eager to contribute their skills to antimicrobial drug discovery, boasting over 5,000 code contributions from more than 100 distinct individuals. Most Ersilia developers do not have prior formal training in medicinal chemistry, biomedicine, or pharmacology, and the GitHub platform provides a smooth entry point for them to make meaningful contributions. Both Ersilia and the existing open source drug discovery initiatives utilize GitHub as a platform with essentially the same mission, albeit with very complementary approaches, making cooperation between us natural. We selected the OSM project, specifically the “Series 4” (S4) subproject, as a starting point for our collaborative effort.
Here, we describe the history of this ongoing effort, highlighting how the use of open source development methods and, especially, GitHub issue tracking systems can prompt the dialogue between computer scientists and medicinal chemists.

In 2019, the OSM announced a community challenge to leverage novel AI/ML approaches to discover derivatives of an initial hit found within the Medicines for Malaria Venture (MMV) Malaria Box.¹¹ This hit was capable of inhibiting the Plasmodium falciparum parasite, presumably through the PfATP4 ion pump. It consisted of a triazolopyrazine core with North-East (NE) and North-West (NW) substituents (Figure 1A), which were further investigated. Previous research provided a training set of ∼400 compounds within this series, experimentally tested for their antiplasmodial potency (IC50), and exploratory computational analyses of their putative binding to PfATP4. Consequently, Ersilia opened an “issue” on the OSM GitHub profile to prompt discussion on potential derivatives of the initial S4 hits. “Issue #34” in the OSM’s GitHub profile contains a history of the activity and debate around this effort (Table 1A), including links and references to the relevant code. In this Note, we summarize the progress made so far and offer a perspective on how this approach to drug discovery may give opportunities to software developers and code contributors eager to advance drug discovery and global health.

Figure 1. Generative modeling of S4 compounds. (A) Exemplary S4 compound, highlighting the triazolopyrazine core, and the NW and NE substituents. (B) Scheme of the methodology. Sequential generative rounds are used to generate a sufficiently large chemical space that is then filtered with multiple criteria. (C) Number of molecules kept after each filtering criterion. (D) 2D visualization of the chemical space of generated S4 compounds. In gray, all generated compounds are shown. Colors correspond to subsets of compounds according to panel C. (E) In the subset of 556 compounds, pie charts indicate number of rings, position (para, meta, ortho) of heteroaryl substituent in the NE, and amide or ether NW substituents. (F) Crippen’s LogP values at each filtering stage. The dot indicates the median and the line indicates the interquartile range. (G) NW substituents with highest discriminative power before first activity and after the last activity prediction filters (steps 4 to 15). (H) Likewise, discriminative NE substituents.

Table 1. Assets Developed for the OSM Series 4 Project.

	Concept	URL	Summary
A	OSM Issue #34	https://github.com/OpenSourceMalaria/Series4_PredictiveModel/issues/34	GitHub issue where all the discussion between Ersilia, OSM and other collaborators is being held openly
B	Generative modeling, round 1	https://github.com/ersilia-os/osm-series4-candidates	Code for round 1 of the generative effort at Ersilia using REINVENT 2.0, generating 100k compounds
C	Predictive model	https://github.com/ersilia-os/osm-series4-predictive-model	Web application to facilitate the browsing of the R1 and R2 generated candidates, including predicted activities
D	Generative modeling, round 2	https://github.com/ersilia-os/osm-series4-candidates-2	Code for round 2 of the generative effort at Ersilia using REINVENT 2.0 and Virtual Libraries, generating 400k compounds
E	Selection of candidates for synthesis, round 2	https://github.com/ersilia-os/osm-series4-synthesis-round1	Development of the AI/ML models predicting the activity of the new candidates to select for synthesis
F	Generative modeling and selection for synthesis, round 3	https://github.com/ersilia-os/osm-series4-synthesis-round2	Code for round 3 of the generative effort at Ersilia using ChemSampler and filtering of compounds
G	PfATP4 AlphaFold structure	https://github.com/ersilia-os/osm-pfatp4-structure	Prediction of the PfATPase4 structure using AlphaFold v2
H	ZairaChem	https://github.com/ersilia-os/zaira-chem	AutoML pipeline
I	Ersilia Model Hub	https://github.com/ersilia-os/ersilia	Repository of AI/ML models for drug discovery; each model identifier has a GitHub repository
J	ChemSampler	https://github.com/ersilia-os/chem-sampler	Generative AI framework under development at Ersilia
K	ChemPFN	https://github.com/ersilia-os/chempfn	Adaptation of TabPFN²¹ to the chemistry space

Issue #34 began with Ersilia reporting a first round of generative chemistry results to expand the chemical space of S4 compounds. The code and data related to this initial effort are accessible from GitHub (Table 1B). Briefly, starting from the “Master List” of OSM S4 compounds, we developed a set of activity predictors using various ML methodologies, including ChemProp¹² and, more traditionally, hyperparameter-optimized random forest (RF) regressors and classifiers. These were then integrated into a reinforcement learning (RL) framework (REINVENT 2.0) for generative chemistry.¹³ In total, we generated over 100,000 S4 derivatives across six RL batches. The chemical space exploration was strictly confined to S4 compounds, retaining the triazolopyrazine core with NE and NW substituents and excluding substituents at other positions, which had largely been found to be detrimental to activity. In the first and second batches (B1, B2), we primarily focused on optimizing for predicted activity according to the RF predictors, and we also considered drug-likeness (QED),¹⁴ partition coefficient (LogP), and synthetic (SA) and retrosynthetic accessibility (RA¹⁵) as additional RL scores. We generated 1,867 compounds in B1 and 1,138 in B2, with almost half being common to both. Unfortunately, the molecular weight (MW) of the new molecules was lower (∼350 g/mol) compared to that of known active Series 4 molecules (∼450 g/mol). To address this, starting from the B0 and B1 generative models, in B3 we introduced an MW reward function and removed the other restrictions, resulting in a more diverse set of 46,349 compounds with MWs ranging from 400 to 500 g/mol. In B4, we reapplied the QED, LogP, SA, and RA restrictions to maximize the drug-likeness and synthetic feasibility of compounds, generating another 56,700 molecules. Finally, in B5 and B6, we employed ChemProp and the RF predictors, respectively, to produce an additional 8,833 and 5,352 molecules. 
The results from B1–B6 were compiled into a table featuring the structure of the 116,728 generated compounds (SMILES string) along with relevant information such as MW, QED, LogP, predicted activity, similarity to known (training) compounds, clustering results, etc. This table is available for download, and a TreeMap for chemical space navigation is also provided in the “Round 1” repository (Table 1B). A simplified version of the table, containing one representative from each of 1,000 clusters, was distributed in an easy-to-navigate web application (Table 1B).
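
The MW reward introduced in B3 can be illustrated with a small sketch. The code below is not the REINVENT 2.0 scoring code and does not reproduce the settings used in Round 1. It is a hypothetical window-shaped reward (the 400–500 g/mol window follows the text, while the `softness` parameter, the function names, and the example component scores are assumptions) combined with another component through a weighted geometric mean, which is one common way to aggregate scores in [0, 1].

```python
from math import exp


def window_reward(value: float, low: float, high: float, softness: float = 25.0) -> float:
    """Score 1.0 inside [low, high], decaying smoothly (Gaussian tail) outside.

    Hypothetical helper: the softness value is an illustrative assumption.
    """
    if low <= value <= high:
        return 1.0
    distance = (low - value) if value < low else (value - high)
    return exp(-0.5 * (distance / softness) ** 2)


def combined_score(components: dict, weights: dict) -> float:
    """Weighted geometric mean of component scores, each expected in [0, 1]."""
    total_weight = sum(weights[name] for name in components)
    score = 1.0
    for name, value in components.items():
        score *= value ** (weights[name] / total_weight)
    return score


if __name__ == "__main__":
    # Compare a molecule near 350 g/mol with one near 450 g/mol,
    # holding a made-up predicted-activity score fixed at 0.8.
    weights = {"activity": 1.0, "mw": 1.0}
    for mw in (350.0, 450.0):
        components = {"activity": 0.8, "mw": window_reward(mw, 400.0, 500.0)}
        print(mw, round(combined_score(components, weights), 3))
```

With a geometric mean, a molecule that scores poorly on any single component (here, an MW far outside the window) is penalized strongly even if its predicted activity is high, which is the behavior one would want when steering generation toward heavier S4 analogues.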

In Issue #34, results from Round 1 prompted discussion with experimentalists. To facilitate rapid testing of their hypotheses on our underlying models, we deployed an “S4 Activity Predictor” web application (Table 1C). Upon public discussion and interaction with other OSM contributors, such as a private sector team (Evariste Technologies), Ersilia committed to carrying out a second round of generative modeling, this time focusing more explicitly on “exploration” modes to generate diversity with respect to the known S4 hits. In Round 2, another generative model optimized for low-data scenarios (Virtual Libraries) was introduced,¹⁶ as well as more bioactivity predictors based on the physicochemical properties of compounds. A detailed explanation of Round 2 can be found in another GitHub repository (Table 1D). Briefly, it was possible to generate 209,310 and 150,365 compounds with REINVENT 2.0 and Virtual Libraries, respectively. Results from Round 1 and Round 2 were assembled together into a list of 405,765 unique compounds. The major effort at this stage was to filter this list in consecutive steps to yield a manageable selection of highly interesting compounds since it was clear from initial feedback from the OSM community that automated assistance was necessary in this regard. A scheme of the methodology is provided in Figure 1B, and the filtering procedure can be found in Figure 1C. Details on the methodology are provided as Extended Methods. Briefly, we removed molecules with poor expected synthetic feasibility, molecules that were too similar to existing S4 compounds, molecules with poor predicted activity according to several ML models (including an orthogonal antimalarial potential predictor named MAIP¹⁷), and redundant compounds. This resulted in a much-reduced set of 556 compounds, of which 90 had strong predicted activities. These 90 compounds were made available via a web app (Table 1D), which triggered further discussion in Issue #34. 
Figure 1D shows that, as a result of the successive filters, the final selection of compounds was confined to a relatively narrow region of the chemical space. The majority of generated compounds had 4–5 rings, with an abundance of p-triazoloheteroaryls (NE) and pyrazineamides (NW) (Figure 1E), and it was possible to identify fragments in NW (Figure 1G) and NE (Figure 1H) that discriminated predicted activity. To further converge on a list of molecules amenable to manual review by chemists, Round 1 and Round 2 results were screened using an antimalarial activity predictor developed with ZairaChem, Ersilia’s AutoML pipeline¹⁸ (Table 1H), selecting the best 35 hits, which were explored manually for their synthetic feasibility (Table 1E). The comments in Issue #34 related to the synthesis routes, purchasing of reagents, etc., offer a testimonial of how collaborative science can function within an open source forum offered by a platform like GitHub.
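
The consecutive filtering described above can be sketched as a simple cascade that records how many candidates survive each step, in the spirit of Figure 1C. This is a minimal illustration and not the pipeline in the cited repositories: the record fields (`sa_score`, `p_active`, `fingerprint`), the thresholds, and the toy fingerprints (sets of made-up bit indices) are all assumptions, and the redundancy-removal step is omitted for brevity.

```python
from typing import Callable, Iterable, List, Tuple

Candidate = dict  # hypothetical record: id, fingerprint, sa_score, p_active


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def run_cascade(
    candidates: Iterable[Candidate],
    steps: List[Tuple[str, Callable[[Candidate], bool]]],
) -> Tuple[List[Candidate], List[Tuple[str, int]]]:
    """Apply filters in order and record how many candidates survive each one."""
    kept = list(candidates)
    counts = [("generated", len(kept))]
    for name, keep in steps:
        kept = [c for c in kept if keep(c)]
        counts.append((name, len(kept)))
    return kept, counts


# Toy data: one "known" S4 compound and four generated candidates.
known_s4 = [frozenset({1, 2, 3, 4})]
candidates = [
    {"id": "c1", "fingerprint": frozenset({1, 2, 3, 4}), "sa_score": 2.0, "p_active": 0.9},
    {"id": "c2", "fingerprint": frozenset({1, 2, 5, 6}), "sa_score": 2.5, "p_active": 0.8},
    {"id": "c3", "fingerprint": frozenset({7, 8, 9}), "sa_score": 6.5, "p_active": 0.9},
    {"id": "c4", "fingerprint": frozenset({1, 5, 9}), "sa_score": 3.0, "p_active": 0.2},
]

# Illustrative thresholds only; lower sa_score is taken to mean "easier to make".
steps = [
    ("synthetic feasibility", lambda c: c["sa_score"] <= 4.0),
    ("novelty vs known S4",
     lambda c: max(tanimoto(c["fingerprint"], k) for k in known_s4) < 0.7),
    ("predicted activity", lambda c: c["p_active"] >= 0.5),
]

selected, counts = run_cascade(candidates, steps)
print(counts)
print([c["id"] for c in selected])
```

Keeping the per-step counts alongside the final selection makes it easy to see which criterion removes most of the chemical space, and to revisit a threshold when experimentalists disagree with it.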

Finally, the OSM team manually reviewed and selected 8 compounds to be synthesized and tested for their antiplasmodial activity (Figure 2; as reflected in Issue #34, selection was driven by synthetic feasibility and availability of reagents). Of these, 4 exhibited submicromolar activity in a whole cell parasite assay, with one of them, OSM-LO-72, displaying an IC50 of 77 nM, a higher potency than the positive control (OSM-S-369; 167 nM). Another two compounds also showed promising results, with IC50 values below 2.5 μM, which was considered sufficient within the current discovery phase of OSM. Collectively, these results suggest that multiple rounds of generative modeling with relatively high throughput, followed by successive filters inspired by expert-based rules, can be an effective strategy to shortlist promising compounds in a collaborative manner.
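
For readers less familiar with potency units, the comparison above can be expressed as a short calculation. The sketch below converts IC50 values in nM to pIC50 and bins them into three activity classes. The 2.5 μM limit for acceptable activity comes from the text; using 1 μM (submicromolar) as the cutoff for "highly active" is an assumption, as are the function names.

```python
from math import log10


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - log10(ic50_nm)


def activity_class(
    ic50_nm: float,
    high_cutoff_nm: float = 1000.0,        # assumed: submicromolar
    acceptable_cutoff_nm: float = 2500.0,  # from the text: below 2.5 uM
) -> str:
    """Bin an IC50 into one of three illustrative activity classes."""
    if ic50_nm < high_cutoff_nm:
        return "highly active"
    if ic50_nm < acceptable_cutoff_nm:
        return "acceptable"
    return "insufficient"


if __name__ == "__main__":
    measured = {"OSM-LO-72": 77.0, "OSM-S-369 (control)": 167.0}
    for name, ic50 in measured.items():
        print(name, round(pic50_from_nm(ic50), 2), activity_class(ic50))
    # Ratio of the control IC50 to the new compound's IC50.
    print(round(measured["OSM-S-369 (control)"] / measured["OSM-LO-72"], 2))
```

On this scale, the two reported values differ by roughly 0.3 pIC50 units, that is, OSM-LO-72 is about twice as potent as the control in this assay.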

Figure 2. (A) Experimental results from selected compounds in Round 2. (B) Experimental results from selected compounds in Round 3. Green indicates IC50 values considered to be highly active by the OSM team. Orange indicates acceptable activity and red indicates insufficient activity. Upon discussion between OSM and Ersilia, the NE substituent was fixed and variants were explored in the NW.

Fueled by these exciting results, the project branched out in several directions. In GitHub Issue #34, exploratory docking experiments on PfATP4 were conducted with the new S4 candidates, based on an initial structural model of PfATP4 built by Ersilia shortly after the release of AlphaFold v2¹⁹ (Table 1G). This model was, in turn, useful to rationalize the effects of resistance mutations in PfATP4 in an independent study.²⁰ Additionally, Ersilia incorporated the relevant OSM predictive models into the Ersilia Model Hub¹⁰ to make them easily available to the scientific community (Table 1I, identifier: eos7yti). As for the S4 hit-to-lead optimization spearheaded by Ersilia, this continues to be an active effort where optimization is now focused on achieving better ADME profiles. For example, we observed that our selection criteria enriched for molecules with higher LogP values (Figure 1F), which we may address in future S4 derivatives. Notably, for antimalarial treatment, longer excretion half-lives are desirable, as determined in the MMV Target Product Profile (TPP), due to the ideally prolonged exposure of the parasite to the drug in the blood. Finally, a third round of compound generation (Round 3) was performed, guided by the structure–activity relationship (SAR) learnings from Issue #34 and other public discussions within the OSM. For example, the NE substituent was now fixed to contain a difluoromethoxy phenyl group (Figure 2). Round 3 leveraged ChemSampler (an automated framework currently being developed by Ersilia; Table 1J) to mine the S4 chemical space along with predictions of antiplasmodial activity (Table 1H, K). Additionally, longer half-lives were preferred, as described above. Round 3 rendered 276 new candidates, which were filtered down to 19 according to bioactivity and ADME parameters (see Extended Methods and Extended Data in the Supporting Information).
Of these, 9 were selected for testing based on the feasibility of synthesis at OSM facilities. Four compounds exhibited submicromolar activity in a whole cell parasite assay, and a fifth one showed moderate (below 2.5 μM) activity (Figure 2B). Particularly exciting was the identification of OSM-LO-100, which revealed that activity can be retained, even in the absence of a phenyl ring in the NW substituent.

While investigations into the S4 compounds proceed, Ersilia has continued to contribute to other related initiatives beyond the OSM. For example, in collaboration with a structural biology group at IRB Barcelona, we have suggested compounds with potential activity against the MurD ligase of S. aureus (OSA). Similarly, we developed and deployed in the Ersilia Model Hub (Table 1I, identifier: eos4f95) a baseline ML model focused on Series 1 compounds for the MycetOS project. Importantly, this model was developed during an in-person “AI/ML for drug discovery” workshop at the University of Buea, Cameroon, highlighting the fact that open source drug discovery can indeed offer a participatory framework, inclusive to under-funded settings. Indeed, the OSM S4 work presented herein was conducted using conventional computers and freely available tools, which we find essential to fostering a truly collaborative environment, especially in the field of antibiotics research where funding is scarce.

The OSM S4 contribution in Issue #34 served as the first stepping stone for Ersilia to devise an open source framework focused on AI/ML applied to drug discovery. We strive to identify purposeful and engaging challenges for software engineers to contribute to, framing these challenges in computer science terminology and within a familiar platform such as GitHub. In this sense, the existence of the OSM and related projects, which gather experimentalists on GitHub, offers an opportunity to strengthen the interplay between wet-lab scientists and software developers. While it may be argued that open science practices have long involved code contributions in public repositories to accompany scientific publications (often via GitHub), those have traditionally been delivered as static code depositions, coming from individual groups where participants, including the data scientists, are pursuing a career in health sciences. Our goal is to develop a broader framework where any tech and open source enthusiast can effectively contribute to drug discovery, similar to what we have witnessed in recent years in linguistics, where large language models are developed by domain-agnostic engineers in conjunction with experts. Likewise, themes such as intellectual property and licensing, authorship, and good practices for reproducibility and transparency have been addressed in other fields and could be translated into drug discovery.

As biomedicine and pharmacology increasingly incorporate data science, clear indications on how code developers can make effective contributions will be key to attracting them to solving pressing needs in global health. We are convinced that this space can be filled using the standard open source development tools, including GitHub for code and issue tracking as demonstrated here, but also discussion platforms such as Slack and Discord, and project management approaches popular among developers such as Agile methodologies. At Ersilia, we have adopted some of these tools and successfully attracted both recurring and sporadic developers willing to offer their skills toward advancing global health.

Acknowledgments

This work has been supported by the Rosetrees Seedcorn Award (Seedcorn2021/100263). We thank Joan Garriga and the CBLab at CEAB (Blanes, Spain) for contributing to the chemical space visualisation of our S4 compounds. G.T. is thankful to the Software Sustainability Institute Fellowship for the community discussions around research software engineering. The authors are grateful to the Ersilia and OSM communities for their enthusiasm and contributions.

Glossary

Abbreviations

ADME: absorption, distribution, metabolism, and excretion
AI: artificial intelligence
AutoML: automated machine learning
CI/CD: continuous integration and continuous delivery
IC50: half-maximal inhibitory concentration
IP: intellectual property
LogP: log of the water/octanol partition coefficient
MMV: Medicines for Malaria Venture
MycetOS: Open Source Mycetoma
ML: machine learning
MurD: UDP-N-acetylmuramoylalanine d-glutamate ligase
MW: molecular weight
NE: North-East
NW: North-West
OSA: Open Source Antibiotics
OSM: Open Source Malaria
OSTB: Open Source Tuberculosis
PfATP4: P-type sodium-transporting ATPase 4
QED: quantitative estimation of drug-likeness
RA: retrosynthetic accessibility
RF: random forest
RL: reinforcement learning
SA: synthetic accessibility
SAR: structure–activity relationship
SMILES: simplified molecular-input line-entry system
S4: series 4
TPP: target product profile
WHO: World Health Organization

Data Availability Statement

All data for the Open Source Malaria project is available under the Open Source Malaria GitHub repository (https://github.com/OpenSourceMalaria).

Supporting Information Available

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsmedchemlett.4c00131.

Extended methods corresponding to data curation procedures, computational methods for compound generation, and candidate compound filtering, along with a supplementary table; a safety statement is also included (PDF)

Accession Codes

All the code developed in this study is available in the cited repositories (Table 1) under a GNU v3 open source license.

The authors declare no competing financial interest.

Special Issue

Published as part of ACS Medicinal Chemistry Letters virtual special issue “Exploring the Use of AI/ML Technologies in Medicinal Chemistry and Drug Discovery”.

References

Wouters O. J.; McKee M.; Luyten J. Estimated Research and Development Investment Needed to Bring a New Medicine to Market, 2009–2018. JAMA 2020, 323 (9), 844–853. 10.1001/jama.2020.1166. [DOI] [PMC free article] [PubMed] [Google Scholar]
World Health Organization . Global Tuberculosis Report 2023; World Health Organization, 2023. [Google Scholar]
Venkatesan P. The 2023 WHO World Malaria Report. Lancet Microbe 2024, 5, e214. 10.1016/S2666-5247(24)00016-8. [DOI] [PubMed] [Google Scholar]
Pan American Health Organization, World Health Organization . 2021 Antibacterial Agents in Clinical and Preclinical Development: An Overview and Analysis, Sep 28, 2022. https://www.paho.org/en/documents/2021-antibacterial-agents-clinical-and-preclinical-development-overview-and-analysis
Shrestha Y. R.; von Krogh G.; Feuerriegel S. Building Open-Source AI. Nat. Comput. Sci. 2023, 3 (11), 908–911. 10.1038/s43588-023-00540-0. [DOI] [PubMed] [Google Scholar]
Todd M. H. Six Laws of Open Source Drug Discovery. ChemMedChem 2019, 14 (21), 1804–1809. 10.1002/cmdc.201900565. [DOI] [PMC free article] [PubMed] [Google Scholar]
Williamson A. E.; Ylioja P. M.; Robertson M. N.; Antonova-Koch Y.; Avery V.; Baell J. B.; Batchu H.; Batra S.; Burrows J. N.; Bhattacharyya S.; Calderon F.; Charman S. A.; Clark J.; Crespo B.; Dean M.; Debbert S. L.; Delves M.; Dennis A. S. M.; Deroose F.; Duffy S.; Fletcher S.; Giaever G.; Hallyburton I.; Gamo F.-J.; Gebbia M.; Guy R. K.; Hungerford Z.; Kirk K.; Lafuente-Monasterio M. J.; Lee A.; Meister S.; Nislow C.; Overington J. P.; Papadatos G.; Patiny L.; Pham J.; Ralph S. A.; Ruecker A.; Ryan E.; Southan C.; Srivastava K.; Swain C.; Tarnowski M. J.; Thomson P.; Turner P.; Wallace I. M.; Wells T. N. C.; White K.; White L.; Willis P.; Winzeler E. A.; Wittlin S.; Todd M. H. Open Source Drug Discovery: Highly Potent Antimalarial Compounds Derived from the Tres Cantos Arylpyrroles. ACS Cent Sci. 2016, 2 (10), 687–701. 10.1021/acscentsci.6b00086. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lim W.; Eadie K.; Konings M.; van de Sande W. MycetOS – An Open Research Model Discover New Drugs to Treat One of the Most Neglected Disease – Mycetoma. Int. J. Infect. Dis. 2020, 101, 388. 10.1016/j.ijid.2020.09.1018. [DOI] [Google Scholar]
Klug D. M.; Tse E. G.; Silva D. G.; Cao Y.; Charman S. A.; Chauhan J.; Crighton E.; Dichiara M.; Drake C.; Drewry D.; da Silva Emery F.; Ferrins L.; Graves L.; Hopkins E.; Kresina T. A. C.; Lorente-Macías Á.; Perry B.; Phipps R.; Quiroga B.; Quotadamo A.; Sabatino G. N.; Sama A.; Schätzlein A.; Simpson Q. J.; Steele J.; Shanu-Wilson J.; Sjö P.; Stapleton P.; Swain C. J.; Vaideanu A.; Xie H.; Zuercher W.; Todd M. H. Open Source Antibiotics: Simple Diarylimidazoles Are Potent against Methicillin-Resistant Staphylococcus Aureus. ACS Infect Dis 2023, 9 (12), 2423–2435. 10.1021/acsinfecdis.3c00286. [DOI] [PMC free article] [PubMed] [Google Scholar]
Turon G.; Arora D.; Caballero Lopez C.; Duran-Frigola M.. Ersilia Model Hub: A Repository of AI/ML for Neglected Tropical Diseases, 2022. 10.5281/zenodo.7274646. [DOI]
Tse E. G.; Aithani L.; Anderson M.; Cardoso-Silva J.; Cincilla G.; Conduit G. J.; Galushka M.; Guan D.; Hallyburton I.; Irwin B. W. J.; Kirk K.; Lehane A. M.; Lindblom J. C. R.; Lui R.; Matthews S.; McCulloch J.; Motion A.; Ng H. L.; Öeren M.; Robertson M. N.; Spadavecchio V.; Tatsis V. A.; van Hoorn W. P.; Wade A. D.; Whitehead T. M.; Willis P.; Todd M. H. An Open Drug Discovery Competition: Experimental Validation of Predictive Models in a Series of Novel Antimalarials. J. Med. Chem. 2021, 64 (22), 16450–16463. 10.1021/acs.jmedchem.1c00313. [DOI] [PubMed] [Google Scholar]
Stokes J. M.; Yang K.; Swanson K.; Jin W.; Cubillos-Ruiz A.; Donghia N. M.; MacNair C. R.; French S.; Carfrae L. A.; Bloom-Ackermann Z.; Tran V. M.; Chiappino-Pepe A.; Badran A. H.; Andrews I. W.; Chory E. J.; Church G. M.; Brown E. D.; Jaakkola T. S.; Barzilay R.; Collins J. J. A Deep Learning Approach to Antibiotic Discovery. Cell 2020, 181 (2), 475–483. 10.1016/j.cell.2020.04.001. [DOI] [PubMed] [Google Scholar]
Blaschke T.; Arús-Pous J.; Chen H.; Margreitter C.; Tyrchan C.; Engkvist O.; Papadopoulos K.; Patronov A. REINVENT 2.0: An AI Tool for DE Novo Drug Design. J. Chem. Inf. Model. 2020, 60 (12), 5918–5922. 10.1021/acs.jcim.0c00915. [DOI] [PubMed] [Google Scholar]
Bickerton G. R.; Paolini G. V.; Besnard J.; Muresan S.; Hopkins A. L. Quantifying the Chemical Beauty of Drugs. Nat. Chem. 2012, 4 (2), 90–98. 10.1038/nchem.1243. [DOI] [PMC free article] [PubMed] [Google Scholar]
Thakkar A.; Chadimová V.; Bjerrum E. J.; Engkvist O.; Reymond J.-L. Retrosynthetic Accessibility Score (RAscore) – Rapid Machine Learned Synthesizability Classification from AI Driven Retrosynthetic Planning. Chem. Sci. 2021, 12 (9), 3339–3349. 10.1039/D0SC05401A. [DOI] [PMC free article] [PubMed] [Google Scholar]
Moret M.; Friedrich L.; Grisoni F.; Merk D.; Schneider G. Generative Molecular Design in Low Data Regimes. Nature Machine Intelligence 2020, 2 (3), 171–180. 10.1038/s42256-020-0160-y. [DOI] [Google Scholar]
Bosc N.; Felix E.; Arcila R.; Mendez D.; Saunders M. R.; Green D. V. S.; Ochoada J.; Shelat A. A.; Martin E. J.; Iyer P.; Engkvist O.; Verras A.; Duffy J.; Burrows J.; Gardner J. M. F.; Leach A. R. MAIP: A Web Service for Predicting Blood-Stage Malaria Inhibitors. J. Cheminform. 2021, 13 (1), 13. 10.1186/s13321-021-00487-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
Turon G.; Hlozek J.; Woodland J. G.; Kumar A.; Chibale K.; Duran-Frigola M. First Fully-Automated AI/ML Virtual Screening Cascade Implemented at a Drug Discovery Centre in Africa. Nature 2023, 14, 5736. 10.1038/s41467-023-41512-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jumper J.; Evans R.; Pritzel A.; Green T.; Figurnov M.; Ronneberger O.; Tunyasuvunakool K.; Bates R.; Žídek A.; Potapenko A.; Bridgland A.; Meyer C.; Kohl S. A. A.; Ballard A. J.; Cowie A.; Romera-Paredes B.; Nikolov S.; Jain R.; Adler J.; Back T.; Petersen S.; Reiman D.; Clancy E.; Zielinski M.; Steinegger M.; Pacholska M.; Berghammer T.; Bodenstein S.; Silver D.; Vinyals O.; Senior A. W.; Kavukcuoglu K.; Kohli P.; Hassabis D. Highly Accurate Protein Structure Prediction with AlphaFold. Nature 2021, 596 (7873), 583–589. 10.1038/s41586-021-03819-2.
Qiu D.; Pei J. V.; Rosling J. E. O.; Thathy V.; Li D.; Xue Y.; Tanner J. D.; Penington J. S.; Aw Y. T. V.; Aw J. Y. H.; Xu G.; Tripathi A. K.; Gnadig N. F.; Yeo T.; Fairhurst K. J.; Stokes B. H.; Murithi J. M.; Kümpornsin K.; Hasemer H.; Dennis A. S. M.; Ridgway M. C.; Schmitt E. K.; Straimer J.; Papenfuss A. T.; Lee M. C. S.; Corry B.; Sinnis P.; Fidock D. A.; van Dooren G. G.; Kirk K.; Lehane A. M. A G358S Mutation in the Plasmodium falciparum Na+ Pump PfATP4 Confers Clinically-Relevant Resistance to Cipargamin. Nat. Commun. 2022, 13 (1), 5746. 10.1038/s41467-022-33403-9.
Hollmann N.; Müller S.; Eggensperger K.; Hutter F. TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second. arXiv Preprint 2022, arXiv:2207.01848. 10.48550/arXiv.2207.01848.


Supplementary Materials

ml4c00131_si_001.pdf (93.5 KB, PDF)

Data Availability Statement

All data for the Open Source Malaria project are available in the Open Source Malaria GitHub repository (https://github.com/OpenSourceMalaria).

