Abstract
Rare diseases impact hundreds of millions of individuals worldwide. However, few therapies exist to treat the rare disease population because financial resources are limited, the number of patients affected is low, bioactivity data are often nonexistent, and very few animal models exist to support preclinical development efforts. Sialidosis is an ultrarare lysosomal storage disorder in which mutations in the NEU1 gene result in the deficiency of the lysosomal enzyme sialidase-1. This enzyme catalyzes the removal of sialic acid moieties from glycoproteins and glycolipids. Therefore, the defective or deficient protein leads to the buildup of sialylated glycoproteins as well as several characteristic symptoms of sialidosis including visual impairment, ataxia, hepatomegaly, dysostosis multiplex, and developmental delay. In this study, we used a bibliometric tool to generate links between lysosomal storage disease (LSD) targets and existing bioactivity data that could be curated in order to build machine learning models and screen compounds in silico. We focused on sialidase as an example, and we used the data curated from the literature to build a Bayesian model which was then used to score compound libraries and rank these molecules for in vitro testing. Two compounds were identified from in vitro testing using microscale thermophoresis, namely sulfameter (Kd 2.15 ± 1.02 μM) and mexenone (Kd 8.88 ± 4.02 μM). These results validate our approach to identifying new molecules that bind to this protein; such molecules could represent drug candidates to be evaluated further as potential chaperones for this ultrarare lysosomal disease, for which there is currently no treatment. Combining bibliometric and machine learning approaches can assist in curating small molecule data and in model building, respectively, for rare disease drug discovery. This approach can also identify new compounds that are potential drug candidates.
Introduction
Rare disease research has unique challenges to overcome which are often not seen in the common disease space. While collectively rare diseases impact anywhere from 263 to 446 million individuals worldwide,1 they are at a disadvantage due to their heterogeneous nature that often comes with an underlying genetic mutation, limited financial resources to address every disease individually, disorganized and inaccessible bioactivity data, sparse or nonexistent animal models, low patient numbers, and the choice of an optimal therapeutic modality. This translational gap is a challenge, making proactive steps to develop a therapeutic strategy difficult.1
Lysosomal storage diseases (LSDs) are a group of progressive disorders in which a deficiency in one or more lysosomal enzymes leads to the abnormal accumulation of macromolecules within cells.2 They are known to affect multiple organs and body systems including the central nervous system, cardiovascular system, integumentary system, and skeleton.2,3 While LSDs collectively affect 1 in 5000 live births, they are individually rare and typically present in infancy or childhood.4
Approximately 16 LSDs are currently treated with disease-specific therapies, including enzyme replacement therapy (ERT), gene therapy, substrate reduction therapy, and chaperone therapy.4 The remaining LSDs do not have any disease-specific therapy, and therefore, they are often treated symptomatically or by means of supportive care only. Hematopoietic stem cell transplantation was the first approach used in the treatment of a lysosomal storage disorder, and despite the high morbidity and mortality rates, bone marrow transplantation is still used today, often in infants to prevent the development of neurological symptoms.4−6 Additionally, the Food and Drug Administration (FDA) has approved several biologics including both ERTs and gene therapy to treat this population in the past 15 years. ERT is based on the concept of receptor-mediated endocytosis through the mannose-6-phosphate receptor found on the surface of most cells.4,5,7 The recombinant lysosomal enzyme containing the correct carbohydrate group can be taken up by the cells, incorporated into the lysosome, and be directly used to treat the lysosomal enzyme deficiency for a particular LSD.4,7 On the other hand, gene therapy corrects the underlying condition by replacing the mutated gene with a healthy copy of the gene. In LSDs, gene therapy has either been delivered systemically or directly to the central nervous system by way of a viral vector.8,9 Small molecule therapies, such as substrate reduction therapy and chaperone therapy, have not been at the forefront of research. 
However, small molecules have several advantages including the potential for multiple routes of administration, controlled dosing, stability, scale of synthesis, and relatively low manufacturing costs when compared to biologics.10 They also have the potential to target all tissues, including the brain (if they cross the blood-brain barrier), which is especially important for this patient population.10 In the instance where one therapy is not sufficient to treat the disease, utilizing small molecules to complement ERT and gene therapy may be important to treat this patient population.
One disease of interest to us is the ultrarare LSD known as sialidosis. This disease is caused by genetic mutations in the NEU1 gene which encodes the lysosomal sialidase-1 (also known as the neuraminidase-1 enzyme11). This enzyme catalyzes the removal of sialic acid moieties from glycoproteins and glycolipids.11 Thus, defective or deficient activity of this protein leads to the buildup of sialylated glycoproteins in the lysosome causing excessive and progressive accumulation of nondigested or only partially digested material.11 Patients with sialidosis then present with either an attenuated or severe phenotype. Sialidosis type I (attenuated) is also referred to as “cherry red spot—myoclonus syndrome” because gradually reduced visual acuity and twitching of the muscles are the most discernible symptoms with the onset of the disease.11,12 These patients may present with mild intellectual disability, epilepsy, visual impairment, and ataxia which progresses and becomes debilitating throughout the course of their life.11,12 In contrast, sialidosis type II (severe) can present in utero, in infancy, or in early childhood.11,12 This severe phenotype is characterized by hepatomegaly, dysostosis multiplex, and developmental delay.11,12 In general, the severity of the phenotype correlates with the type and combination of NEU1 mutations and the levels of residual enzyme activity.11 More than 40 NEU1 disease-causing mutations have been identified in type I and type II sialidosis patients, and these are mostly missense leading to single amino acid substitutions.11,13
Over the past several years, a few treatment modalities have been tested in sialidosis mouse models. ERT was attempted in a Neu1–/– mouse model using recombinant NEU1 enzyme purified from insect cells.11,14 Although there were increased levels of NEU1 enzyme activity and pathology correction, the recombinant enzyme was immunogenic, eliciting a severe immune response.11,15 Additionally, broad desialylation using 2-deoxy-2,3-didehydro-N-acetylneuraminic acid derivatives with Ki values ranging from 53 nM to 1.3 μM has been studied, along with pharmacological chaperones.16 Chaperones have been successfully tested in Neu1–/– and NEU1V54M mice and a self-complementary adeno-associated viral vector has also been successfully injected systemically in the NEU1–/–; NEU1V54M mice.11,17 However, none of these have made it to the clinic as treatments.
The expanding use of machine learning technologies is changing the way new molecules and repurposed drugs are being discovered18 because it allows us to leverage the knowledge that is already available in the public domain. In this study, we now demonstrate how utilizing a bibliometric tool to generate a link between disease targets and accessible bioactivity data has guided our efforts in extracting relevant structure-activity relationship data from the ChEMBL database for our target of interest. This is followed by applying Assay Central software19,26−34 to build machine learning models and screen compounds in silico prior to in vitro testing. This combined approach has now been applied in order to identify new compounds that can act as a potential chaperone or disease modulator in the treatment of sialidosis by targeting sialidase-1.
Results and Discussion
The LSD drug discovery (LSDDD) tool created for this project identified targets using medical subject heading (MeSH) terms that were represented in the literature for 51 LSDs. The benefit of this tool is that it provides a better opportunity to identify connections between diseases and protein targets or other proteins that might be associated with the disease (Figure 1).
Figure 1.
Overview of the LSDDD pipeline tool.
It also provides a better visual understanding of the breadth of targets related to these diseases. Additionally, the tool makes searching the literature easier than using the PubMed website, because all of the PubMed literature is organized and connected to the disease and target of interest in one comprehensive spreadsheet.
From the LSDDD tool, we were able to extract publicly accessible bioactivity data from ChEMBL19,20 for 11 targets as presented in Table S1. Ten targets had publicly available inhibitory data in the form of IC50 values, while five targets had publicly available binding affinity data in the form of Ki values. From the data presented, we have been able to build 15 machine learning models thus far, while still taking into account the size of the bioactivity dataset. All models built had greater than or equal to 50 compounds in the dataset to increase the chemical diversity represented in the model.
As a representative example, the sialidase-1 model was built with IC50 structure activity data from 57 compounds curated from ChEMBL 25. The five-fold cross validation receiver operating characteristic (ROC) was 0.737 with a precision score of 0.345, recall score of 0.91, specificity score of 0.587, F1 score of 0.50, kappa score of 0.306, and Matthews correlation coefficient (MCC) score of 0.392 (Figure 2). The threshold for activity was automatically calculated to be 100 μM to provide a reasonable number of active compounds that would be used to predict the activity of further compound screening libraries.
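The summary statistics above all derive from a single 2x2 confusion matrix, so they can be checked against one another. The sketch below is illustrative only and is not part of Assay Central. The confusion matrix it uses (10 true positives, 1 false negative, 19 false positives, 27 true negatives) is our own back-calculation for 57 compounds and is not a value reported in this study; it is simply one matrix consistent with the published precision, recall, specificity, F1, kappa, and MCC.

```python
import math


def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute the statistics named in the text from a 2x2 confusion matrix."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n  # observed agreement (accuracy)
    # Agreement expected by chance, from predicted and actual class totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "kappa": (observed - expected) / (1 - expected),
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# ASSUMPTION: back-calculated counts, not reported by the study.
metrics = classification_metrics(tp=10, fn=1, fp=19, tn=27)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The high recall with low precision implied by these numbers suggests a model that rarely misses actives but flags many inactives, which fits its use here as a ranking filter ahead of in vitro testing.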
Figure 2.

ROC curve for the Assay Central sialidase-1 model.
The compounds selected from the machine learning model in Assay Central for in vitro testing (Table 1) came from our raw prediction script which outputs the top-scoring molecules with consideration for applicability but not diversity. The raw prediction script was used because it was important to identify compounds that bind to the target sialidase-1 at this time. This model was used to score the FDA-approved drug library, SuperDRUG2 (a compound library that contains 4237 drugs worldwide), the MedChem Express autophagy library, and our Collaborations Pharmaceuticals Inc. private library of several hundred available molecules. The top-scoring compounds that came out of our automated process are presented in Table S2. Table S2 provides cytotoxicity scores for all top-scoring compounds as well as scores that reflect the compounds’ ability to reach the brain using the CNS MPO score.21
Table 1. Table of Structures, Compounds, and Data Used to Validate the Machine Learning Model.
Three compounds from the SuperDRUG2 library were selected for validation in vitro using microscale thermophoresis (Table 1) including sulfameter with a predicted probability-like-score of 1.01, mexenone with a probability-like-score of 0.91, and anethole trithione with a probability-like-score of 0.97. Additional criteria guided the final selection of the compounds selected for testing in vitro, including active prediction scores against our blood−brain barrier model and inactive prediction scores against our cytotoxicity model.
Two compounds showed binding, sulfameter with a Kd of 2.15 μM (Kd confidence ± 1.02 μM) and mexenone with a Kd of 8.88 μM (Kd confidence ± 4.02 μM) (Figure 3).
Figure 3.

In vitro data were generated using microscale thermophoresis for sulfameter and mexenone. The binding analysis for the interaction between sialidase-1 and these compounds is presented. The concentration of labeled sialidase-1 is kept constant at 5 nM, while the ligand concentration varies from 250 μM to 7.629 nM. The serial titrations result in measurable changes in the fluorescence signal within a temperature gradient that can be used to calculate the dissociation constant: Kd = 2.15 μM (Kd confidence ± 1.02 μM) for sulfameter and Kd = 8.88 μM (Kd confidence ± 4.02 μM) for mexenone. The curve is shown as Fraction Bound [-] against compound concentration on a log scale.
Machine learning is a promising way for pharmaceutical companies to build models and make predictions from either their own data or data in the public domain. In the space of rare diseases, we have proposed how machine learning could impact productivity and identify new small molecules that could be tested relatively efficiently.18 We have also previously described our Assay Central software, which uses a Bayesian method, and described applications such as for drug-induced liver injury,22 estrogen receptor,23 Mycobacterium tuberculosis,24 non-nucleoside reverse transcriptase, and whole cell HIV.25
We employed the bibliometric tool LSDDD to find data for 51 LSDs, and we were able to identify 11 targets with bioactivity data available in ChEMBL to build machine learning models (Table S1). We focused on sialidosis as an example of an LSD to validate the predictions in vitro. Because publicly available data are limited, only a small training set could be curated to generate the IC50 model. The limited chemical space represented in the model constrains the output of active predictions from various screening libraries. Admittedly, this is also the case for many rare disease targets because they are understudied, the data collected for these targets have not been published, or the data simply do not exist. Despite the small training set used to predict active small molecules for human sialidase-1, numerous compounds from several screening libraries were predicted to have high activity using the raw prediction script. In the course of this study, we identified two compounds, sulfameter and mexenone, that bind to sialidase-1. Sulfameter is an old antibacterial drug that has been used in the treatment of leprosy26 and urinary tract infections,27 while mexenone is an active sunscreen agent.28 These compounds have not been previously reported in the literature to bind to sialidase-1, nor are they represented in the machine learning model training set. Furthermore, the identification of these two structurally different compounds using machine learning demonstrates the capability of Assay Central to accurately predict the binding of small molecules based on chemical structure alone.
The follow-up steps to this study would be to continue using microscale thermophoresis to test the hits that came out of our in silico screen (Table S2). We can then introduce the tested compounds (positive and negative) back into the machine learning model, potentially improve the model, and concurrently identify new compounds worth characterizing further. Additionally, future studies need to examine whether sulfameter and mexenone increase the enzymatic activity of sialidase-1. Enzyme activity should be measured in cell lines because sialidase-1 is active when it is in complex with cathepsin A and multiple binding sites exist on sialidase-1.29,30 Furthermore, various patient cell lines with missense mutations should be used to evaluate the potential of these compounds as chaperones.31
In this study, we have described an accessible bibliometric approach that enables searching for data on rare diseases, as highlighted by our application to LSDs. Currently, there are no comprehensive databases that contain data for all rare diseases that can be used to build machine learning models. However, others have described the development of a comprehensive global genotype-phenotype database for rare diseases.32 There are also databases such as RD-Connect that focus on sharing genomics data33 as well as more general rare disease databases.34 Currently available chemistry and biology data relevant for rare disease drug discovery are extremely diffuse, existing in an array of public (many without PubMed or SciFinder curation), private, or other databases (PubChem,35 ChEMBL20). The US FDA’s Rare Disease Repurposing Database36 consists of Excel tables containing approved orphan drugs. PhRMA recently collated data on >400 treatments in preclinical and clinical phases, but this is a PDF file and not a database.37 Outside of the tools described in this study, the Open Targets platform is one of the few other comprehensive databases that can be used to access and visualize potential drug targets affiliated with a disease. It leverages multiple data types, supports workflows starting from a target or disease, and shows the available evidence for that target–disease affiliation.38
We have illustrated how we can integrate information from our own and multiple public databases (such as PubChem,35 ChEMBL19,20) to build models for each target–disease that enable the user to identify potential active compounds for use against them.18 We can also provide a wide array of other information that would enable scientists to build a complete picture of what is available. This effort is cost effective and could speed up the process of identifying small molecules that can potentially be used to treat rare diseases. The ability to mine such information may also be of use for patients and advocates trying to learn about such diseases and perhaps attempting to find drugs to repurpose themselves for a specific disease. Unfortunately, there is still limited bioactivity data for many rare disease targets; therefore, leveraging the current knowledge will be vital to expedite the drug development process for rare diseases as a whole.
Experimental Section
Bibliometric Analysis
The LSDDD pipeline tool allows an end user to explore the literature for a set of diseases; identify all targets affiliated with the disease of interest across species, as represented in the literature; identify compounds tested against the target of interest; and extract evidence of that chemical interaction with the target in vitro, as shown in Figure 1. The implementation of this exploratory data retrieval approach rests on the literature data from Chemotext,39,40 a database of MeSH terms from PubMed, and assay data from ChEMBL 25. The target protein name is used to link the Chemotext39,40 literature data to ChEMBL 25 assay data. This exploration pipeline was implemented in Microsoft Excel, making use of the Visual Basic for Applications (VBA) programming language. VBA was used to send application programming interface (API) requests to ChEMBL, retrieve the results, format them in Excel, and provide end user navigation. A more detailed user guide for the LSDDD pipeline tool built for this project is available (Supporting Information).
LSDDD Pipeline Step 1: Disease–Target
The LSDs were identified by querying the MeSH tree from the National Library of Medicine. We selected any diseases that fell under the tree categories of LSDs and the corresponding tree nodes C16.320.565.595 and C18.452.648.595. We also identified diseases that are supplemental concepts mapped to selected MeSH concepts (Table S3). For example, sialidase-1 deficiency is a concept that was selected because it mapped to mucolipidoses as shown in the Supporting Information. The resulting set of 56 MeSH disease terms was then used to query the Chemotext39,40 database of MeSH terms extracted from PubMed citations of over 20 million PubMed entries.39 The query identified all article citations in which one of the MeSH LSD disease terms was annotated with a protein MeSH term. The proteins are identified as MeSH terms in the D12 tree branch and will be referred to as targets.
The resulting LSD and protein targets were exported to Excel along with the title and publication year of the article. The output was formatted into a pivot table to create an overview of the disease–target literature space and can be found on the overview sheet of the resulting LSDDD tool. The numbers represent the number of PubMed articles in which the disease and target MeSH terms are co-annotated. Routines in VBA were written to allow the user to double-click on an article count to be taken to the detail sheet where bibliographic information such as the PubMed identifier, title, and publication year is viewable.
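The disease selection and pivot-table counting described in this step can be sketched in a few lines. This is a minimal illustration in Python, not the VBA and Chemotext implementation used in the tool. The two tree nodes come from the text; the function names, record layout, descriptor names, tree numbers beneath the nodes, and PubMed identifiers are all hypothetical placeholders.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Tree nodes named in the text for lysosomal storage diseases.
LSD_TREE_NODES = ("C16.320.565.595", "C18.452.648.595")


def is_lsd_term(tree_numbers: Iterable[str]) -> bool:
    """True if any tree number is one of the LSD nodes or lies beneath one."""
    return any(
        number == node or number.startswith(node + ".")
        for number in tree_numbers
        for node in LSD_TREE_NODES
    )


def coannotation_counts(
    citations: List[dict], lsd_terms: set
) -> Dict[Tuple[str, str], int]:
    """Count citations in which an LSD term and a protein term co-occur."""
    counts: Counter = Counter()
    for citation in citations:
        diseases = set(citation["diseases"]) & lsd_terms
        for disease in diseases:
            for target in set(citation["targets"]):
                counts[(disease, target)] += 1
    return dict(counts)


# HYPOTHETICAL descriptors and citations, for illustration only.
descriptors = {
    "Disease A": ["C16.320.565.595.100", "C18.452.648.595.100"],
    "Disease B": ["C10.228.140.163"],
}
lsd_terms = {name for name, nums in descriptors.items() if is_lsd_term(nums)}
citations = [
    {"pmid": "0000001", "diseases": ["Disease A"], "targets": ["Protein X"]},
    {
        "pmid": "0000002",
        "diseases": ["Disease A", "Disease B"],
        "targets": ["Protein X", "Protein Y"],
    },
]
pivot = coannotation_counts(citations, lsd_terms)
print(pivot)
```

Matching on the node followed by a period, rather than on the bare prefix, avoids accidentally selecting a sibling node whose number merely begins with the same characters.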
LSDDD Pipeline Step 2: MeSH Target–ChEMBL Target
We made the link from literature information to ChEMBL 25 data using the protein target name. From the overview sheet, the user can send a search request to ChEMBL with the MeSH target name. The ChEMBL web service API receives the MeSH term, performs a search, and returns the matching ChEMBL target count. Through a routine activated by double-clicking on a row in column E, the actual ChEMBL target records can be retrieved via the API and viewed on the target sheet. Data elements available on the target sheet are MeSH target name, target ID, target preferred name, and species.
LSDDD Pipeline Step 3: ChEMBL Target–ChEMBL Assay
The functionality built into the target sheet is aimed at finding the number of assays for a selected target or targets. Selecting a cell in the blue column C and clicking on the button above will cause the tool to build and send a query to the ChEMBL web service API that retrieves and thus counts the assays for the specified target. Double-clicking on a blue cell will cause the tool to retrieve ChEMBL records through an API call, parse them, and write them to the assay sheet where they can be viewed. The data elements available through the assay sheet are target ID, assay ID, assay type, assay description, document ID, and the target preferred name.
LSDDD Pipeline Step 4: ChEMBL Assay–Chemical
The assay sheet has functionality similar to the target sheet. The results may be viewed or expanded by double-clicking in a blue cell. This action causes the tool to build and send an API call to ChEMBL to retrieve the records for chemicals tested in the assay. The LSDDD tool then parses the returned XML records and inserts the desired values into the chemical sheet. The data elements displayed are target ID, assay ID, chemical ID, SMILES, preferred (ChEMBL) chemical name, results, assay description, and the ChEMBL target name.
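Steps 2 to 4 each amount to building one request to the ChEMBL web services. The sketch below shows how such requests could be composed in Python. It is an assumption-laden illustration and not the VBA code in the tool: the URL patterns follow the public ChEMBL REST interface as we understand it, the exact requests issued by the tool are not given in the text, the tool parsed XML whereas these URLs ask for JSON, and the function names and the assay identifier are hypothetical. No network call is made here.

```python
from urllib.parse import urlencode

# Base URL of the public ChEMBL web services (assumed current location).
CHEMBL_API = "https://www.ebi.ac.uk/chembl/api/data"


def target_search_url(mesh_target_name: str) -> str:
    """Step 2: free-text search for ChEMBL targets matching a MeSH name."""
    return f"{CHEMBL_API}/target/search.json?" + urlencode({"q": mesh_target_name})


def assays_for_target_url(target_chembl_id: str) -> str:
    """Step 3: list the assays recorded against one ChEMBL target."""
    query = urlencode({"target_chembl_id": target_chembl_id})
    return f"{CHEMBL_API}/assay.json?" + query


def activities_for_assay_url(assay_chembl_id: str, standard_type: str = "IC50") -> str:
    """Step 4: list the compound activities measured in one assay."""
    query = urlencode(
        {"assay_chembl_id": assay_chembl_id, "standard_type": standard_type}
    )
    return f"{CHEMBL_API}/activity.json?" + query


# CHEMBL2726 is the sialidase-1 target ID given in the text;
# "CHEMBL_EXAMPLE" is a HYPOTHETICAL placeholder assay ID.
print(target_search_url("sialidase 1"))
print(assays_for_target_url("CHEMBL2726"))
print(activities_for_assay_url("CHEMBL_EXAMPLE"))
```

Keeping one function per pipeline step mirrors the sheet-by-sheet navigation in the spreadsheet, where each double-click triggers exactly one request.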
Machine Learning—Assay Central
The Assay Central software has been previously described.19,26−34 We utilized Assay Central to prepare and merge datasets collated in Molecular Notebook23 and generate Bayesian machine learning models using the ECFP6 descriptor41,42 from the CDK library.43 This software first employs a series of rules for the detection of problem data that is corrected by a combination of automated structure standardization (removing salts, neutralizing unbalanced charges, merging duplicate structures with finite activities) and identifying advanced problems to be resolved by human re-curation. This high-quality dataset is then subjected to a Bayesian algorithm to generate a machine learning model. Each model in Assay Central includes the following metrics to evaluate predictive performance: recall, precision, specificity, F1-score, ROC curve,23 Cohen’s kappa (CK),44,45 and the MCC.46 The Assay Central prediction workflow assigned a probability-like score41,42 and applicability domain (which assesses the portion of fragments overlapping with the training set molecules) to the input compounds according to a user-specified model. Active predictions are considered those with assigned scores >0.5.
The machine learning model for the drug target sialidase-1 was built with publicly accessible IC50 data from ChEMBL (ChEMBL2726; version ChEMBL 25) extracted with the LSDDD tool, in addition to IC50 data extracted from BindingDB. This model was used to score the FDA-approved drug library, SuperDRUG2 (a compound library that contains ∼4000 drugs worldwide), the MedChem Express autophagy library, and our Collaborations Pharmaceuticals Inc. private library. Furthermore, prediction lists were refined with in vitro testing in mind through our cytotoxicity model47 and a crude measure of CNS activity called the Pfizer CNS MPO score21,48 calculated using descriptors from ChemAxon software (Cambridge, MA).
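To make the modeling step more concrete, the sketch below shows a Laplacian-corrected naive Bayes scorer, one common formulation for fingerprint-based Bayesian models. It is offered as an assumption about the general technique, not as the Assay Central implementation, whose internals are not described here. Integer feature IDs stand in for ECFP6 fragment hashes (which would come from a cheminformatics toolkit such as the CDK), the training set is hypothetical, and treating an IC50 at or below the 100 μM threshold as active is our reading of the threshold. The raw score returned is uncalibrated; the probability-like score and the 0.5 cutoff described above would need an additional calibration step that is not shown.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple


def is_active(ic50_um: float, threshold_um: float = 100.0) -> bool:
    """ASSUMPTION: a compound is active when IC50 is at or below the threshold."""
    return ic50_um <= threshold_um


def train_laplacian_bayes(
    training: List[Tuple[Set[int], bool]]
) -> Dict[int, float]:
    """Return a log-scale contribution for every feature seen in training."""
    base_rate = sum(1 for _, active in training if active) / len(training)
    total: Dict[int, int] = {}
    actives: Dict[int, int] = {}
    for features, active in training:
        for feature in features:
            total[feature] = total.get(feature, 0) + 1
            if active:
                actives[feature] = actives.get(feature, 0) + 1
    # Laplacian correction: add one virtual sample at the base active rate.
    return {
        feature: math.log((actives.get(feature, 0) + 1.0) / (count * base_rate + 1.0))
        for feature, count in total.items()
    }


def raw_score(features: Iterable[int], contributions: Dict[int, float]) -> float:
    """Sum contributions; features unseen in training contribute nothing."""
    return sum(contributions.get(feature, 0.0) for feature in features)


# HYPOTHETICAL training set: (feature IDs, IC50 in uM).
measured = [({1, 2}, 5.0), ({1, 3}, 40.0), ({3, 4}, 300.0), ({4, 5}, 1000.0)]
training = [(features, is_active(ic50)) for features, ic50 in measured]
model = train_laplacian_bayes(training)

print(raw_score({1, 2}, model), raw_score({4, 5}, model))
```

Because unseen features add nothing to the score, a compound sharing few fragments with the training set receives a score near zero, which is one reason an applicability domain measure is reported alongside the prediction.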
Microscale Thermophoresis
Mexenone and sulfameter were purchased from MedChem Express (Monmouth Junction, NJ). The human recombinant NEU-1/sialidase-1 protein used in this experiment was purchased from Novus Biologicals (Littleton, CO) (catalog number, NBP2-23471) at a concentration of 0.25 mg/mL.
Experiments were performed using a Monolith Pico (NanoTemper Technologies, Cambridge, MA). Briefly, 10 μM sialidase-1 was labeled using the His-Tag Labeling Kit RED-tris-NTA 2nd Generation (MO-L018) in phosphate-buffered saline supplemented with 0.05% Tween 20, according to the manufacturer’s instructions, and centrifuged at 15,000 rcf for 10 min. Binding affinity measurements were performed using 5 nM protein and a serial dilution of compounds, starting at 250 μM. For each experimental compound, 16 independent stocks were made in dimethyl sulfoxide (DMSO) using 2-fold serial dilution (10 mM initial concentration). Then, 19.5 μL of labeled sialidase-1 (5 nM) in microscale thermophoresis (MST) buffer (HEPES 10 mM pH 7.4, NaCl 150 mM, 0.1% Triton X-100, and 1 mM beta-mercaptoethanol) was combined with 0.5 μL of the compound stock and mixed thoroughly. This resulted in a 2-fold serial dilution series with highest and lowest concentrations of 250 μM and 7.629 nM, respectively, and a consistent final DMSO concentration of 2.5%. Protein was incubated on ice in the presence of compounds for 20 min prior to transferring to standard Monolith NT.115 capillaries. Triplicate measurements were performed with standard capillaries at medium MST power and 20% LED at 23.0 °C on a Monolith NT.115Pico (NanoTemper). The data were acquired with MO.Control 1.6.1 (NanoTemper Technologies). Recorded data were analyzed with MO.Affinity Analysis 2.3 (NanoTemper Technologies). The dissociation constant Kd quantifies the equilibrium of the reaction of the labeled molecule A (concentration $c_A$) with its target T (concentration $c_T$) to form the complex AT (concentration $c_{AT}$). It is defined by the law of mass action as $K_d = c_A \, c_T / c_{AT}$, where all concentrations are “free” concentrations.
During the titration experiments, the concentration of the labeled molecule A is kept constant and the concentration of the added target T is increased. These concentrations are known and can be used to calculate the dissociation constant. The free concentration of the labeled molecule A is the added concentration minus the concentration of formed complex AT, and the same holds for T. The Kd is therefore calculated as $K_d = (c_{A0} - c_{AT})(c_{T0} - c_{AT}) / c_{AT}$, where $c_{A0}$ and $c_{T0}$ are the total added concentrations. The fraction of bound molecules x can be derived from Fnorm, where Fnorm(A) is the normalized fluorescence of only unbound labeled molecules A and Fnorm(AT) is the normalized fluorescence of complexes AT of the labeled molecule, as shown by the equation $x = \dfrac{F_{\mathrm{norm}}(c_{T0}) - F_{\mathrm{norm}}(A)}{F_{\mathrm{norm}}(AT) - F_{\mathrm{norm}}(A)}$. MST traces that showed aggregation or outliers were removed from the datasets prior to Kd determination. Kd confidences were obtained within the MO.Affinity Analysis software (NanoTemper), with a confidence of 68%.
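The dilution arithmetic and the binding model above can be checked with a short calculation. The sketch below is a minimal illustration, not the fitting routine of MO.Affinity Analysis, which is not described here. It rebuilds the 16-point series from the stated volumes and solves the mass-action expression for the complex concentration using the standard quadratic root; the function and variable names are ours, and the reported sulfameter Kd is used only to draw an example curve.

```python
import math


def dilution_series(top_um: float = 250.0, points: int = 16, fold: float = 2.0) -> list:
    """Final ligand concentrations (in uM) for a serial dilution, highest first."""
    return [top_um / fold ** i for i in range(points)]


def complex_concentration(c_a0: float, c_t0: float, kd: float) -> float:
    """Solve Kd = (cA0 - cAT)(cT0 - cAT)/cAT for cAT, taking the physical root."""
    total = c_a0 + c_t0 + kd
    return (total - math.sqrt(total * total - 4.0 * c_a0 * c_t0)) / 2.0


def fraction_bound(c_a0: float, c_t0: float, kd: float) -> float:
    """Fraction of labeled molecule A present as complex AT."""
    return complex_concentration(c_a0, c_t0, kd) / c_a0


# Values taken from the text; all concentrations are in uM.
stock_top_um = 10_000.0                            # 10 mM top DMSO stock
final_top_um = stock_top_um * 0.5 / (19.5 + 0.5)   # 0.5 uL stock in 20 uL total
dmso_percent = 100.0 * 0.5 / (19.5 + 0.5)          # final DMSO content
series = dilution_series(final_top_um)
protein_um = 0.005                                 # 5 nM labeled sialidase-1
kd_um = 2.15                                       # reported Kd for sulfameter

curve = [(c, fraction_bound(protein_um, c, kd_um)) for c in series]
for ligand_um, bound in curve:
    print(f"{ligand_um:12.6f} uM  fraction bound {bound:.4f}")
```

Because the labeled protein concentration (5 nM) is far below the Kd values measured here, the fraction bound is close to the simple hyperbola cT0/(cT0 + Kd), and half-saturation falls near the Kd itself.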
Acknowledgments
We acknowledge Dr. Alexander Tropsha and Dr. Anthony Hickey for rare disease discussions and Dr. Alex M. Clark (Molecular Materials Informatics, Inc.) for Assay Central support. We thank Dr. Dinorah Leyva for helping with the analysis of MST data. We kindly acknowledge NIH funding to develop the software from NIGMS R44GM122196-02A1 as well as support from NINDS 1R43NS107079-01, NINDS 3R43NS107079-01S1. F.U. was partially supported by the NIH award number DP7OD020317.
Glossary
Abbreviations
- LSDs: lysosomal storage diseases
- ERT: enzyme replacement therapy
- AC: Assay Central
- MCC: Matthews correlation coefficient
- AUC: area under the receiver operating characteristic curve
- CK: Cohen’s kappa
- ROC: receiver operating characteristic
Supporting Information Available
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsomega.0c05591.
The authors declare the following competing financial interest(s): S.E. is owner and J.J.K., D.H.F., K.M.Z., F.U., A.C.P. are employees of Collaborations Pharmaceuticals, Inc.
References
- Nguengang Wakap S.; Lambert D. M.; Olry A.; Rodwell C.; Gueydan C.; Lanneau V.; Murphy D.; Le Cam Y.; Rath A. Estimating cumulative point prevalence of rare diseases: analysis of the Orphanet database. Eur. J. Hum. Genet. 2020, 28, 165–173. 10.1038/s41431-019-0508-0.
- Marques A. R. A.; Saftig P. Lysosomal storage disorders - challenges, concepts and avenues for therapy: beyond rare diseases. J. Cell Sci. 2019, 132, jcs221739. 10.1242/jcs.221739.
- James R. A.; Singh-Grewal D.; Lee S.-J.; McGill J.; Adib N. Lysosomal storage disorders: A review of the musculoskeletal features. J. Paediatr. Child Health 2016, 52, 262–271. 10.1111/jpc.13122.
- Platt F. M.; d’Azzo A.; Davidson B. L.; Neufeld E. F.; Tifft C. J. Lysosomal storage diseases. Nat. Rev. Dis. Primers 2018, 4, 27. 10.1038/s41572-018-0037-0.
- Bruni S.; Loschi L.; Incerti C.; Gabrielli O.; Coppa G. V. Update on treatment of lysosomal storage diseases. Acta Myol. 2007, 26, 87–92.
- Aldenhoven M.; Kurtzberg J. Cord blood is the optimal graft source for the treatment of pediatric patients with lysosomal storage diseases: clinical outcomes and future directions. Cytotherapy 2015, 17, 765–774. 10.1016/j.jcyt.2015.03.609.
- Sly W. S.; Kaplan A.; Achord D. T.; Brot F. E.; Bell C. E. Receptor-mediated uptake of lysosomal enzymes. Prog. Clin. Biol. Res. 1978, 23, 547–551.
- Sands M. S.; Davidson B. L. Gene therapy for lysosomal storage diseases. Mol. Ther. 2006, 13, 839–849. 10.1016/j.ymthe.2006.01.006.
- Sands M. S.; Haskins M. E. CNS-directed gene therapy for lysosomal storage diseases. Acta Paediatr. 2008, 97, 22–27. 10.1111/j.1651-2227.2008.00660.x.
- Tambuyzer E.; Vandendriessche B.; Austin C. P.; Brooks P. J.; Larsson K.; Needleman K. I. M.; Valentine J.; Davies K.; Groft S. C.; Preti R.; Oprea T. I.; Prunotto M. Publisher Correction: Therapies for rare diseases: therapeutic modalities, progress and challenges ahead. Nat. Rev. Drug Discovery 2020, 19, 291. 10.1038/s41573-019-0059-7.
- D’Azzo A.; Machado E.; Annunziata I. Pathogenesis, Emerging therapeutic targets and Treatment in Sialidosis. Expert Opin. Orphan Drugs 2015, 3, 491–504. 10.1517/21678707.2015.1025746.
- Franceschetti S.; Canafoglia L. Sialidoses. Epileptic Disord. 2016, 18, 89–93. 10.1684/epd.2016.0845.
- Seyrantepe V.; Poupetova H.; Froissart R.; Zabot M.-T.; Maire I.; Pshezhetsky A. V. Molecular pathology of NEU1 gene in sialidosis. Hum. Mutat. 2003, 22, 343–352. 10.1002/humu.10268.
- Bonten E. J.; Wang D.; Toy J. N.; Mann L.; Mignardot A.; Yogalingam G.; D’Azzo A. Targeting macrophages with baculovirus-produced lysosomal enzymes: implications for enzyme replacement therapy of the glycoprotein storage disorder galactosialidosis. FASEB J. 2004, 18, 971–973. 10.1096/fj.03-0941fje.
- Wang D.; Bonten E. J.; Yogalingam G.; Mann L.; d’Azzo A. Short-term, high dose enzyme replacement therapy in sialidosis mice. Mol. Genet. Metab. 2005, 85, 181–189. 10.1016/j.ymgme.2005.03.007.
- Guo T.; Héon-Roberts R.; Zou C.; Zheng R.; Pshezhetsky A. V.; Cairo C. W. Selective Inhibitors of Human Neuraminidase 1 (NEU1). J. Med. Chem. 2018, 61, 11261–11279. 10.1021/acs.jmedchem.8b01411.
- Hu H.; Gomero E.; Bonten E.; Gray J. T.; Allay J.; Wu Y.; Wu J.; Calabrese C.; Nienhuis A.; d’Azzo A. Preclinical dose-finding study with a liver-tropic, recombinant AAV-2/8 vector in the mouse model of galactosialidosis. Mol. Ther. 2012, 20, 267–274. 10.1038/mt.2011.227.
- Ekins S.; Puhl A. C.; Zorn K. M.; Lane T. R.; Russo D. P.; Klein J. J.; Hickey A. J.; Clark A. M. Exploiting machine learning for end-to-end drug discovery and development. Nat. Mater. 2019, 18, 435–441. 10.1038/s41563-019-0338-z.
- Anon. ChEMBL. https://chembl.gitbook.io/chembl-interface-documentation/downloads (accessed 2019-03-28).
- Gaulton A.; Hersey A.; Nowotka M.; Bento A. P.; Chambers J.; Mendez D.; Mutowo P.; Atkinson F.; Bellis L. J.; Cibrián-Uhalte E.; Davies M.; Dedman N.; Karlsson A.; Magariños M. P.; Overington J. P.; Papadatos G.; Smit I.; Leach A. R. The ChEMBL database in 2017. Nucleic Acids Res. 2017, 45, D945–D954. 10.1093/nar/gkw1074.
- Wager T. T.; Hou X.; Verhoest P. R.; Villalobos A. Moving beyond rules: the development of a central nervous system multiparameter optimization (CNS MPO) approach to enable alignment of druglike properties. ACS Chem. Neurosci. 2010, 1, 435–449. 10.1021/cn100008c.
- Minerali E.; Foil D. H.; Zorn K. M.; Lane T. R.; Ekins S. Comparing Machine Learning Algorithms for Predicting Drug-Induced Liver Injury (DILI). Mol. Pharm. 2020, 17, 2628–2637. 10.1021/acs.molpharmaceut.0c00326. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Russo D. P.; Zorn K. M.; Clark A. M.; Zhu H.; Ekins S. Comparing Multiple Machine Learning Algorithms and Metrics for Estrogen Receptor Binding Prediction. Mol. Pharm. 2018, 15, 4361–4370. 10.1021/acs.molpharmaceut.8b00546. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lane T.; Russo D. P.; Zorn K. M.; Clark A. M.; Korotcov A.; Tkachenko V.; Reynolds R. C.; Perryman A. L.; Freundlich J. S.; Ekins S. Comparing and Validating Machine Learning Models for Mycobacterium tuberculosis Drug Discovery. Mol. Pharm. 2018, 15, 4346–4360. 10.1021/acs.molpharmaceut.8b00083. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zorn K. M.; Lane T. R.; Russo D. P.; Clark A. M.; Makarov V.; Ekins S. Multiple Machine Learning Comparisons of HIV Cell-based and Reverse Transcriptase Data Sets. Mol. Pharm. 2019, 16, 1620–1632. 10.1021/acs.molpharmaceut.8b01297. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Languillon J. Treatment of leprosy with clofazimine, rifampicin and Bayrena. Lepr. Rev. 1975, 46, 81–84. 10.5935/0305-7518.19750044. [DOI] [PubMed] [Google Scholar]
- Medina A.; Brown C. D.; Erdman H.; Prigot A. Absorption Studies on a New Long-Active Sulfonamide, Sulfamethoxydiazine. Antimicrob. Agents Chemother. 1963, 161, 541–545. [PubMed] [Google Scholar]
- Macleod T. M.; Frain-Bell W. A study of chemical light screening agents. Br. J. Dermatol. 1975, 92, 417–425. 10.1111/j.1365-2133.1975.tb03103.x. [DOI] [PubMed] [Google Scholar]
- Wang D.; Zaitsev S.; Taylor G.; d’Azzo A.; Bonten E. Protective protein/cathepsin A rescues N-glycosylation defects in neuraminidase-1. Biochim. Biophys. Acta 2009, 1790, 275–282. 10.1016/j.bbagen.2009.01.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gullner G. Inhibition of cytochrome C oxidase by glutathione in vitro. Acta Biochim. Biophys. Hung. 1990, 25, 31–35. [PubMed] [Google Scholar]
- Khan A.; Sergi C. Sialidosis: A Review of Morphology and Molecular Biology of a Rare Pediatric Disorder. Diagnostics 2018, 8, 29. 10.3390/diagnostics8020029. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Trujillano D.; Oprea G.-E.; Schmitz Y.; Bertoli-Avella A. M.; Abou Jamra R.; Rolfs A. A comprehensive global genotype-phenotype database for rare diseases. Mol. Genet. Genomic Med. 2017, 5, 66–75. 10.1002/mgg3.262. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Thompson R.; Johnston L.; Taruscio D.; Monaco L.; Beroud C.; Gut I. G.; Hansson M. G.; t Hoen P. B.; Patrinos G. P.; Dawkins H.; Ensini M.; Zatloukal K.; Koubi D.; Heslop E.; Paschall J. E.; Posada M.; Robinson P. N.; Bushby K.; Lochmuller H. RD-Connect: an integrated platform connecting databases, registries, biobanks and clinical bioinformatics for rare disease research. J. Gen. Intern. Med. 2014, 29, 780–787. 10.1007/s11606-014-2908-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rath A.; Olry A.; Dhombres F.; Brandt M. M.; Urbero B.; Ayme S. Representation of rare diseases in health information systems: the Orphanet approach to serve a wide range of end users. Hum. Mutat. 2012, 33, 803–808. 10.1002/humu.22078. [DOI] [PubMed] [Google Scholar]
- Anon . PubChem. https://pubchem.ncbi.nlm.nih.gov/ (accessed 2019-03-25).
- Xu K.; Cote T. R. Database identifies FDA-approved drugs with potential to be repurposed for treatment of orphan diseases. Briefings Bioinf. 2011, 12, 341–345. 10.1093/bib/bbr006. [DOI] [PubMed] [Google Scholar]
- Anon . Rare Diseases: A Report on Orphan Drugs in the Pipeline, 2013. http://www.phrma.org/sites/default/files/pdf/Rare_Diseases_2013.pdf.
- Carvalho-Silva D.; Pierleoni A.; Pignatelli M.; Ong C.; Fumis L.; Karamanis N.; Carmona M.; Faulconbridge A.; Hercules A.; McAuley E.; Miranda A.; Peat G.; Spitzer M.; Barrett J.; Hulcoop D. G.; Papa E.; Koscielny G.; Dunham I. Open Targets Platform: new developments and updates two years on. Nucleic Acids Res. 2019, 47, D1056–D1065. 10.1093/nar/gky1133. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Baker N. C.; Hemminger B. M. Mining connections between chemicals, proteins, and diseases extracted from Medline annotations. J. Biomed. Inf. 2010, 43, 510–519. 10.1016/j.jbi.2010.03.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Capuzzi S. J.; Thornton T. E.; Liu K.; Baker N.; Lam W. I.; O’Banion C. P.; Muratov E. N.; Pozefsky D.; Tropsha A. Chemotext: A Publicly Available Web Server for Mining Drug-Target-Disease Relationships in PubMed. J. Chem. Inf. Model. 2018, 58, 212–218. 10.1021/acs.jcim.7b00589. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Clark A. M.; Dole K.; Coulon-Spektor A.; McNutt A.; Grass G.; Freundlich J. S.; Reynolds R. C.; Ekins S. Open Source Bayesian Models. 1. Application to ADME/Tox and Drug Discovery Datasets. J. Chem. Inf. Model. 2015, 55, 1231–1245. 10.1021/acs.jcim.5b00143. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Clark A. M.; Ekins S. Open Source Bayesian Models. 2. Mining a “Big Dataset” To Create and Validate Models with ChEMBL. J. Chem. Inf. Model. 2015, 55, 1246–1260. 10.1021/acs.jcim.5b00144. [DOI] [PubMed] [Google Scholar]
- Willighagen E. L.; Mayfield J. W.; Alvarsson J.; Berg A.; Carlsson L.; Jeliazkova N.; Kuhn S.; Pluskal T.; Rojas-Cherto M.; Spjuth O.; Torrance G.; Evelo C. T.; Guha R.; Steinbeck C. The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J. Cheminf. 2017, 9, 33. 10.1186/s13321-017-0231-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Carletta J. Assessing agreement on classification tasks: The kappa statistic. Comput. Ling. 1996, 22, 249–254. [Google Scholar]
- Cohen J. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 1960, 20, 37–46. 10.1177/001316446002000104. [DOI] [Google Scholar]
- Matthews B. W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta 1975, 405, 442–451. 10.1016/0005-2795(75)90109-9. [DOI] [PubMed] [Google Scholar]
- Perryman A. L.; Patel J. S.; Russo R.; Singleton E.; Connell N.; Ekins S.; Freundlich J. S. Naive Bayesian Models for Vero Cell Cytotoxicity. Pharm. Res. 2018, 35, 170. 10.1007/s11095-018-2439-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wager T. T.; Hou X.; Verhoest P. R.; Villalobos A. Central Nervous System Multiparameter Optimization Desirability: Application in Drug Discovery. ACS Chem. Neurosci. 2016, 7, 767–775. 10.1021/acschemneuro.6b00029. [DOI] [PubMed] [Google Scholar]