Abstract
Discovering novel uses for existing drugs, through drug repurposing, can reduce the time, costs, and risk of failure associated with new drug development. However, prioritizing drug repurposing candidates for downstream studies remains challenging. Here, we present a high-throughput approach to identify and validate drug repurposing candidates. This approach integrates human gene expression, drug perturbation, and clinical data from publicly available resources. We apply this approach to find drug repurposing candidates for two diseases, hyperlipidemia and hypertension. We screen >21,000 compounds and replicate ten approved drugs. We also identify 25 (seven for hyperlipidemia, eighteen for hypertension) drugs approved for other indications with therapeutic effects on clinically relevant biomarkers. For five of these drugs, the therapeutic effects are replicated in the All of Us Research Program database. We anticipate our approach will enable researchers to integrate multiple publicly available datasets to identify high priority drug repurposing opportunities for human diseases.
Subject terms: Virtual drug screening, Gene expression, Data integration, High-throughput screening, Software
Introduction
Developing a new drug is expensive, often fails, and takes a long time. Drug repurposing aims to address these issues by finding new indications for existing drugs1. Repurposing existing drugs can decrease the cost and shorten the duration of drug development because many of the preclinical and safety studies have already been completed. Drug repurposing can also improve the success rate of drug development because existing drugs often have well-characterized safety profiles. Examples of successfully repurposed drugs include rituximab for rheumatoid arthritis2 and sildenafil for erectile dysfunction3. However, there have also been many drug repurposing candidates that have failed in clinical trial testing due to lack of efficacy1,4.
To address this challenge, researchers have developed high-throughput approaches leveraging human genetic data to identify effective repurposing candidates5,6. These methods are supported by the finding that drugs are more likely to pass clinical trials if their targets overlap with hits from human genetics studies7,8. An emerging approach using human genetic data to identify repurposing candidates is based on the hypothesis that a drug that reverses the molecular state of disease would also be an effective treatment for the disease9,10. To represent the molecular state of a disease, this approach calculates a gene expression signature using summary statistics from a genome-wide association study (GWAS) for the disease11,12. The disease gene expression signature is then used to search for drugs that reverse the disease-associated gene expression changes13,14. While these studies generate many repurposing signals, it remains a challenge to determine which of the repurposing candidates have the highest likelihood of passing clinical trials with commonly used validation methods (Supplementary Fig. 1).
To validate drug repurposing candidates, researchers commonly use animal models and in vitro assays, but these methods have two major limitations. First, these validation tools are sub-optimal representations of human disease, so evidence generated using these tools often serves as an unreliable predictor of drug response in humans. There are instances of repurposing candidates that were effective in animal models10 but subsequently failed to work in humans4,15. A second limitation is that using these methods to test most of the repurposing candidates identified from human genetic data is both time- and cost-prohibitive (e.g., a recent study identified 210 drug repurposing candidates for hypertension14), so researchers can only test a handful of repurposing candidates. Consequently, among the repurposing candidates not tested, there may be a drug that is effective at treating the disease of interest. In contrast, generating reliable evidence to predict drug response in humans for many repurposing candidates can be done quickly and cost-effectively using clinical data from electronic health records (EHRs)16,17.
Here, we describe a proof-of-concept approach integrating imputed human disease gene expression signatures, drug perturbation data, and clinical EHR data to identify and validate repurposing candidates. The four major steps of this approach are (1) imputing human disease gene expression signatures using S-PrediXcan11,12 and GWAS summary statistics, (2) searching for drugs that reverse the disease gene expression signatures in drug perturbation databases using the Integrative Library of Integrated Network-based Cellular Signatures (iLINCS) platform18,19, (3) validating iLINCS repurposing candidates using clinical data stored in the Synthetic Derivative (SD), the de-identified EHR database at Vanderbilt University Medical Center (VUMC), and (4) replicating repurposing candidate signals using clinical data stored in the National Institutes of Health (NIH) All of Us Research Program database (Fig. 1a)20,21. We applied this approach to find repurposing candidates for two diseases, hyperlipidemia and hypertension. We chose these diseases to test this proof-of-concept approach because they have several known US Food and Drug Administration (FDA)-approved drugs and robust biomarkers to measure drug efficacy. The data used in this study, except for individual-level clinical data in the VUMC SD, are all stored in publicly available databases. We have also made the software tools available as open source so that researchers can apply this drug repurposing approach to their diseases of interest.
Results
Using gene expression to find drug repurposing candidates
We developed a novel approach integrating disease gene expression signatures, drug perturbation data, and clinical data, to identify and validate drug repurposing candidates. To compute the gene expression signature for each disease, we searched a public database with disease-associated gene expression changes22,23. These disease-associated gene expression changes were imputed using each disease’s GWAS summary statistics24,25 and S-PrediXcan11,12. For both disease signatures, the direction of gene expression changes for known disease-associated genes was concordant with existing knowledge. For example, in the gene expression signature for hyperlipidemia, PCSK926 was upregulated and LDLR27 was downregulated (Supplementary Data 1), as expected. In the gene expression signature for hypertension, ADRB1 and ACE were both upregulated (Supplementary Data 1), as expected. We then uploaded each disease’s gene expression signatures to iLINCS (Supplementary Data 2–5). In iLINCS, we found 149 and 178 drugs with perturbation signatures that reversed the disease gene expression signatures for hyperlipidemia and hypertension, respectively (Fig. 1b and Supplementary Data 6 and 7).
Validating drug repurposing candidates with clinical data
Next, we performed clinical validation studies to test the ability of the signature-based approach to rediscover known approved drugs and to identify new candidate drugs not currently approved for treating the diseases of interest. We performed these validation studies using clinical data stored in the VUMC SD28, which contained de-identified EHRs for >3.2 million individuals at the time of the study. We tested prescription drugs with at least twenty individuals in the clinical validation cohort (Supplementary Fig. 2) using a self-controlled case series (SCCS) study design29 (Fig. 2a). Consider, for example, the clinical validation study of valproate as a repurposing candidate for hyperlipidemia. In this experiment, we measured the change in low-density lipoprotein cholesterol (LDL-C) levels due to valproate exposure in the outpatient setting. For each individual, we defined an observation period composed of two parts, a baseline period (before valproate exposure) and a treatment period (after valproate exposure). The baseline and treatment periods were divided by the index date, defined as the first date each individual was exposed to valproate. For each individual, we calculated the median outpatient LDL-C measurement separately for the baseline period and for the treatment period. To adjust for potential confounding by indication, we excluded individuals who were exposed to any known FDA-approved lipid-lowering drugs during the observation period (Fig. 2b). To determine whether individuals experienced statistically significant reductions in LDL-C after valproate exposure, we used a linear mixed model.
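The baseline and treatment summaries described above can be illustrated with a minimal sketch. It assumes each individual's measurements arrive as (date, value) pairs and that measurements taken on the index date itself are ignored; the function name, the toy cohort, and the simple mean of within-person changes are all hypothetical stand-ins, and the sketch does not fit the linear mixed model used in the study.

```python
from datetime import date
from statistics import mean, median


def sccs_median_change(measurements, index_date):
    """Summarize one individual's biomarker change around the index date.

    measurements: list of (date, value) pairs, e.g. outpatient LDL-C in mg/dL.
    index_date: first date the individual was exposed to the drug.
    Returns (baseline_median, treatment_median, change), or None when either
    period has no measurement.
    """
    # Assumption: measurements taken on the index date itself are ignored.
    baseline = [value for day, value in measurements if day < index_date]
    treatment = [value for day, value in measurements if day > index_date]
    if not baseline or not treatment:
        return None
    baseline_median = median(baseline)
    treatment_median = median(treatment)
    return baseline_median, treatment_median, treatment_median - baseline_median


# Hypothetical toy cohort: person -> (measurements, index date).
toy_cohort = {
    "person_a": (
        [(date(2015, 1, 10), 130.0), (date(2015, 6, 2), 126.0),
         (date(2016, 3, 1), 118.0), (date(2016, 9, 15), 122.0)],
        date(2015, 12, 1),
    ),
    "person_b": (
        [(date(2017, 2, 3), 101.0), (date(2017, 8, 20), 99.0),
         (date(2018, 1, 5), 95.0), (date(2018, 7, 9), 97.0)],
        date(2017, 5, 1),
    ),
}

# Each individual serves as their own control: collect within-person changes.
changes = []
for person, (measurements, index_date) in toy_cohort.items():
    summary = sccs_median_change(measurements, index_date)
    if summary is not None:
        changes.append(summary[2])

mean_change = mean(changes)
print(f"Mean within-person change: {mean_change:.1f} mg/dL")
```

Because every individual contributes both a baseline and a treatment value, characteristics that stay fixed within a person cancel out of the within-person difference, which is the motivation for the self-controlled design.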
For the hyperlipidemia clinical validation study, we quantified the effects of 84 drugs on LDL-C levels. In this analysis, we removed individuals who were exposed to other known FDA-approved lipid-lowering drugs during the observation period (Fig. 2b and Supplementary Data 8). The sociodemographic characteristics and comorbidities of the individuals studied are shown in Supplementary Data 9–11, and LDL-C measurements during both baseline and treatment periods can be found in Supplementary Data 12. Out of the 84 drugs tested, 12 lowered LDL-C with P < 0.05 (Fig. 3a and Supplementary Data 13). Five of the repurposing signals were statins, the most commonly used FDA-approved lipid-lowering drugs: fluvastatin (LDL-C mg dL−1, point estimate [95% confidence interval (CI)] = −18.7 [−23.5, −13.9], P = 3.50 × 10−12), pravastatin (−21.1 [−22.4, −19.9], P < 2.20 × 10−16), lovastatin (−24.8 [−26.9, −22.8], P < 2.20 × 10−16), simvastatin (−30.5 [−31.4, −29.6], P < 2.20 × 10−16), and atorvastatin (−34.8 [−35.7, −33.9], P < 2.20 × 10−16). The other seven signals were drugs FDA-approved for other diseases: acetaminophen (LDL-C mg dL−1, point estimate [95% CI] = −1.12 [−1.83, −0.41], P = 1.85 × 10−3), methocarbamol (−3.18 [−6.16, −0.20], P = 0.04), valproate (−4.71 [−9.21, −0.19], P = 0.04), risperidone (−4.93 [−9.54, −0.32], P = 0.04), digoxin (−6.21 [−12.0, −0.45], P = 0.04), gentamicin (−6.97 [−13.2, −0.74], P = 0.03), and tamoxifen (−11.4 [−17.6, −5.32], P = 4.48 × 10−4). Among the 12 drugs, 6 lowered LDL-C with P values crossing the Bonferroni threshold (0.05/84 = 5.95 × 10−4), 5 of which were known drugs approved for treating hyperlipidemia, and one approved for treating other diseases (Table 1).
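The Bonferroni arithmetic for hyperlipidemia can be checked with a short sketch. The P values are those reported above for the 12 nominally significant drugs; values reported only as P < 2.20 × 10−16 are entered at that upper bound, which is an assumption made for illustration. The remaining 72 tested drugs had P ≥ 0.05 and therefore cannot pass the stricter threshold.

```python
ALPHA = 0.05
N_TESTS = 84  # drugs tested for hyperlipidemia

# Reported P values for the 12 drugs that lowered LDL-C with P < 0.05.
# Entries of 2.20e-16 stand in for values reported as "P < 2.20e-16".
p_values = {
    "fluvastatin": 3.50e-12,
    "pravastatin": 2.20e-16,
    "lovastatin": 2.20e-16,
    "simvastatin": 2.20e-16,
    "atorvastatin": 2.20e-16,
    "acetaminophen": 1.85e-3,
    "methocarbamol": 0.04,
    "valproate": 0.04,
    "risperidone": 0.04,
    "digoxin": 0.04,
    "gentamicin": 0.03,
    "tamoxifen": 4.48e-4,
}

# Bonferroni correction: divide the significance level by the number of tests.
threshold = ALPHA / N_TESTS
passing = sorted(drug for drug, p in p_values.items() if p < threshold)

print(f"Bonferroni threshold: {threshold:.2e}")
print(f"{len(passing)} drugs pass: {', '.join(passing)}")
```

Under these inputs the drugs that pass are the five statins and tamoxifen, matching the counts given in Table 1.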
Table 1.
Source | Hyperlipidemia | Hypertension |
---|---|---|
Vanderbilt | | |
Drug repurposing candidates tested | 84 | 94 |
Therapeutic effect & P < 0.05 | 12 | 23 |
Drugs approved for target disease | 5 | 5 |
Drugs approved for other diseases | 7 | 18 |
Therapeutic effect & P < Bonferroni | 6 | 12 |
Drugs approved for target disease | 5 | 4 |
Drugs approved for other diseases | 1 | 8 |
All of Us | | |
Drug repurposing candidates tested | 12 | 22 |
Therapeutic effect & P < 0.05 | 5 | 6 |
Drugs approved for target disease | 4 | 2 |
Drugs approved for other diseases | 1 | 4 |
Therapeutic effect means that individuals experienced reductions in biomarker measurements (LDL-C for hyperlipidemia; SBP for hypertension) after exposure to the drug repurposing candidate.
Two-tailed P values were calculated using linear mixed models.
For the clinical validation studies at Vanderbilt, we report both the number of drugs with P < 0.05 and P values that pass Bonferroni significance to correct for multiple comparisons. For the replication studies in All of Us, we report the number of drugs with P < 0.05.
LDL-C low-density lipoprotein cholesterol, SBP systolic blood pressure.
For the hypertension clinical validation study, we quantified the effects of 94 drugs on systolic blood pressure (SBP). In this analysis, we removed individuals who were exposed to other known FDA-approved antihypertensive drugs during the observation period (Fig. 2b and Supplementary Data 8). The sociodemographic characteristics and comorbidities of the individuals studied are shown in Supplementary Data 9–11, and SBP measurements during both baseline and treatment periods can be found in Supplementary Data 12. Out of the 94 drugs tested, 23 lowered SBP with P < 0.05 (Fig. 3b and Supplementary Data 13). Five of the repurposing signals were known FDA-approved antihypertensive drugs: spironolactone (SBP mm Hg, point estimate [95% CI] = −1.41 [−1.76, −1.06], P = 2.02 × 10−14), carvedilol (−1.54 [−2.50, −0.58], P = 1.92 × 10−3), nadolol (−2.35 [−3.66, −1.04], P = 4.63 × 10−4), benazepril (−3.35 [−4.51, −2.19], P = 1.74 × 10−8), and amlodipine (−4.22 [−4.67, −3.77], P < 2.20 × 10−16). The other eighteen signals were drugs FDA-approved for other diseases: caffeine (SBP mm Hg, point estimate [95% CI] = −0.23 [−0.45, −0.01], P = 0.03), levofloxacin (−0.27 [−0.52, −0.02], P = 0.04), fexofenadine (−0.29 [−0.51, −0.07], P = 8.61 × 10−3), fluoxetine (−0.44 [−0.71, −0.17], P = 1.28 × 10−3), celecoxib (−0.44 [−0.79, −0.09], P = 0.01), ipratropium (−0.48 [−0.93, −0.03], P = 0.04), sertraline (−0.56 [−0.76, −0.36], P = 1.20 × 10−8), estradiol (−0.65 [−0.89, −0.41], P = 8.26 × 10−8), escitalopram (−0.71 [−0.93, −0.49], P = 1.26 × 10−11), fluorouracil (−0.75 [−1.42, −0.08], P = 0.03), atorvastatin (−0.86 [−1.13, −0.59], P = 1.91 × 10−9), simvastatin (−0.93 [−1.22, −0.64], P = 1.69 × 10−10), dexamethasone (−0.93 [−1.11, −0.75], P < 2.20 × 10−16), phenytoin (−1.26 [−2.16, −0.36], P = 6.22 × 10−3), gemcitabine (−1.49 [−2.45, −0.53], P = 2.61 × 10−3), rosiglitazone (−1.56 [−2.85, −0.27], P = 0.02), docetaxel (−2.8 [−3.64, −1.96], P = 2.19 × 10−10), and doxorubicin (−3.5 [−4.11, −2.89], P = 2.20 × 10−16). Among the 23 drugs, 12 lowered SBP with P values crossing the Bonferroni threshold (0.05/94 = 5.32 × 10−4), 4 of which were known drugs approved for treating hypertension and eight of which were drugs indicated for other diseases (Table 1).
External replication of clinical validation studies
To confirm the VUMC SD clinical validation findings, we performed external replication studies in the NIH All of Us Research Program database20. At the time of study, All of Us had EHRs for >236,000 individuals with diverse ancestries. We tested drugs with therapeutic effects (i.e., lowered LDL-C or SBP measurements at P < 0.05) in the VUMC SD clinical validation study. The sociodemographic characteristics and comorbidities for both hyperlipidemia and hypertension cohorts can be found in Supplementary Data 9–11. For hyperlipidemia, we tested twelve drugs and found that five lowered LDL-C at P < 0.05 (Fig. 4a and Supplementary Data 13). These drugs were pravastatin (LDL-C mg dL−1, point estimate [95% CI] = −15.4 [−17.9, −12.9], P < 2.20 × 10−16), tamoxifen (−15.5 [−21.5, −9.49], P = 7.27 × 10−6), lovastatin (−19.3 [−23.3, −15.3], P < 2.20 × 10−16), simvastatin (−27.0 [−28.7, −25.2], P < 2.20 × 10−16), and atorvastatin (−29.7 [−31.2, −28.3], P < 2.20 × 10−16). For hypertension, we analyzed 22 drugs and found that six drugs lowered SBP at P < 0.05 (Fig. 4b and Supplementary Data 13). These drugs were atorvastatin (SBP mm Hg, point estimate [95% CI] = −0.70 [−1.23, −0.17], P = 0.01), sertraline (−0.81 [−1.42, −0.20], P = 9.77 × 10−3), spironolactone (−1.76 [−3.09, −0.43], P = 9.98 × 10−3), docetaxel (−2.51 [−4.45, −0.57], P = 0.01), doxorubicin (−3.69 [−5.08, −2.30], P = 4.38 × 10−7), and amlodipine (−5.23 [−6.27, −4.19], P < 2.20 × 10−16). Though fewer drugs reduced biomarker measurements with P < 0.05, most drugs had treatment effects in the expected direction (i.e., negative point estimates) with 95% CIs that overlapped with the 95% CIs from the VUMC SD clinical validation study (Fig. 4).
Review of evidence to support novel repurposing candidates
We used multiple databases, the literature, and domain-expert review to confirm the treatment effects we observed for drugs indicated for other diseases, i.e., potential repurposing candidates. For hyperlipidemia, we found seven drugs, not approved for treating hyperlipidemia, which had statistically significant LDL-C lowering effects in the VUMC SD clinical validation study. For three of these drugs, we found evidence supporting their LDL-C lowering effects: tamoxifen30, digoxin31, and valproate32. On the other hand, we did not find existing evidence supporting the LDL-C lowering effects for four drugs: gentamicin, risperidone, methocarbamol, and acetaminophen (Table 2 and Supplementary Table 1). Since gentamicin is commonly prescribed in non-systemic forms (e.g., ophthalmic solutions and topical ointments), we conducted a post hoc analysis in the VUMC SD by excluding 23 individuals exposed to non-systemic forms of gentamicin. In this subgroup composed of only 56 individuals exposed to systemic forms of gentamicin, the drug no longer had a statistically significant effect on lowering LDL-C (point estimate [95% CI] = −5.06 [−13.4, 3.25] mg dL−1, P = 0.24).
Table 2.
Disease | Drug | Approved indication | Existing evidence supports therapeutic effect |
---|---|---|---|
Hyperlipidemia | Tamoxifen | Cancer | Yes30 |
Hyperlipidemia | Gentamicin | Bacterial infections | No |
Hyperlipidemia | Digoxin | Arrhythmias | Yes31 |
Hyperlipidemia | Risperidone | Schizophrenia | No40 |
Hyperlipidemia | Valproate | Seizure | Yes32 |
Hyperlipidemia | Methocarbamol | Muscle spasms | No |
Hyperlipidemia | Acetaminophen | Pain | No |
Hypertension | Caffeine | Fatigue | No |
Hypertension | Levofloxacin | Bacterial infections | Yes33 |
Hypertension | Doxorubicin | Cancer | No |
Hypertension | Docetaxel | Cancer | Yes34 |
Hypertension | Rosiglitazone | Type 2 Diabetes | Yes35 |
Hypertension | Gemcitabine | Cancer | No |
Hypertension | Phenytoin | Seizure | Yes36 |
Hypertension | Simvastatin | Hyperlipidemia | Yes37 |
Hypertension | Dexamethasone | Inflammation | No |
Hypertension | Atorvastatin | Hyperlipidemia | Yes38 |
Hypertension | Fluorouracil | Cancer | Yes34 |
Hypertension | Escitalopram | Depression | No41 |
Hypertension | Estradiol | Menopause | Yes39 |
Hypertension | Sertraline | Depression | No41 |
Hypertension | Ipratropium | Asthma | No |
Hypertension | Celecoxib | Pain | No |
Hypertension | Fluoxetine | Depression | No41 |
Hypertension | Fexofenadine | Allergic Rhinitis | No42 |
For hypertension, we found 18 drugs that were not approved for treating the disease, with statistically significant SBP lowering effects (Table 2 and Supplementary Table 2). For eight of these drugs, we found evidence to support their SBP lowering effects: levofloxacin33, docetaxel34, rosiglitazone35, phenytoin36, simvastatin37, atorvastatin38, fluorouracil34, and estradiol39.
Discussion
We developed an approach to identify and validate drug repurposing candidates, which integrates disease gene expression signatures, drug perturbation data, and clinical data. For both hyperlipidemia and hypertension, we replicated known FDA-approved drugs and identified existing drugs approved for other diseases that had statistically significant biomarker-lowering effects. A substantial number of these biomarker-lowering effects are supported by evidence from multiple databases, the literature, and domain-expert review. Finally, we externally replicated the clinical validation pipeline in the NIH All of Us Research Program database, in which we observed similar drug treatment effect sizes.
While statistically significant, the biomarker-lowering effects associated with repurposing candidate exposure are not clinically significant. The drug repurposing candidates should not be used in place of known approved drugs for treating hyperlipidemia and hypertension (Table 2 and Supplementary Tables 1 and 2). As expected, known approved drugs had much larger therapeutic effect sizes compared to drugs approved for other diseases. For instance, individuals exposed to simvastatin (a known lipid-lowering drug) experienced much larger reductions in LDL-C compared to individuals exposed to valproate (−30.49 mg dL−1 vs. −4.71 mg dL−1) (Fig. 3a and Supplementary Data 13). Rather, this study’s contribution is a proof-of-concept approach to identify and clinically validate drug repurposing candidates. While hyperlipidemia and hypertension have many safe and potent drugs, there are still human diseases without effective treatments. For many of these diseases, our approach has the potential to identify existing drugs that may be more effective than current therapies. For these challenging diseases, gene expression signatures can be computed with S-PrediXcan using GWAS summary statistics that are publicly available in the GWAS catalog43 and UK Biobank25. At the time of writing, there are GWAS summary statistics for 869 and 7221 unique human conditions in the GWAS catalog and UK Biobank, respectively.
Compared to existing methods to validate repurposing candidates, our approach’s first advantage is the ability to measure drug efficacy in humans at scale. Similar to previous studies16,17, our approach allowed us to test many drugs (84 and 94 for hyperlipidemia and hypertension, respectively) in human individuals, because it uses automated informatics software to extract, process, and analyze EHR data. The ability to measure the magnitude of treatment effect is important for designing clinical trials, as lack of efficacy commonly causes clinical trials to fail44. In addition, testing many candidates enabled us to detect both potential true- and false-positive repurposing candidates. An example of a false-positive is sorafenib for treating hypertension. Sorafenib, a drug indicated for hepatocellular carcinoma, was predicted to lower SBP, because its iLINCS perturbation data reversed the hypertension gene expression signature (Supplementary Data 7). In the VUMC SD study, however, sorafenib increased SBP (5.12 mm Hg, P = 0.04; Fig. 3b and Supplementary Data 13), a side effect that has been previously reported34. In contrast to our approach, using more common validation strategies, like animal models and in vitro assays9,10, to test a similar number of drugs would have been cost- and time-prohibitive.
Our approach’s second advantage is its ease of portability. Previous studies have developed approaches to validate repurposing candidates using EHR data16,17. However, replicating drug repurposing signals from one database in a second independent database is often labor- and time-intensive, requiring many changes to the analysis pipeline due to institution-specific models used for storing clinical data45. In contrast, we replicated the VUMC SD pipeline with minor changes (essentially just changes to database table names) in the All of Us database, in under one week. This fast replication was possible because both databases store clinical data using the same standardized format20, the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM)46.
Our approach’s last advantages are its ease of reproducibility and adaptability. Our analysis can be reproduced by other researchers because we used data from publicly available resources. The one exception is the individual-level clinical data stored in the VUMC SD. Importantly, researchers can easily reproduce the clinical validation studies in All of Us, because it uses a cloud computing infrastructure with data version control47. For their drug repurposing studies, researchers can adapt the software tools and computational notebooks that we have made publicly available (https://pwatrick.github.io/DrugRepurposingToolKit/).
Like other studies using observational clinical data, our approach has several limitations. Using observational data to measure treatment effects is challenging due to potential bias and confounding. In the clinical validation studies, we were particularly concerned about potential confounding by indication resulting in false-positive findings, i.e., a drug observed to reduce LDL-C does not truly reduce LDL-C. For instance, since individuals exposed to valproate experienced a statistically significant reduction in LDL-C (Fig. 3a), we infer that valproate lowers LDL-C. Another potential explanation for the LDL-C reduction is that many individuals taking valproate were also taking known lipid-lowering drugs, like statins. Recognizing this potential systematic error a priori, we excluded all individuals exposed to known lipid-lowering drugs during the observation period to reduce the risk of confounding by indication (Fig. 2b).
Another limitation shared by EHR-based studies is the fidelity of drug exposure data. Studies have shown that ~30–60% of individuals do not take preventative medications as prescribed48. One potential impact of this medication non-adherence is an underestimation of drug efficacy. However, we are encouraged by the replication of known drug effects in both databases that is consistent with efficacy rates reported in the literature. EHR-based studies can also be limited by information leakage, which may occur when individuals seek care from multiple providers who are not part of the study’s EHR system. For individuals whose medical records are fragmented, we do not have a completely accurate view of the individual’s health journey49. We reduced the effects of information leakage by requiring at least two outpatient visits with lab measurements within a span of 2 years.
In observational studies, another factor that can bias treatment effect estimates is that individuals are not randomly allocated to treatment groups, as they are in randomized clinical trials. To adjust for this potential bias, we used an SCCS study design (Fig. 2a), where individuals serve as their own controls, as it is believed to be robust to confounding29. We were able to use the SCCS design because the two biomarkers we chose (LDL-C and SBP) are efficacy measures that would be expected to change soon after drug exposure. When future users apply our approach to validate repurposing candidates for diseases with delayed clinical endpoints (e.g., cancer and myocardial infarction), other approaches such as a retrospective cohort design may be more appropriate (Supplementary Table 3).
Looking forward, larger datasets from more diverse populations50 would enable researchers to uncover potential ancestry-selective drug effects. In this study, both the S-PrediXcan models and GWAS summary statistics were from cohorts composed primarily of European ancestry individuals. As a result, we may have missed drug repurposing candidates that would be effective in individuals of non-European ancestry. When genomic and clinical data from more diverse populations are made publicly available, our approach to identifying and validating drug repurposing candidates may improve. In the future, our approach can potentially be used to validate drug repurposing candidates for diseases with no effective treatments, like Alzheimer’s disease. In fact, while this manuscript was under review, a study was published that used EHRs to validate one drug repurposing candidate, bumetanide, for treating APOE4-related Alzheimer’s disease51.
In summary, we developed a high-throughput approach to identify drug repurposing candidates using gene expression signatures and to validate candidates using clinical EHR data. Our results suggest that the increasing amount of publicly available molecular and clinical data can be leveraged for drug repurposing studies.
Methods
This study was conducted under all relevant ethical regulations with approval from the Vanderbilt University Medical Center Institutional Review Board (#180455) under a waiver of informed consent. Patients were not directly contacted for the study.
Computation of disease gene expression signatures
We used disease gene expression signatures to represent the molecular state for the two diseases of interest, hyperlipidemia and hypertension (step 1 in Fig. 1a). Disease gene expression signatures were computed using the differentially expressed genes (DEGs) from individuals with the disease of interest compared to individuals without. To compute disease gene expression signatures, we used publicly available gene expression data52,53 imputed by S-PrediXcan11,12; this method imputes genome-wide DEGs for a disease of interest using GWAS summary statistics for the disease of interest. S-PrediXcan was trained using data from the Genotype-Tissue Expression (GTEx) project54, which contains genotypes linked to RNA-seq data for 49 human tissues.
For hyperlipidemia, we computed the disease gene expression signatures using DEGs imputed using the whole blood elastic net model (Column “tissue” = “TW_Whole_Blood_Elastic_Net_0.5”)55 and GWAS summary statistics from the Global Lipids Genetics Consortium with 188,577 European ancestry individuals (Column “phenotype” = “GLGC_Mc_LDL”)24. We downloaded the hyperlipidemia DEGs file from “https://s3.amazonaws.com/imlab-open/Data/MetaXcan/results/metaxcan_results_database_v0.1.tar.gz”. For hypertension, we computed the disease gene expression signatures using DEGs imputed using an aggregate tissue model and GWAS summary statistics from a UK Biobank study with 340,159 European ancestry individuals (Column “phenotype” = “Systolic blood pressure, automated reading”)25,56. We downloaded the hypertension DEGs file (“smultixcan_4080_raw_ccn30.tsv.gz”) from “https://uchicago.box.com/shared/static/vket4ickq7qt3sj8dy3mv8zsr1our3xd.gz”.
Previous studies used various approaches to compute disease gene expression signatures57, and two of these approaches were used in this study. The first approach employed the widely used false discovery rate (FDR) metric with a cutoff of q < 0.0558 to compute the gene expression signatures for hyperlipidemia (Supplementary Data 3) and hypertension (Supplementary Data 5). The second approach was motivated by the algorithm used in So et al.13, as this was the first study to use S-PrediXcan imputed gene expression data to identify drug repurposing candidates. So et al.13 computed disease gene expression signatures using the K-most up- or downregulated genes, with K = 50, 100, 250, and 500. We selected the lower bound (K = 50), as we assumed that around 100 genes were sufficient to represent the molecular states for our diseases of interest. For hyperlipidemia, we ranked genes (by Z scores) from the most upregulated to the most downregulated genes. From this sorted list of DEGs, we computed the K = 50 hyperlipidemia gene expression signature by selecting the fifty most upregulated and the fifty most downregulated genes, for a total of 100 genes (Supplementary Data 2). For hypertension, we used expression values for genes that overlapped with those in the file, “suppl_table_S1-significant_gene_trait_associations.xlsx” (Column “trait” = “4080_raw-Systolic_blood_pressure_automated_reading”)52. The selected genes were predicted to be the most likely causal genes for SBP variation52. From this gene list, we computed the K = 50 hypertension gene expression signature, which was composed of 53 upregulated and 48 downregulated genes, for a total of 101 genes (Supplementary Data 4).
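The K-most up- and downregulated selection used for hyperlipidemia can be sketched as follows. This is an illustrative reading of the ranking step, not the released toolkit code: the function name, the placeholder gene names, and the Z scores are hypothetical, and restricting the upregulated set to positive Z scores (and the downregulated set to negative Z scores) is an added assumption that keeps the two sets disjoint on small inputs.

```python
def top_k_signature(z_scores, k=50):
    """Select the k most upregulated and k most downregulated genes.

    z_scores: dict mapping gene symbol -> imputed Z score.
    Returns (up_genes, down_genes), each ordered from most to least extreme.
    """
    # Rank genes from the most upregulated (largest Z) downwards.
    by_z_desc = sorted(z_scores, key=z_scores.get, reverse=True)
    # Rank genes from the most downregulated (smallest Z) upwards.
    by_z_asc = sorted(z_scores, key=z_scores.get)

    # Assumption: only positive Z scores count as up, negative as down.
    up_genes = [gene for gene in by_z_desc[:k] if z_scores[gene] > 0]
    down_genes = [gene for gene in by_z_asc[:k] if z_scores[gene] < 0]
    return up_genes, down_genes


# Hypothetical Z scores for placeholder genes.
toy_z = {
    "GENE_A": 3.0,
    "GENE_B": -2.5,
    "GENE_C": 0.4,
    "GENE_D": -0.1,
    "GENE_E": 1.2,
    "GENE_F": -4.0,
}

up, down = top_k_signature(toy_z, k=2)
print("Upregulated:", up)
print("Downregulated:", down)
```

With k = 50 and a genome-wide table of Z scores, the same call would return the 100-gene signature described above.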
Validation of disease gene expression signatures
To evaluate the robustness of the disease gene expression signatures, we queried the Drug-Gene Interaction Database (DGIdb)59. The DGIdb query allowed us to examine whether the disease-associated gene expression changes predicted by S-PrediXcan agreed with a priori expectations. For example, we expected a priori that in hyperlipidemia’s gene expression signature, PCSK926 would be upregulated and LDLR27 would be downregulated.
Using gene expression to find drug repurposing candidates
Next, we searched for drugs that reversed the S-PrediXcan imputed disease gene expression signatures (step 2 in Fig. 1a). To accomplish this, we queried the iLINCS database18. iLINCS hosts gene expression data from drug perturbation experiments. These in vitro experiments use a variety of cell types including human cancer cell lines19 and primary rat hepatocytes60. At the time of the study, iLINCS contained expression measurements for 74,201 genes from perturbation experiments of 21,299 small molecules18.
For both hyperlipidemia and hypertension, we uploaded their disease gene expression signatures to the iLINCS web portal. We used the default parameters in iLINCS to identify promising drug repurposing candidates. We matched disease and drug-gene expression signatures using either a weighted Pearson correlation18 or moderated Z scores19. Promising drug repurposing candidates were those with perturbations that reversed the S-PrediXcan imputed disease gene expression signature (i.e., had a negative correlation coefficient or concordance value) with a P < 0.05 for hyperlipidemia and P < 0.001 for hypertension (Fig. 1b).
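The idea of a drug "reversing" a disease signature can be sketched in Python as follows. This is a simplification under stated assumptions: it uses a plain (unweighted) Pearson correlation over the genes shared by both signatures, not the weighted correlation or concordance computed by iLINCS, and it omits the P value. The function names and toy values are hypothetical.

```python
import math
from typing import Dict


def pearson_on_shared_genes(disease: Dict[str, float], drug: Dict[str, float]) -> float:
    """Unweighted Pearson correlation over genes present in both signatures."""
    shared = sorted(set(disease) & set(drug))
    if len(shared) < 3:
        raise ValueError("need at least three shared genes")
    x = [disease[g] for g in shared]
    y = [drug[g] for g in shared]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a signature with constant values has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def reverses_signature(disease: Dict[str, float], drug: Dict[str, float]) -> bool:
    """A drug is a candidate when its signature is negatively correlated with the disease."""
    return pearson_on_shared_genes(disease, drug) < 0


# Toy signatures (hypothetical genes): the drug moves every gene the opposite way.
toy_disease = {"A": 2.0, "B": 1.0, "C": -1.0, "D": -2.0}
toy_drug = {"A": -2.0, "B": -1.0, "C": 1.0, "D": 2.0, "E": 0.3}
toy_r = pearson_on_shared_genes(toy_disease, toy_drug)
```

In the study itself, a negative score had to be accompanied by the disease-specific P value threshold before a drug counted as a candidate.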
For hyperlipidemia, we obtained drug repurposing candidates from the DrugMatrix dataset. DrugMatrix contains DEG values for ~13,000 genes60. We used this set of drugs for hyperlipidemia because it contained data from primary liver tissue, a major tissue for regulating LDL-C levels. For hypertension, we obtained drug repurposing candidates from the Library of Integrated Network-based Cellular Signatures (LINCS) chemical perturbagen experiments. The LINCS dataset contains drug-gene expression signatures from the L1000 project19, derived mainly from in vitro human cancer cell line experiments. For hypertension, we selected drugs from the LINCS data and not from DrugMatrix (as was done for hyperlipidemia), because the top-ranked drugs in DrugMatrix were identified using data from perturbation experiments that used tissues not known to be major participants in regulating blood pressure.
Both hyperlipidemia and hypertension had two lists of drug repurposing candidates, one list generated using the K = 50 gene expression signature and another generated using the FDR gene expression signature. The lists were combined to create one iLINCS drug repurposing candidate list for each disease (Supplementary Data 6–7).
Selecting drug candidates for clinical validation studies
From the iLINCS lists, we first mapped drug repurposing candidates to their bioactive ingredients in RxNorm. RxNorm is a standardized terminology linking drugs to concepts, which are unique terms that represent therapeutically equivalent medications61. Second, we excluded non-prescription drugs using the RxNorm CVF flag, 4096. Third, we excluded drugs with <20 individuals in the final cohort, both to protect individual privacy (in the reporting of individual demographics) and because of concerns about inadequate statistical power (Supplementary Fig. 2).
Identifying known FDA-approved drugs for target diseases
To identify known FDA-approved drugs for hypertension and hyperlipidemia, we used the MEDication Indication high-precision subset (MEDI-HPS) knowledge base62. MEDI-HPS links drug ingredients to diseases represented as International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes. To identify drug ingredients approved for treating hyperlipidemia, we used ICD-9-CM codes 272.0 “Pure hypercholesterolemia”, 272.2 “Mixed hyperlipidemia”, and 272.4 “Other and unspecified hyperlipidemia”. To identify drug ingredients approved for treating hypertension, we used the ICD-9-CM code, 401.9 “Hypertension NOS”. We then manually reviewed the drug lists and added drugs that were approved after MEDI-HPS was released (e.g., PCSK9 antibodies for hyperlipidemia).
Clinical validation: EHR database and cohort description
To validate the drug repurposing candidates identified by gene expression signature matching, we quantified their efficacy in treating the diseases of interest. The clinical validation studies were conducted in the VUMC SD (ref. 28), a de-identified copy of VUMC’s EHR (step 3 in Fig. 1a). The SD has longitudinal clinical data for >3.2 million individuals including billing codes, lab values, and medication exposure information. The SD is organized using the OMOP CDM (ref. 46). For this study, VUMC SD data between 1995 and 2021 were used.
We validated drug repurposing candidates using clinical EHR data with an SCCS study design29 (Fig. 2a). Using SCCS allowed us to reduce the potential for false-positive therapeutic effects due to confounder bias. We designed the SCCS study by creating an observation window with two periods: baseline and treatment. The index date was the first date of exposure to the drug repurposing candidate of interest. The baseline period started before the index date and ended on the index date, with a maximum length of one year. The treatment period began after the index date and ended on the last date of exposure to the drug repurposing candidate, with a minimum length of thirty days (an induction period) and a maximum length of 1 year.
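The observation window can be sketched with Python date arithmetic. This is one reading of the description, not the authors' implementation: it assumes "one year" means 365 days, that the baseline cannot start before the individual's first record (the `first_record_date` parameter is hypothetical), and that treatment-period biomarkers are used only after the 30-day induction period.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

MAX_PERIOD = timedelta(days=365)  # assumed meaning of "one year"
INDUCTION = timedelta(days=30)    # induction period


@dataclass(frozen=True)
class Windows:
    baseline_start: date
    baseline_end: date      # the index date
    treatment_end: date
    biomarker_from: date    # first day treatment-period biomarkers are used


def observation_windows(index_date: date, last_exposure_date: date,
                        first_record_date: date) -> Optional[Windows]:
    """Build baseline and treatment periods, or return None if exposure is too short."""
    # Baseline: up to one year before the index date, ending on the index date.
    baseline_start = max(first_record_date, index_date - MAX_PERIOD)
    # Treatment: from the index date to the last exposure, capped at one year.
    treatment_end = min(last_exposure_date, index_date + MAX_PERIOD)
    if treatment_end - index_date < INDUCTION:
        return None  # treatment period shorter than the 30-day minimum
    return Windows(baseline_start, index_date, treatment_end, index_date + INDUCTION)


# Toy example: a long exposure is capped at one year after the index date.
toy_windows = observation_windows(date(2015, 1, 1), date(2017, 6, 1), date(2010, 1, 1))
toy_short = observation_windows(date(2015, 1, 1), date(2015, 1, 15), date(2010, 1, 1))
```

Because each individual serves as their own control across the two periods, time-invariant characteristics cannot differ between the compared groups.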
For each drug repurposing candidate, we identified a cohort of adults (≥18 years and <90 years) who were exposed to the drug repurposing candidate in the outpatient setting (Fig. 2b). Individuals were excluded if they did not have one or more outpatient biomarker measurements for the disease of interest, during both baseline and treatment periods. Individuals were also excluded if they were exposed to known FDA-approved drugs for the disease of interest. However, if the drug repurposing candidate being tested was a known FDA-approved drug for the disease of interest, then individuals were kept in the final cohort if they were solely excluded due to exposure to the drug repurposing candidate being tested. For instance, individuals exposed to simvastatin (a known lipid-lowering drug) were excluded in the analysis to clinically validate valproate as a drug repurposing candidate for hyperlipidemia; however, the same simvastatin-exposed individuals were not excluded in the study to validate simvastatin as a drug repurposing candidate for hyperlipidemia.
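The cohort rules above, including the exception for candidates that are themselves approved drugs, can be written as a short Python sketch. The function and parameter names are hypothetical, the point at which age is measured is not specified here, and the approved-drug set is a toy stand-in for the MEDI-HPS-derived list.

```python
from typing import Iterable, Set


def excluded_for_approved_drug(candidate: str, exposures: Iterable[str],
                               approved: Set[str]) -> bool:
    """True if the person was exposed to an approved drug other than the candidate."""
    other_approved = (set(exposures) & approved) - {candidate}
    return bool(other_approved)


def is_eligible(age_years: float, n_baseline: int, n_treatment: int,
                candidate: str, exposures: Iterable[str], approved: Set[str]) -> bool:
    """Apply the age, biomarker, and approved-drug exposure criteria."""
    if not (18 <= age_years < 90):
        return False
    # At least one outpatient biomarker measurement is required in each period.
    if n_baseline < 1 or n_treatment < 1:
        return False
    return not excluded_for_approved_drug(candidate, exposures, approved)


# Toy approved list for hyperlipidemia (illustrative subset only).
toy_approved = {"simvastatin", "atorvastatin"}
valproate_case = is_eligible(55, 2, 3, "valproate", ["valproate", "simvastatin"], toy_approved)
simvastatin_case = is_eligible(55, 2, 3, "simvastatin", ["simvastatin"], toy_approved)
```

Subtracting the candidate from the set of approved exposures is what lets simvastatin-exposed individuals remain in the cohort when simvastatin itself is being tested.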
For the clinical validation studies, we report demographic statistics stratified by drug repurposing candidates. For gender and ethnicity, reported statistics are counts and percent of subgroups. We suppressed values if there were fewer than twenty individuals in the subgroup due to individual privacy concerns (Supplementary Data 9). For age and Elixhauser comorbidity index63,64, reported statistics are median and interquartile range (IQR) in the baseline and treatment periods; P values are from Wilcoxon signed-rank tests to identify statistically significant differences between baseline and treatment periods, with P < 0.05 considered statistically significant (Supplementary Data 10). For each Elixhauser comorbidity, reported are the number of individuals and percent of cohort with the comorbidity of interest in the baseline and treatment periods; P values are from McNemar’s tests to identify statistically significant differences between baseline and treatment periods, with P < 0.05 considered statistically significant. Elixhauser comorbidity counts were computed using ICD-9-CM and/or ICD-10-CM codes extracted from the start of the observation period to the end of baseline and treatment periods, respectively. We removed Elixhauser comorbidity statistics if there were fewer than twenty individuals in the subgroup due to individual privacy concerns (Supplementary Data 11).
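The small-cell suppression rule can be illustrated with a brief Python sketch. The function name, the output layout (count and percent, or `None` when suppressed), and the toy counts are assumptions for illustration only.

```python
from typing import Dict, Optional, Tuple


def subgroup_table(counts: Dict[str, int],
                   minimum: int = 20) -> Dict[str, Optional[Tuple[int, float]]]:
    """Report (count, percent) per subgroup, suppressing subgroups below the minimum."""
    total = sum(counts.values())
    table: Dict[str, Optional[Tuple[int, float]]] = {}
    for group, n in counts.items():
        if n < minimum or total == 0:
            table[group] = None  # suppressed for privacy
        else:
            table[group] = (n, round(100 * n / total, 1))
    return table


# Toy cohort of 200 individuals (hypothetical counts).
toy_table = subgroup_table({"female": 130, "male": 58, "unknown": 12})
```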
Clinical validation: biomarkers and drug efficacy
For hyperlipidemia, we clinically validated drug repurposing candidates using LDL-C as the biomarker. For the hypertension clinical validation study, we selected SBP as the biomarker. We chose to use LDL-C and SBP because they are measurements commonly collected for tracking disease progression and are important for predicting the risk of cardiovascular disease65. Further, we only used biomarker measurements taken in the outpatient setting, as inpatient biomarkers can be substantially altered by the acute disease processes related to inpatient admissions, and these altered biomarker measurements can confound the results of the clinical validation study.
We defined a drug repurposing candidate’s efficacy as the difference in median biomarker measurements taken before (baseline period) and after drug exposure (treatment period) (Fig. 2a). A repurposing candidate’s efficacy value was adjusted for confounding factors (see explanation of linear mixed model in the next section). We only used treatment period biomarker measurements taken after a 30-day induction period to allow each repurposing candidate to reach steady-state drug concentration. We removed outlying median biomarker measurements (defined as values more than 1.5 × the interquartile range below the first quartile or above the third quartile) prior to statistical analysis. We used the magnitude of biomarker reduction to quantify a drug’s efficacy. For instance, in the hyperlipidemia study, drug A was more effective than drug B, if drug A-exposed individuals experienced larger reductions in LDL-C compared to drug B-exposed individuals.
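The unadjusted version of this efficacy measure can be sketched in Python with the standard library. Several choices here are assumptions rather than details from the study: quartiles use the "inclusive" method of `statistics.quantiles`, an individual is dropped when either period's median is an outlier, and the result is the unadjusted mean within-person change (the reported estimates were adjusted with the linear mixed model).

```python
from statistics import median, quantiles
from typing import Dict, List, Tuple

Pairs = Dict[str, Tuple[float, float]]  # person -> (baseline median, treatment median)


def paired_medians(baseline: Dict[str, List[float]],
                   treatment: Dict[str, List[float]]) -> Pairs:
    """Per-person medians for people with measurements in both periods."""
    return {pid: (median(baseline[pid]), median(treatment[pid]))
            for pid in baseline.keys() & treatment.keys()
            if baseline[pid] and treatment[pid]}


def tukey_fences(values: List[float]) -> Tuple[float, float]:
    """Lower and upper fences at 1.5 x IQR beyond the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def drop_outliers(pairs: Pairs) -> Pairs:
    """Drop people whose baseline or treatment median lies outside the fences."""
    lo_b, hi_b = tukey_fences([b for b, _ in pairs.values()])
    lo_t, hi_t = tukey_fences([t for _, t in pairs.values()])
    return {pid: (b, t) for pid, (b, t) in pairs.items()
            if lo_b <= b <= hi_b and lo_t <= t <= hi_t}


def mean_change(pairs: Pairs) -> float:
    """Mean of (treatment median - baseline median); negative means a reduction."""
    return sum(t - b for b, t in pairs.values()) / len(pairs)


# Toy LDL-C values in mg/dL (hypothetical); person p5 has an outlying baseline median.
toy_baseline = {"p1": [100], "p2": [105, 110, 115], "p3": [120], "p4": [130], "p5": [400]}
toy_treatment = {"p1": [90], "p2": [100], "p3": [110], "p4": [120], "p5": [125]}
toy_pairs = drop_outliers(paired_medians(toy_baseline, toy_treatment))
toy_effect = mean_change(toy_pairs)
```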
Clinical validation: statistical analysis
In the clinical validation studies, the null hypothesis was that individuals exposed to the drug repurposing candidate did not experience changes in the biomarker between the baseline and treatment periods. The alternative hypothesis was that individuals exposed to the drug repurposing candidate experienced changes in their biomarkers between the baseline and treatment periods. For each biomarker, we report the mean and SD of the median measurements during both baseline and treatment periods (Supplementary Data 12).
To determine whether individuals exposed to a drug repurposing candidate experienced significant biomarker changes, we used a linear mixed model66,67. For each drug repurposing candidate, we report the treatment effect as a point estimate (i.e., mean difference between median biomarker measurements from the baseline and treatment periods) with 95% CI and associated P-value from the linear mixed model (Supplementary Data 13). The treatment effect estimates were adjusted for age, gender, ethnicity, and disease comorbidity as seen in the following linear mixed model equation:
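$$\mathrm{biomarkerVal}_{ij} = \beta_0 + \beta_1\,\mathrm{drugExposure}_{ij} + \beta_2\,\mathrm{Age}_{ij} + \beta_3\,\mathrm{Gender}_{i} + \beta_4\,\mathrm{Ethnicity}_{i} + \beta_5\,\mathrm{Comorbidity}_{ij} + u_i + \varepsilon_{ij} \qquad (1)$$
where i indexes individuals, j indexes the period (baseline or treatment), u_i is the random intercept for individual i, and ε_ij is the residual error.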
In this model, the variables drugExposure, Age, Gender, Ethnicity, and Comorbidity are treated as fixed effects, with a random intercept for each individual. In this paired study, each individual appears twice, once for the baseline period and a second time for the treatment period. During the baseline period, the continuous response variable, biomarkerVal, is the median of the biomarker measurements collected during the baseline period; the binary variable, drugExposure is set to “0” indicating that the individual was not exposed to the drug; the continuous variable Age is the individual’s normalized age at the end of the baseline period; the binary variable Gender is set to “1” if the individual is female and “0” otherwise; the binary variable Ethnicity is set to “1” if the individual is not white and “0” otherwise; the continuous variable Comorbidity is the individual’s normalized Elixhauser comorbidity index computed using ICD-9-CM and/or ICD-10-CM codes entered in the individual’s medical record beginning from the start of the observation period to the end of the baseline period.
During the treatment period, biomarkerVal is the median of the biomarker measurements collected during the treatment period; drugExposure is set to “1” indicating that the individual was exposed to the drug; Age is the individual’s age at the end of the treatment period; Gender and Ethnicity are equal to the individual’s baseline values (i.e., are time-invariant variables); Comorbidity is the individual’s normalized Elixhauser comorbidity index computed using ICD-9-CM and/or ICD-10-CM codes entered in the individual’s medical record beginning from the start of the observation period to the end of the treatment period. Each drug’s treatment effect estimate is β1, the coefficient of drugExposure, and the associated P value is the one reported for β1.
A drug was deemed to have a statistically significant therapeutic effect if it had a negative point estimate (i.e., β1 < 0; exposure resulted in lower biomarker measurements in the treatment period, compared to baseline) with P < 0.05. To adjust for multiple testing, we report both the number of drugs with P < 0.05 and the number of drugs with P values below the Bonferroni-corrected threshold (0.05/84 = 5.95 × 10−4 for hyperlipidemia; 0.05/94 = 5.32 × 10−4 for hypertension).
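The two significance tiers can be reproduced with a few lines of Python. The function names and the label strings are hypothetical; the thresholds follow directly from the numbers of drugs tested (84 and 94).

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def classify(beta1: float, p_value: float, n_tests: int, alpha: float = 0.05) -> str:
    """Label a drug by the sign of its effect estimate and its P value."""
    # A therapeutic effect requires a reduction (beta1 < 0) and P < alpha.
    if beta1 >= 0 or p_value >= alpha:
        return "not significant"
    if p_value < bonferroni_threshold(alpha, n_tests):
        return "significant after Bonferroni correction"
    return "nominally significant"


hyperlipidemia_threshold = bonferroni_threshold(0.05, 84)
hypertension_threshold = bonferroni_threshold(0.05, 94)
```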
External replication of clinical validation studies
To validate the findings from the VUMC SD, we performed external clinical validation studies using the NIH All of Us Research Program database20,21 (step 4 in Fig. 1a). The All of Us Research Program database is a unique resource with health data from a diverse group of participants, with >50% of participants being members of racial and ethnic minorities, and >80% from groups underrepresented in biomedical research. As of March 2021, the dataset contains >370,000 participants and EHRs for >236,000 participants with diverse backgrounds. Analyses were performed in the All of Us dataset v4, during the beta testing phase of the program, which began in May 2020 (ref. 21). For this study, All of Us data between 1991 and 2020 were used. We tested all drugs with statistically significant therapeutic effects (i.e., decreased LDL-C or SBP measurements at P < 0.05) in the VUMC SD clinical validation studies.
Results reported are in compliance with the All of Us Data and Statistics Dissemination Policy disallowing disclosure of group counts under 20 to protect participant privacy.
Review of evidence to support novel repurposing candidates
We used multiple databases (SIDER34, DEB2 (ref. 36), and TWOSIDES68), the literature, and domain-expert review (S.N. and C.M.S.) to confirm the therapeutic effects in the VUMC SD clinical validation study, for drugs not FDA-approved for the diseases of interest. SIDER is a resource linking drugs to side effects, extracted from drug labels34. DEB2 (ref. 36) is a resource linking drugs to their indications and side effects; it was derived from five publicly available sources including SIDER, MEDLINE, and DrugBank. TWOSIDES is a resource containing statistics for potential drug-drug interactions derived from the FDA adverse event reporting system68.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Acknowledgements
We would like to thank Robert Carroll for his helpful discussions on study design. We would like to thank David Sadowsky, Alexander Kumar, Raymond Wu, Cindy Gadd, and Vivian Siegel for reading drafts of this manuscript. P.W. was supported by grants from the National Institutes of Health, including T32GM007347, T15LM007450, and P50GM115305. This work was supported by grants from the National Institutes of Health, including R01LM010685 (J.C.D.), R01GM120523 (Q.F.), R01AG069900 (B.L. and W-Q.W.), R35GM131770 (C.M.S), R01HL133786 (W-Q.W.), and R01GM139891 (W-Q.W.). The dataset used for the analyses described was obtained from Vanderbilt University Medical Center’s resources, the Synthetic Derivative, which are supported by institutional funding and by the National Center for Advancing Translational Science grant number 2UL1 TR000445-06. The All of Us Research Program is supported by the National Institutes of Health, Office of the Director: Regional Medical Centers: 1 OT2 OD026549; 1 OT2 OD026554; 1 OT2 OD026557; 1 OT2 OD026556; 1 OT2 OD026550; 1 OT2 OD 026552; 1 OT2 OD026553; 1 OT2 OD026548; 1 OT2 OD026551; 1 OT2 OD026555; IAA #: AOD 16037; Federally Qualified Health Centers: HHSN 263201600085U; Data and Research Center: 5 U2C OD023196; Biobank: 1 U24 OD023121; The Participant Center: U24 OD023176; Participant Technology Systems Center: 1 U24 OD023163; Communications and Engagement: 3 OT2 OD023205; 3 OT2 OD023206; and Community Partners: 1 OT2 OD025277; 3 OT2 OD025315; 1 OT2 OD025337; 1 OT2 OD025276. The All of Us Research Program would not be possible without the partnership of its participants. J.C.D.’s involvement was primarily as a prior employee of VUMC. His subsequent NIH effort was supported by the Intramural Research Program of the National Human Genome Research Institute, Grant HG200417-01.
Author contributions
P.W. conceived the idea, designed the study, acquired the data, carried out the analysis, interpreted the results, developed the software package, and drafted the manuscript. Q.F. contributed to study design, data analysis, and interpretation of results. V.E.K. contributed to data analysis and interpretation of results. S.D.N. contributed to study design, data analysis, and interpretation of results. Q.C. contributed to study design, data analysis, and interpretation of results. B.L. contributed to data analysis and interpretation of results. T.L.E. contributed to data analysis and interpretation of results. N.J.C. contributed to data analysis and interpretation of results. E.J.P. contributed to data analysis and interpretation of results. C.M.S. contributed to data analysis and interpretation of results. D.M.R. contributed to data analysis and interpretation of results. J.C.D. contributed to study design, data analysis, and interpretation of results; his involvement in this project was primarily as faculty at VUMC prior to joining the NIH. W-Q.W. conceived the idea, designed the study, acquired the data, carried out the analysis, interpreted the results, supervised the project, and drafted the manuscript. All authors contributed to the refinement of the manuscript and approved the final manuscript.
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work.
Data availability
The S-PrediXcan generated DEGs file for hyperlipidemia can be found at “https://s3.amazonaws.com/imlab-open/Data/MetaXcan/results/metaxcan_results_database_v0.1.tar.gz” and for hypertension can be found at “https://uchicago.box.com/shared/static/vket4ickq7qt3sj8dy3mv8zsr1our3xd.gz”.
All requests for SD data are reviewed by Vanderbilt University Medical Center to determine whether the request is subject to any intellectual property or confidentiality obligations. Data are available through restricted access for approved studies and researchers who agree to conditions of use, such as but not limited to securely storing data and only using it for approved purposes. Any such data and materials that are approved will be released via a Data Use Agreement. The initial request can be sent to the corresponding author, and the applicants will be contacted within two weeks.
De-identified data are available on the researcher workbench of the All of Us Research Program located at https://workbench.researchallofus.org. Our All of Us workspace can be shared with any All of Us researcher by contacting W-Q.W.
Links for databases and datasets used in this study: iLINCS: http://www.ilincs.org/ilincs/; SIDER: http://sideeffects.embl.de/; DEB2: https://www.vumc.org/cpm/deb2; TWOSIDES: https://github.com/tatonetti-lab/nsides-release; DGIdb: https://www.dgidb.org/; MEDI-HPS: https://www.vumc.org/wei-lab/medi; All of Us: https://www.researchallofus.org/.
Code availability
To obtain disease gene expression signatures, we used DEGs imputed using the MetaXcan Python package (https://github.com/hakyimlab/MetaXcan). The hyperlipidemia gene expression signature was generated using S-PrediXcan from MetaXcan v0.5.0. The hypertension gene expression signature was generated using S-MultiXcan from MetaXcan v0.6.0.
Analyses were conducted using R version 4.0.5. R packages used were janitor_2.1.0, broom_0.7.9, vroom_1.5.4, forcats_0.5.1, stringr_1.4.0, dplyr_1.0.7, purrr_0.3.4, readr_2.0.0, tidyr_1.1.3, tibble_3.1.3, ggplot2_3.3.5, tidyverse_1.3.1, lubridate_1.7.10, glue_1.4.2, lme4_1.1-27.1, lmerTest_3.1-3, comorbidity_0.6.0.9000, ddiwas_0.1, and DrugRepurposingToolKit_0.2.1.
The software used for EHR data extraction, data processing, and data analysis can be found at https://github.com/pwatrick/DrugRepurposingToolKit or 10.5281/zenodo.5747805. An example for matching disease and drug-gene expression signatures can be found at https://pwatrick.github.io/DrugRepurposingToolKit/articles/gene_expression_signature_matching_example.html. An example for performing a clinical validation study in the NIH All of Us Research Program database can be found at https://pwatrick.github.io/DrugRepurposingToolKit/articles/all_of_us_example.html. For data cleaning and processing, this package leverages datasets and functions from the ddiwas69 and comorbidity70 R packages.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-021-27751-1.
References
- 1. Pushpakom S, et al. Drug repurposing: progress, challenges and recommendations. Nat. Rev. Drug Discov. 2019;18:41–58. doi: 10.1038/nrd.2018.168.
- 2. Protheroe A, Edwards JC, Simmons A, Maclennan K, Selby P. Remission of inflammatory arthropathy in association with anti-CD20 therapy for non-hodgkin’s lymphoma. Rheumatology. 1999;38:1150–1152. doi: 10.1093/rheumatology/38.11.1150.
- 3. Ashburn TT, Thor KB. Drug repositioning: Identifying and developing new uses for existing drugs. Nat. Rev. Drug Discov. 2004;3:673–683. doi: 10.1038/nrd1468.
- 4. Cudkowicz ME, et al. Safety and efficacy of ceftriaxone for amyotrophic lateral sclerosis: a multi-stage, randomised, double-blind, placebo-controlled trial. Lancet Neurol. 2014;13:1083–1091. doi: 10.1016/S1474-4422(14)70222-4.
- 5. Sanseau P, et al. Use of genome-wide association studies for drug repositioning. Nat. Biotechnol. 2012;30:317–320. doi: 10.1038/nbt.2151.
- 6. Okada Y, et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature. 2014;506:376–381. doi: 10.1038/nature12873.
- 7. Nelson MR, et al. The support of human genetic evidence for approved drug indications. Nat. Genet. 2015;47:856–860. doi: 10.1038/ng.3314.
- 8. Diogo D, et al. Phenome-wide association studies across large population cohorts support drug target validation. Nat. Commun. 2018;9:4285. doi: 10.1038/s41467-018-06540-3.
- 9. Sirota M, et al. Discovery and preclinical validation of drug indications using compendia of public gene expression data. Sci. Transl. Med. 2011;3:96ra77. doi: 10.1126/scitranslmed.3001318.
- 10. Dudley JT, et al. Computational repositioning of the anticonvulsant topiramate for inflammatory bowel disease. Sci. Transl. Med. 2011;3:96ra76. doi: 10.1126/scitranslmed.3002648.
- 11. Gamazon ER, et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 2015;47:1091–1098. doi: 10.1038/ng.3367.
- 12. Barbeira AN, et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat. Commun. 2018;9:1825. doi: 10.1038/s41467-018-03621-1.
- 13. So H-C, et al. Analysis of genome-wide association data highlights candidates for drug repositioning in psychiatry. Nat. Neurosci. 2017;20:1342–1349. doi: 10.1038/nn.4618.
- 14. Eales JM, et al. Uncovering genetic mechanisms of hypertension through multi-omic analysis of the kidney. Nat. Genet. 2021;53:630–637. doi: 10.1038/s41588-021-00835-w.
- 15. Crockett SD, Schectman R, Stürmer T, Kappelman MD. Topiramate use does not reduce flares of inflammatory bowel disease. Dig. Dis. Sci. 2014;59:1535–1543. doi: 10.1007/s10620-014-3040-7.
- 16. Wu Y, et al. Discovery of noncancer drug effects on survival in electronic health records of patients with cancer: A new paradigm for drug repurposing. JCO Clin. Cancer Inform. 2019;3:1–9. doi: 10.1200/CCI.19.00001.
- 17. Xu H, et al. Validating drug repurposing signals using electronic health records: A case study of metformin associated with reduced cancer mortality. J. Am. Med. Inform. Assoc. 2015;22:179–191. doi: 10.1136/amiajnl-2014-002649.
- 18. Pilarczyk M, et al. Connecting omics signatures of diseases, drugs, and mechanisms of actions with iLINCS. bioRxiv. 2019. doi: 10.1101/826271.
- 19. Subramanian A, et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell. 2017;171:1437–1452.e17. doi: 10.1016/j.cell.2017.10.049.
- 20. All of Us Research Program Investigators. The ‘All of Us’ research program. New Engl. J. Med. 2019;381:668–676. doi: 10.1056/NEJMsr1809937.
- 21. Ramirez AH, et al. The All of Us research program: data quality, utility, and diversity. medRxiv. 2020. doi: 10.1101/2020.05.29.20116905.
- 22. Im HK. MetaXcan Results. https://s3.amazonaws.com/imlab-open/Data/MetaXcan/results/metaxcan_results_database_v0.1.tar.gz.
- 23. Im HK. S-PrediXcan Results. Diagnoses - Main ICD10: I10 Essential (Primary) Hypertension. https://uchicago.box.com/shared/static/6tdiyksvxcm2nxjiml14deqiz1r6kqp7.bz2.
- 24. Willer CJ, et al. Discovery and refinement of loci associated with lipid levels. Nat. Genet. 2013;45:1274–1283. doi: 10.1038/ng.2797.
- 25. Neale BM. Neale lab - UK biobank GWAS results. 2020. http://www.nealelab.is/uk-biobank/.
- 26. Cohen J, et al. Low LDL cholesterol in individuals of african descent resulting from frequent nonsense mutations in PCSK9. Nat. Genet. 2005;37:161–165. doi: 10.1038/ng1509.
- 27. Brown MS, Goldstein JL. A receptor-mediated pathway for cholesterol homeostasis. Science. 1986;232:34–47. doi: 10.1126/science.3513311.
- 28. Roden DM, et al. Development of a large-scale de-identified DNA biobank to enable personalized medicine. Clin. Pharmacol. Ther. 2008;84:362–369. doi: 10.1038/clpt.2008.89.
- 29. Petersen I, Douglas I, Whitaker H. Self controlled case series methods: an alternative to standard epidemiological study designs. BMJ. 2016;354:i4515. doi: 10.1136/bmj.i4515.
- 30. Dnistrian AM, Schwartz MK, Greenberg EJ, Smith CA, Schwartz DC. Effect of tamoxifen on serum cholesterol and lipoproteins during chemohormonal therapy. Clin. Chim. Acta. 1993;223:43–52. doi: 10.1016/0009-8981(93)90061-8.
- 31. Shi H, et al. Digoxin reduces atherosclerosis in apolipoprotein e-deficient mice. Br. J. Pharmacol. 2016;173:1517–1528. doi: 10.1111/bph.13453.
- 32. Eirís JM, et al. Effects of long‐term treatment with antiepileptic drugs on serum lipid levels in children with epilepsy. Neurology. 1995;45:1155–1157. doi: 10.1212/wnl.45.6.1155.
- 33. LEVOFLOXACIN injection [package insert]. Lake forest, IL: Akorn, inc (Akorn, Inc., 2020). https://dailymed.nlm.nih.gov/dailymed/medguide.cfm?setid=4438fed2-7ef5-488f-baa8-39bc65768d1d.
- 34. Kuhn M, Letunic I, Jensen LJ, Bork P. The SIDER database of drugs and side effects. Nucleic Acids Res. 2016;44:D1075–D1079. doi: 10.1093/nar/gkv1075.
- 35. Negro R, Mangieri T, Dazzi D, Pezzarossa A, Hassan H. Rosiglitazone effects on blood pressure and metabolic parameters in nondipper diabetic patients. Diabetes Res. Clin. Pract. 2005;70:20–25. doi: 10.1016/j.diabres.2005.02.012.
- 36. Smith JC. Adverse Drug Effect Detection For Clinical Decision Support (Vanderbilt University School of Medicine, 2016).
- 37. Correa V, Jr, et al. Blood pressure-lowering effect of simvastatin: a placebo-controlled randomized clinical trial with 24-h ambulatory blood pressure monitoring. J. Hum. Hypertens. 2014;28:62–67. doi: 10.1038/jhh.2013.35.
- 38. Kanaki AI, et al. Low-dose atorvastatin reduces ambulatory blood pressure in patients with mild hypertension and hypercholesterolaemia: a double-blind, randomized, placebo-controlled study. J. Hum. Hypertens. 2012;26:577–584. doi: 10.1038/jhh.2011.80.
- 39. Seely EW, Walsh BW, Gerhard MD, Williams GH. Estradiol with or without progesterone and ambulatory blood pressure in postmenopausal women. Hypertension. 1999;33:1190–1194. doi: 10.1161/01.hyp.33.5.1190.
- 40. Newcomer JW. Second-Generation (atypical) antipsychotics and metabolic effects. CNS Drugs. 2005;19:1–93. doi: 10.2165/00023210-200519001-00001.
- 41. Peixoto MF, Cesaretti M, Hood SD, Tavares A. Effects of SSRI medication on heart rate and blood pressure in individuals with hypertension and depression. Clin. Exp. Hypertens. 2019;41:428–433. doi: 10.1080/10641963.2018.1501058.
- 42. Lockwood JM, Wilkins BW, Halliwill JR. H1 receptor-mediated vasodilatation contributes to postexercise hypotension. J. Physiol. 2005;563:633–642. doi: 10.1113/jphysiol.2004.080325.
- 43. MacArthur J, et al. The new NHGRI-EBI catalog of published genome-wide association studies (GWAS catalog). Nucleic Acids Res. 2017;45:D896–D901. doi: 10.1093/nar/gkw1133.
- 44. Hwang TJ, et al. Failure of investigational drugs in Late-Stage clinical development and publication of trial results. JAMA Intern. Med. 2016;176:1826–1833. doi: 10.1001/jamainternmed.2016.6008.
- 45. Rosenbloom ST, Carroll RJ, Warner JL, Matheny ME, Denny JC. Representing knowledge consistently across health systems. Yearb. Med. Inform. 2017;26:139–147. doi: 10.15265/IY-2017-018.
- 46. Overhage JM, Ryan PB, Reich CG, Hartzema AG, Stang PE. Validation of a common data model for active safety surveillance research. J. Am. Med. Inform. Assoc. 2012;19:54–60. doi: 10.1136/amiajnl-2011-000376.
- 47. Dudley JT, Butte AJ. In silico research in the era of cloud computing. Nat. Biotechnol. 2010;28:1181–1185. doi: 10.1038/nbt1110-1181.
- 48. Brown MT, Bussell JK. Medication adherence: WHO cares? Mayo Clin. Proc. 2011;86:304–314. doi: 10.4065/mcp.2010.0575.
- 49.Wei W-Q, et al. Impact of data fragmentation across healthcare centers on the accuracy of a high-throughput clinical phenotyping algorithm for specifying subjects with type 2 diabetes mellitus. J. Am. Med. Inform. Assoc. 2012;19:219–224. doi: 10.1136/amiajnl-2011-000597. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 50.Martin AR, et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 2019;51:584–591. doi: 10.1038/s41588-019-0379-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51. Taubes A, et al. Experimental and real-world evidence supporting the computational repurposing of bumetanide for APOE4-related Alzheimer’s disease. Nat. Aging. 2021;1:932–947. doi: 10.1038/s43587-021-00122-7.
- 52. Pividori M, et al. PhenomeXcan: mapping the genome to the phenome through the transcriptome. Sci. Adv. 2020;6:eaba2083.
- 53. Barbeira AN, et al. Exploiting the GTEx resources to decipher the mechanisms at GWAS loci. Genome Biol. 2021;22:49. doi: 10.1186/s13059-020-02252-4.
- 54. GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 2013;45:580–585. doi: 10.1038/ng.2653.
- 55. Im HK. Im Lab’s PredictDB Data Repository. http://predictdb.org/.
- 56. Sudlow C, et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015;12:e1001779. doi: 10.1371/journal.pmed.1001779.
- 57. Keenan AB, et al. Connectivity mapping: Methods and applications. Annu. Rev. Biomed. Data Sci. 2019;2:69–92.
- 58. Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Stat. Methodol. 1995;57:289–300.
- 59. Freshour SL, et al. Integration of the Drug-Gene Interaction Database (DGIdb 4.0) with open crowdsource efforts. Nucleic Acids Res. 2021;49:D1144–D1151. doi: 10.1093/nar/gkaa1084.
- 60. Svoboda DL, Saddler T, Auerbach SS. An overview of National Toxicology Program’s toxicogenomic applications: DrugMatrix and ToxFX. Adv. Comput. Toxicol. 2019:141–157. doi: 10.1007/978-3-030-16443-0_8.
- 61. Bodenreider O, Cornet R, Vreeman DJ. Recent developments in clinical terminologies - SNOMED CT, LOINC, and RxNorm. Yearb. Med. Inform. 2018;27:129–139. doi: 10.1055/s-0038-1667077.
- 62. Wei W-Q, et al. Development and evaluation of an ensemble resource linking medications to their indications. J. Am. Med. Inform. Assoc. 2013;20:954–961. doi: 10.1136/amiajnl-2012-001431.
- 63. Elixhauser A, Steiner C, Harris DR, Coffey RM. Comorbidity measures for use with administrative data. Med. Care. 1998;36:8–27. doi: 10.1097/00005650-199801000-00004.
- 64. van Walraven C, Austin PC, Jennings A, Quan H, Forster AJ. A modification of the Elixhauser comorbidity measures into a point system for hospital death using administrative data. Med. Care. 2009;47:626–633. doi: 10.1097/MLR.0b013e31819432e5.
- 65. Kannel WB, Gordon T, Schwartz MJ. Systolic versus diastolic blood pressure and risk of coronary heart disease. The Framingham study. Am. J. Cardiol. 1971;27:335–346. doi: 10.1016/0002-9149(71)90428-0.
- 66. Laird NM, Donnelly C, Ware JH. Review papers: longitudinal studies with continuous responses. Stat. Methods Med. Res. 1992;1:225–247. doi: 10.1177/096228029200100302.
- 67. Ikramuddin S, et al. Lifestyle intervention and medical management with vs without Roux-en-Y gastric bypass and control of hemoglobin A1c, LDL cholesterol, and systolic blood pressure at 5 years in the diabetes surgery study. JAMA. 2018;319:266–278. doi: 10.1001/jama.2017.20813.
- 68. Tatonetti NP, Ye PP, Daneshjou R, Altman RB. Data-driven prediction of drug effects and interactions. Sci. Transl. Med. 2012;4:125ra31. doi: 10.1126/scitranslmed.3003377.
- 69. Wu P, et al. DDIWAS: High-throughput electronic health record-based screening of drug-drug interactions. J. Am. Med. Inform. Assoc. 2021;28:1421–1430. doi: 10.1093/jamia/ocab019.
- 70. Gasparini A. Comorbidity: an R package for computing comorbidity scores. J. Open Source Softw. 2018;3:648.
Data Availability Statement
The S-PrediXcan-generated DEGs file for hyperlipidemia can be found at https://s3.amazonaws.com/imlab-open/Data/MetaXcan/results/metaxcan_results_database_v0.1.tar.gz, and the corresponding file for hypertension can be found at https://uchicago.box.com/shared/static/vket4ickq7qt3sj8dy3mv8zsr1our3xd.gz.
All requests for SD data are reviewed by Vanderbilt University Medical Center to determine whether the request is subject to any intellectual property or confidentiality obligations. Data are available through restricted access for approved studies and researchers who agree to conditions of use, such as but not limited to securely storing data and only using it for approved purposes. Any such data and materials that are approved will be released via a Data Use Agreement. The initial request can be sent to the corresponding author, and the applicants will be contacted within two weeks.
De-identified data are available on the Researcher Workbench of the All of Us Research Program at https://workbench.researchallofus.org. Our All of Us workspace can be shared with any All of Us researcher by contacting W-Q.W.
Links for databases and datasets used in this study: iLINCS: http://www.ilincs.org/ilincs/; SIDER: http://sideeffects.embl.de/; DEB2: https://www.vumc.org/cpm/deb2; TWOSIDES: https://github.com/tatonetti-lab/nsides-release; DGIdb: https://www.dgidb.org/; MEDI-HPS: https://www.vumc.org/wei-lab/medi; All of Us: https://www.researchallofus.org/.
To obtain disease gene expression signatures, we used DEGs imputed using the MetaXcan Python package (https://github.com/hakyimlab/MetaXcan). The hyperlipidemia disease gene expression signature was generated using S-PrediXcan from MetaXcan v0.5.0. The hypertension disease gene expression signature was generated using S-MultiXcan from MetaXcan v0.6.0.
Analyses were conducted using R version 4.0.5. R packages used were janitor_2.1.0, broom_0.7.9, vroom_1.5.4, forcats_0.5.1, stringr_1.4.0, dplyr_1.0.7, purrr_0.3.4, readr_2.0.0, tidyr_1.1.3, tibble_3.1.3, ggplot2_3.3.5, tidyverse_1.3.1, lubridate_1.7.10, glue_1.4.2, lme4_1.1-27.1, lmerTest_3.1-3, comorbidity_0.6.0.9000, ddiwas_0.1, and DrugRepurposingToolKit_0.2.1.
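For readers who want to rebuild a comparable environment, the package list above can be turned into explicit name and version pairs. The following is a minimal sketch in Python, not part of the published software: the helper `parse_pins` and the `Pin` type are hypothetical names introduced here, and the sketch assumes every entry follows the `name_version` pattern used in the list, with the version following the last underscore.

```python
import re
from typing import List, NamedTuple


class Pin(NamedTuple):
    """One package pin (hypothetical helper type, not from the paper)."""
    package: str
    version: str


def parse_pins(text: str) -> List[Pin]:
    """Split a comma-separated 'name_version' list into (package, version) pairs.

    Assumption: the version is everything after the LAST underscore, which
    holds for every entry in the list reported above.
    """
    # Drop the sentence-final period, then split on commas and an optional "and".
    tokens = re.split(r",\s*(?:and\s+)?", text.strip().rstrip("."))
    pins = []
    for token in tokens:
        token = token.strip()
        if not token:
            continue
        name, sep, version = token.rpartition("_")
        if not sep or not name or not version:
            raise ValueError(f"not in name_version form: {token!r}")
        pins.append(Pin(name, version))
    return pins


if __name__ == "__main__":
    # A subset of the versions reported in the text above.
    reported = (
        "janitor_2.1.0, lme4_1.1-27.1, comorbidity_0.6.0.9000, "
        "ddiwas_0.1, and DrugRepurposingToolKit_0.2.1."
    )
    for pin in parse_pins(reported):
        print(f"{pin.package}\t{pin.version}")
```

Splitting on the last underscore rather than the first keeps the sketch robust to package names that themselves contain underscores, while hyphenated versions such as `1.1-27.1` pass through unchanged.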
The software used for EHR data extraction, data processing, and data analysis can be found at https://github.com/pwatrick/DrugRepurposingToolKit or under doi: 10.5281/zenodo.5747805. An example of matching disease and drug gene expression signatures can be found at https://pwatrick.github.io/DrugRepurposingToolKit/articles/gene_expression_signature_matching_example.html. An example of performing a clinical validation study in the NIH All of Us Research Program database can be found at https://pwatrick.github.io/DrugRepurposingToolKit/articles/all_of_us_example.html. For data cleaning and processing, this package leverages datasets and functions from the ddiwas69 and comorbidity70 R packages.