Abstract
Background
In this study, we demonstrate that our modified Gene Set Enrichment Analysis (GSEA) method, drug perturbation GSEA (dpGSEA), can detect phenotypically relevant drug targets through a unique transcriptomic enrichment that emphasizes biological directionality of drug-derived gene sets.
Results
We detail our dpGSEA method and show its effectiveness in detecting specific perturbation of drugs in independent public datasets by confirming fluvastatin, paclitaxel, and rosiglitazone perturbation in gastroenteropancreatic neuroendocrine tumor cells. In drug discovery experiments, we found that dpGSEA was able to detect phenotypically relevant drug targets in previously published differentially expressed genes of CD4+T regulatory cells from immune responders and non-responders to antiviral therapy in HIV-infected individuals, such as those involved with virion replication, cell cycle dysfunction, and mitochondrial dysfunction. dpGSEA is publicly available at https://github.com/sxf296/drug_targeting.
Conclusions
dpGSEA is an approach that uniquely enriches on drug-defined gene sets while considering directionality of gene modulation. We recommend dpGSEA as an exploratory tool to screen for possible drug targeting molecules.
Keywords: Transcriptomics, Gene set enrichment analysis, Drug discovery
Background
Drug discovery and/or screening can be an expensive and time-consuming endeavor, with traditional methods relying on testing in in vitro or in vivo models or on screening of organs and tissues using synthetic molecules [1]. In some cases, the expense of a traditional drug development approach may overwhelm the resources available and make cost–benefit discussions challenging when bringing a new therapeutic to market [2]. Therefore, developing cost-efficient in silico strategies to screen for drug candidates that may be efficacious in treating human disease may result in novel or repurpose-able therapies [3]. With the rise of integrated -omics technologies, phenotypic screening [4], network-based approaches [5], and literature mining [6, 7], new approaches that take advantage of large data-driven methodologies are at the forefront of drug screening [2]. Taking advantage of available knowledge, we propose a transcriptomically-driven drug screening approach that utilizes enrichment methods to determine candidate therapeutics.
Discovery-based enrichment methods can be used for finding matching transcriptomic signatures of drug-disease comparisons [8]. One approach, referred to as the signature reversion principle, has been successful in diverse therapeutic settings [9–11]. It assumes that a drug induced gene expression signature will be correlated with change of the transcriptome in disease to a healthy or healthier state [2]. Our premise is that a negatively correlated gene profile of a drug-perturbated transcriptome can be exploited in in silico drug screening methodologies.
Gene set enrichment techniques are well established in providing biological context in -omics studies, particularly in transcriptomic studies where summarizing the overall biology of a particular contrast or linear model by pathways enhances interpretability. We have used gene set enrichment techniques in a variety of transcriptional studies that compare or contrast the human host response to infectious or chronic illness [12–14]. Of the various enrichment approaches [15], Gene Set Enrichment Analysis (GSEA), Database for Annotation, Visualization and Integrated Discovery (DAVID) and Gene Ontology (GO) are gold standards in pathway and gene set enrichment for transcriptomic analyses [16–19], but unfortunately, their direct application in drug screening may not be ideal due to the lack of incorporation of drug-gene modulatory information. While other popular approaches such as gene2drug, DSEA, sscMAP, L1000cds, and CMAP-native methods may include such information, they lack the statistical rigor of GSEA [20–24]: none perform error rate analysis, calculate score normalizations, provide enrichment driver genes, or are tailored for transcriptomic analyses.
By performing enrichment on disease-associated gene signatures while using drug perturbation defined gene sets, entire transcriptomes can be probed for potential drugs or therapeutics. We propose a modified version of GSEA, namely drug perturbation GSEA (dpGSEA), to perform a unique drug-defined gene set enrichment analysis for screening therapeutics downstream of transcriptomic or proteomic studies. We describe dpGSEA as an analysis tool that emphasizes enrichment of counteracting gene expression between drug-gene and disease-gene profiles and provides an easily interpretable set of statistics to determine effectiveness of screened drugs. By using proto-matrices to capture a-priori drug perturbated gene signatures rather than gene sets, we believe our approach is well suited for transcriptomic-based therapeutic screening and enrichment.
Methods
We provide a comparison between dpGSEA and related approaches in Fig. 1 and detailed definition, notation, framework, statistic, normalization and error rate notes in the Additional file 1: Methods.
dpGSEA gene set priors
An overview of the dpGSEA processing, including the proto-matrix, is shown in Fig. 2. dpGSEA utilizes transcriptomic signatures of drug perturbated cell lines from the Broad Institute’s connectivity map project (CMAP) and the library of integrated network-based cellular signatures (LINCS) project to produce annotated gene sets rather than curated lists, like those from MSigDB [16, 24, 25]. These gene sets are organized into proto-matrices as defined by gene signature cutoffs of ranked top fold change or statistical significance. The proto-matrix itself contains the genes acted on by a specific drug and the direction in which each gene is influenced, that is, whether the drug induces up or down regulation of the gene.
To generate the proto-matrices, differential expression (DE) analysis using default LIMMA-voom parameters in Bioconductor was performed on the CMAP and LINCS data [26]. For each drug, a DE experiment was conducted using the correspondingly batched DMSO sample as controls, while remaining effects were linearly corrected. The resulting genes were ranked by fold change and statistical significance to generate a specific signature, that is, the top 10, 20 or 50 genes acted upon by a specific drug with the cell-line information retained (these are labeled as “Sig Rank 10” or “FC Rank 20”, etc. with the first label denoting fold change or significance and the last label denoting the number of top ranked genes).
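As a rough illustration of how a proto-matrix of this kind could be assembled from per-drug DE results, the sketch below ranks genes either by absolute fold change or by significance and keeps the top N together with the direction of modulation. The record layout, the function name `build_proto_matrix`, the label `drugA_MCF7`, and the toy numbers are assumptions made for illustration; they are not taken from the dpGSEA repository.

```python
from typing import NamedTuple


class DEResult(NamedTuple):
    drug: str       # drug and cell-line label (hypothetical, e.g. "drugA_MCF7")
    gene: str
    log_fc: float   # log fold change versus the DMSO control
    p_value: float


def build_proto_matrix(results, top_n=20, rank_by="sig"):
    """Return {drug: [(gene, direction), ...]} with the top_n genes per drug.

    rank_by="fc" ranks by absolute log fold change (largest first);
    rank_by="sig" ranks by p value (smallest first).
    direction is +1 when the drug up-regulates the gene and -1 otherwise.
    """
    if rank_by not in ("fc", "sig"):
        raise ValueError("rank_by must be 'fc' or 'sig'")
    by_drug = {}
    for row in results:
        by_drug.setdefault(row.drug, []).append(row)
    proto = {}
    for drug, rows in by_drug.items():
        if rank_by == "fc":
            ranked = sorted(rows, key=lambda r: -abs(r.log_fc))
        else:
            ranked = sorted(rows, key=lambda r: r.p_value)
        proto[drug] = [(r.gene, 1 if r.log_fc > 0 else -1) for r in ranked[:top_n]]
    return proto


# Toy DE results with invented values, for illustration only.
toy = [
    DEResult("drugA_MCF7", "HMGCR", 2.1, 0.001),
    DEResult("drugA_MCF7", "INSIG1", 1.2, 0.0001),
    DEResult("drugA_MCF7", "IDI1", -0.4, 0.03),
]
proto_sig = build_proto_matrix(toy, top_n=2, rank_by="sig")
proto_fc = build_proto_matrix(toy, top_n=2, rank_by="fc")
```

Keeping the direction next to each gene is what distinguishes this structure from a plain gene set: the same top-N cutoff yields a "Sig Rank" or "FC Rank" signature depending only on the sort key.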
dpGSEA scoring statistics
Similar to the approach of GSEA (see Additional file 1: Methods), we consider a list $L$ of $p$ annotated genes rank-ordered by increasing local test statistic $t_j$, for $j \in \{1, \ldots, p\}$. Our method detects an enrichment of high values of $t_j$ in the positive tail of gene set $G_k$. This translates into finding evidence of a leading-edge subset in gene set $G_k$, in which the values of $t_j$ are maximal:
- The traditional Enrichment Score, denoted $ES_k$, which is calculated for each gene set $G_k$, for $k \in \{1, \ldots, K\}$, as the maximum deviation from 0 of a weighted running sum in the gene set $G_k$ relative to its complement $\bar{G}_k$. Formally, following the GSEA formulation [16], our first gene-specific Global Test Statistic can be written as:
$$ES_k = \max_{j} \left| S_k(j) \right|, \qquad S_k(j) = \sum_{i=1}^{j} \left[ \frac{|t_i|^{\alpha}\, I(g_i \in G_k^{\mp})}{N_k} - \frac{1 - I(g_i \in G_k^{\mp})}{p - p_k} \right] \qquad (1)$$
where $N_k = \sum_{i=1}^{p} |t_i|^{\alpha}\, I(g_i \in G_k^{\mp})$ and $p_k = \sum_{i=1}^{p} I(g_i \in G_k^{\mp})$. Here $|\cdot|$ denotes the absolute value, $\max_j$ denotes the maximum function with respect to gene index $j$, $\alpha$ is a parameter describing the weight of the tail in the random walk (see remarks below), $I(g_j \in G_k^{\mp})$ is the indicator function on whether the $j$th rank-ordered gene $g_j$ belongs to gene set $G_k$, and $\mp$ is the inverse sign referring to the counter directionality for disease-gene and drug-gene, for $j \in \{1, \ldots, p\}$.
- The Target Compatibility Score, denoted $TCS_k$, which is calculated for each gene set $G_k$, for $k \in \{1, \ldots, K\}$, as the absolute distance between the point of maximum enrichment score and the point where the rank-ordered $t_j$ is minimal in absolute value, typically a zero fold-change or zero correlation gene index. This involves the computation of two gene indices: (1) the gene rank maximizer of the $ES_k$ statistic (leading edge upper bound), denoted $j_k^{*}$, and (2) the gene rank minimizer of the rank-ordered $|t_j|$, denoted $j^{0}$:
$$TCS_k = \left| j_k^{*} - j^{0} \right| \qquad (2)$$
where $j_k^{*} = \arg\max_{j} \left| S_k(j) \right|$ and $j^{0} = \arg\min_{j} |t_j|$, and where $\arg\max$ and $\arg\min$ denote the maximizer and minimizer functions with respect to gene index $j$.
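The two scores described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the released dpGSEA code: the function name `dpgsea_scores`, the input layout, the treatment of every non-hit gene as a miss, and the unscaled rank distance returned for the target compatibility score are all choices made here for clarity.

```python
def dpgsea_scores(ranked, drug_set, alpha=1.0):
    """Compute an enrichment score and a target compatibility score.

    ranked:   list of (gene, t) pairs sorted by increasing statistic t
              from the disease DE experiment.
    drug_set: {gene: direction}, direction +1 (drug up-regulates) or -1.
    A gene counts as a hit only when it is in drug_set AND the drug moves
    it in the opposite direction to the disease statistic (assumption:
    this is how the counter-directional indicator is realised).
    Returns (es, j_star, tcs) using 0-based gene indices.
    """
    p = len(ranked)

    def is_hit(gene, t):
        direction = drug_set.get(gene)
        return direction is not None and t != 0 and (direction > 0) != (t > 0)

    hits = [is_hit(gene, t) for gene, t in ranked]
    n_hits = sum(hits)
    if n_hits == 0 or n_hits == p:
        raise ValueError("need at least one hit and one miss")

    # Normaliser for the weighted hit steps, and the constant miss step.
    norm = sum(abs(t) ** alpha for (_, t), hit in zip(ranked, hits) if hit)
    miss_step = 1.0 / (p - n_hits)

    running, es, j_star = 0.0, 0.0, 0
    for j, ((_, t), hit) in enumerate(zip(ranked, hits)):
        running += abs(t) ** alpha / norm if hit else -miss_step
        if abs(running) > es:          # track the maximum deviation from 0
            es, j_star = abs(running), j

    # Index where the statistic is closest to zero.
    j_zero = min(range(p), key=lambda j: abs(ranked[j][1]))
    tcs = abs(j_star - j_zero)         # raw rank distance, no rescaling
    return es, j_star, tcs


# Toy ranked list (invented values): two down-regulated disease genes that
# the hypothetical drug up-regulates.
toy_ranked = [("g1", -3.0), ("g2", -2.0), ("g3", -0.1), ("g4", 1.0), ("g5", 2.5)]
es, j_star, tcs = dpgsea_scores(toy_ranked, {"g1": 1, "g2": 1})
```

In the toy list both hits sit at the negative end, so the running sum peaks at index 1, one position away from the near-zero statistic at index 2. A gene that the drug moves in the same direction as the disease, such as `g5` with direction `+1`, contributes nothing to the hit steps.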
Normalization, significance, and error rate
Normalization places ES and TCS scores on respective comparable scales. A null distribution is created by gene label permutation of list L while retaining original gene label rank-ordering; this is performed for 1000 permutations. The normalization factor is the change of scale obtained by the mean of the scores generated by the permuted distributions, and the normalized score is then obtained by simply dividing the true score by this normalization factor. The significance of the true score is determined by the proportion of permuted scores that are greater than the true score, and our null hypothesis states that the true score is no different from those generated by random gene label permutation.
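The permutation scheme described above can be sketched as follows. This is a minimal sketch, assuming a generic scoring callable: the helper name `permutation_summary`, the toy score, and the toy data are hypothetical stand-ins for the ES or TCS computation, and the fixed seed is only there to make the example reproducible.

```python
import random


def permutation_summary(score_fn, genes, n_perm=1000, seed=0):
    """Normalise a score and estimate its significance by label permutation.

    score_fn: callable taking a list of gene labels (one per rank position)
              and returning a score; the rank-ordered statistics stay fixed.
    genes:    gene labels in their original rank order.
    Returns (normalized_score, p_value, normalized_null_scores).
    """
    rng = random.Random(seed)
    true_score = score_fn(list(genes))
    labels = list(genes)
    null_scores = []
    for _ in range(n_perm):
        rng.shuffle(labels)                    # permute labels, keep the ranking
        null_scores.append(score_fn(labels))
    mean_null = sum(null_scores) / n_perm      # normalisation factor
    normalized = true_score / mean_null
    # Proportion of permuted scores strictly greater than the true score.
    p_value = sum(s > true_score for s in null_scores) / n_perm
    normalized_null = [s / mean_null for s in null_scores]
    return normalized, p_value, normalized_null


# Toy stand-in for an enrichment score: sum of |t| at the positions
# occupied by the members of a two-gene set (invented values).
T_VALUES = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
GENES = ["g0", "g1", "g2", "g3", "g4", "g5", "g6", "g7"]
TOY_SET = {"g6", "g7"}


def toy_score(labels):
    return sum(abs(t) for g, t in zip(labels, T_VALUES) if g in TOY_SET)


normalized, p_value, normalized_null = permutation_summary(toy_score, GENES)
```

Because the true toy score (7.0) is already the largest value any labelling can reach, no permuted score exceeds it and the estimated p value is 0. With a strict "greater than" comparison the smallest reportable p value is 0 rather than 1/1000, which is worth keeping in mind when reading entries such as "< 0.001".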
The multiple testing problem is addressed by controlling the False Discovery Rate (FDR). After a full experimental run of dpGSEA, the FDR for a given drug is calculated from the proportion of all permuted null normalized scores, pooled across every drug screened, that are greater than the normalized score of the drug in question. This is performed for ES and TCS separately and is the approach utilized by GSEA.
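A pooled-null FDR of this kind can be sketched as below. The first helper follows the description in the text; the second adds the division by the observed tail fraction used in the original GSEA q-value [16]. Function names and toy scores are hypothetical, and the use of "greater than or equal to" is an assumption that avoids a zero denominator.

```python
def null_tail_fraction(score, null_pool):
    """Fraction of pooled null normalized scores at or above `score`."""
    return sum(s >= score for s in null_pool) / len(null_pool)


def gsea_style_fdr(score, observed_scores, null_pool):
    """Ratio of the null tail fraction to the observed tail fraction.

    observed_scores: normalized scores of every drug screened.
    null_pool:       normalized permuted scores pooled over every drug.
    The ratio is capped at 1.0.
    """
    null_frac = null_tail_fraction(score, null_pool)
    obs_frac = sum(s >= score for s in observed_scores) / len(observed_scores)
    return min(1.0, null_frac / obs_frac)


# Toy normalized scores (invented values, for illustration only).
observed = [3.0, 2.0, 1.0, 0.5]
null_pool = [0.2, 0.5, 0.8, 1.0, 1.2, 1.5, 0.9, 1.1, 2.5, 0.7]
fdr_by_drug = [gsea_style_fdr(s, observed, null_pool) for s in observed]
```

Pooling the null scores over all drugs is what lets a single threshold be applied to the whole screen, at the cost of assuming the normalized scores of different drugs are comparable.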
Testing dpGSEA
We approached testing dpGSEA in a two-fold manner. (1) We determined if dpGSEA was able to positively identify a perturbated drug from an external DE experiment through positively correlated gene modulation, as opposed to the signature reversion principle. (2) We used dpGSEA as intended, an exploratory tool for drug screening, to determine if the therapeutics detected have biological or phenotypic relevance to a disease in question.
For the first case, we tested third party gene signatures, not those from CMAP or LINCS, derived from gastroenteropancreatic neuroendocrine tumor cells (GEPNTs) perturbated by fluvastatin, parbendazole (against drug-defined gene sets present and generated from CMAP), paclitaxel, rosiglitazone (against drug-defined gene sets present and generated from LINCS), and doxorubicin (against drug-defined gene sets present and generated from both CMAP and LINCS) (Gene Expression Omnibus (GEO) #GSE98894) [27]. Drug perturbation DE for GEPNTs was performed using LIMMA-voom and matching signatures were detected using dpGSEA.
For the second case, drug screening, we applied dpGSEA to our recent study of differential gene expression in CD4+T regulatory cells (Tregs) from immune responders (IR) and nonresponders (INR) to antiviral therapy in HIV-infected individuals (GEO #GSE106792) [28]. This study assessed HIV-infected individuals for their ability to reconstitute the CD4+T cell pool in response to antiretroviral treatment and what candidate mechanisms were behind poor clinical outcomes and greater risk for morbidity and mortality with respect to INR status. Mitochondrial Treg mechanisms were implicated to be the cause of the cell cycle halting [28]. We analyzed this dataset with dpGSEA to determine whether we could identify drugs that may take advantage of differentially expressed genes (DEGs) involved in mitochondrial dysfunction or immune function as a whole in INRs.
Results
Our case study results for detection of GEPNTs drug perturbations by dpGSEA that pass the FDR α = 0.05 threshold are shown in Table 1A and B for ES and TCS, respectively, along with the specific proto-matrix used. It is worth mentioning that not every GEPNTs drug perturbation was positively identified by every proto-matrix by ES and TCS FDR thresholds, but we were able to positively identify all perturbations in most cases, with the exception of rosiglitazone by ES FDR and fluvastatin by TCS FDR. Paclitaxel perturbations were most frequently positively identified by both scores and primarily by significance-based LINCS proto-matrices, while other drugs varied in their positive findings with respect to the proto-matrix utilized.
Table 1.
Drug | ES | NES | ES p value | Genes | Proto-Matrix |
---|---|---|---|---|---|
A | |||||
paclitaxel_HT29 | 0.547 | 2.745 | 0.011 | SCNN1A, AKR1C3 | LINCS FC Rank 20 |
paclitaxel_HT29 | 0.742 | 2.863 | 0.014 | CFAP70, C4BPB | LINCS Sig Rank 20 |
parbendazole_PC3 | 0.627 | 2.722 | 0.014 | FSTL3, HIST1H2BG | CMAP FC Rank 20 |
fluvastatin_MCF7 | 0.295 | 2.509 | 0.016 | IFIT1, MSMO1, INSIG1, HMGCR, IDI1, HSD17B7 | CMAP FC Rank 50 |
fluvastatin_MCF7 | 0.377 | 2.515 | 0.016 | SQLE, INSIG1, IDI1, SLCO4C1, MAP1S, RTEL1, PPIF, MAFK | CMAP Sig Rank 50 |
paclitaxel_MCF7 | 0.864 | 3.076 | 0.016 | HSPB1 | LINCS Sig Rank 10 |
paclitaxel_HT29 | 0.855 | 3.043 | 0.019 | CFAP70, C4BPB | LINCS Sig Rank 10 |
doxorubicin_A375 | 0.529 | 2.588 | 0.020 | MYB, CCL20, SLC27A2, MX2 | LINCS FC Rank 20 |
paclitaxel_HELA | 0.369 | 2.349 | 0.042 | ABTB2, ZNF816, CASK | LINCS Sig Rank 50 |
paclitaxel_PC3 | 0.706 | 2.531 | 0.044 | SIK1, GPM6A | LINCS FC Rank 20 |
parbendazole_MCF7 | 0.848 | 3.051 | 0.044 | HIST1H2BG | CMAP Sig Rank 10 |
Drug | TCS | NTCS | TCS p value | Genes | Proto-Matrix |
---|---|---|---|---|---|
B | |||||
rosiglitazone_HELA | 0.999 | 1.362 | 0.010 | INSIG1 | LINCS Sig Rank 50 |
paclitaxel_MCF7 | 0.979 | 1.671 | 0.031 | HSPB1 | LINCS Sig Rank 10 |
parbendazole_MCF7 | 0.954 | 1.929 | 0.044 | HIST1H2BG | CMAP Sig Rank 10 |
paclitaxel_HA1E | 0.992 | 1.336 | 0.054 | HIST1H2BD | LINCS Sig Rank 50 |
paclitaxel_MCF7 | 0.981 | 1.500 | 0.056 | HSPB1 | LINCS Sig Rank 20 |
paclitaxel_HT29 | 0.962 | 1.641 | 0.066 | CFAP70, C4BPB | LINCS Sig Rank 10 |
paclitaxel_MCF7 | 0.982 | 1.370 | 0.088 | HSPB1 | LINCS Sig Rank 50 |
paclitaxel_HT29 | 0.964 | 1.474 | 0.101 | CFAP70, C4BPB | LINCS Sig Rank 20 |
parbendazole_PC3 | 0.964 | 1.419 | 0.113 | FSTL3, HIST1H2BG | CMAP FC Rank 20 |
paclitaxel_HELA | 0.949 | 1.504 | 0.117 | SIK1 | LINCS FC Rank 20 |
paclitaxel_PC3 | 0.915 | 1.562 | 0.146 | GAA | LINCS Sig Rank 10 |
doxorubicin_MCF7 | 0.849 | 1.717 | 0.150 | S100A2 | LINCS FC Rank 10 |
Each row represents a drug (rosiglitazone, fluvastatin, parbendazole, paclitaxel, or doxorubicin) positively identified by dpGSEA in the GEPNTs perturbation versus GEPNTs DMSO control DE of our first test case. All findings shown pass an FDR threshold of α = 0.05 for ES in A and TCS in B. The leading-edge driver genes are listed in the “Genes” column and the specific proto-matrix in which each positive result was detected is listed in the “Proto-Matrix” column. Positively identified paclitaxel perturbations were most frequent, while other drugs were found in only some of the proto-matrices utilized.
Table 2 shows the most statistically significant ES drug discoveries for the INR versus IR case study, where mitochondrial and immunological associated drugs were found. Oseltamivir-carboxylate, the active metabolite of the antiviral Tamiflu, prevents the release of progeny influenza virions while simultaneously modulating human sialidases, which have been found to be localized in the mitochondria and involved in the regulation of cell apoptosis [29, 30]. Ibutilide, an antiarrhythmic, has been shown to inhibit endoplasmic reticulum and mitochondrial stress mechanisms [31]. These findings are consistent with the INR mitochondrial dysfunction while showing targetable transcription that may increase antiviral activity and/or prevent cell cycle disruption of Treg function. Notably, other drugs within statistical significance of 0.05, such as fibronil and telmesteine (p = 0.015 and p = 0.017, respectively), have targets (CPT1A and IDH2, respectively) suggested to be representative of the fatty acid oxidation and energy production aspects of mitochondrial dysfunction, congruent with previous research [28].
Table 2.
Drug | NES | ES p value | NTCS | TCS p value | Gene |
---|---|---|---|---|---|
bendamustine_HT29 | 3.13 | < 0.001 | 0.88 | 0.359 | TMEM106B, TMEM135, ELOVL6, KCNJ2, LSM6, TMEM126B, TGIF1, TXNDC9, MAPKAPK5 |
oseltamivir-carboxylate_MCF7 | 3.12 | < 0.001 | 0.97 | 0.150 | RDH11, ETS1, YAF2, UBE4B, SFPQ, FRYL |
medrysone_PC3 | 3.00 | < 0.001 | 0.92 | 0.267 | FAM13B, BMPR1A, MYO7A, CAST, H2AFV, RABL6 |
luliconazole_HT29 | 3.42 | < 0.001 | 0.90 | 0.314 | RBM7, TMED7, NXT2, ATMIN, SUB1, NPTN, WASF2, CEP135, RDH14 |
doxofylline_MCF7 | 3.43 | < 0.001 | 0.88 | 0.347 | SUCLG2, TM9SF3, BRD7, RPL36, APPBP2, MRPL33, ARL5A, TGFB3, LRRC15, HSP90B1, RPS25 |
ibutilide_HELA | 3.23 | 0.001 | 0.91 | 0.294 | EDEM3, RAD21, RAB35, ADGRF1, CBX1, SCN2B |
cevimeline_HA1E | 2.88 | 0.001 | 0.84 | 0.421 | EDEM3, UBE4B, CTSD, LRRC15, ID2, SRSF11, TRIM21, NUCKS1, PARP1, SH2D3A |
chrysin_HA1E | 3.10 | 0.001 | 0.86 | 0.386 | EDEM3, PBRM1, GANAB, MRPL33, LARP4, NKRF, BMI1, NOC3L, CORO2A |
amtolmetin-guacil_MCF7 | 2.81 | 0.001 | 0.88 | 0.330 | EDEM3, ASB13, SCAMP1, TRAPPC6A, EP300, MFSD6, AZIN1, CIAO1 |
triamterene_HT29 | 3.04 | 0.001 | 0.95 | 0.188 | TMEM106B, UGDH, ELOVL5, BNIP1, ZBTB11, PIH1D1 |
Both normalized ES and normalized TCS along with their respective statistical significance are shown. The leading-edge genes are also displayed, indicating genes driving the enrichment scores. Note that one of the highest ranked results, oseltamivir-carboxylate, suppresses influenza virion production and modulates human sialidases, which are known for mitochondrial involvement and regulation of cell apoptosis.
When comparing dpGSEA with traditional GSEA, we find that the rank ordering of paclitaxel-perturbated cell lines differs noticeably between the two approaches (Fig. 3), although Wilcoxon signed rank tests are not significant (p > 0.90) when comparing dpGSEA with GSEA for both ES significance ranking and TCS significance ranking (Fig. 3e). Comparisons between ES and TCS rankings within dpGSEA and GSEA for comparable proto-matrices showed a maximum positional shift of 3 for perturbated cell line ranking (Top 50 Rank GSEA: MCF7 from 2nd to 5th), with most ranking shifts between ES and TCS falling within 2 positions (0 shifts: 11; 1 shift: 9; 2 shifts: 5; 3 shifts: 1).
Figure 4 compares dpGSEA score and significance trends against those of the CMAP native and the gene2drug approaches. Two approaches were not included in our evaluation: the sscMAP approach is no longer available and the L1000cds approach does not provide a significance estimate. Here, we use the CMAP proto-matrix of the top 20 genes ranked by significance as our a-priori signature for dpGSEA and correspondingly equivalent inputs (top 20 genes by significance) for the other approaches. It should be noted that directionality is integrated into the dpGSEA enrichment approach, producing only positive scores, as reflected by the one-sided distribution of Fig. 4b. The ranked drugs that pass nominal, GSEA-defined FDR, and Benjamini–Hochberg (BH)-defined FDR adjusted thresholds are also shown. Using the GSEA-FDR threshold [16], we find many drug screens that pass FDR = 0.05. This is in stark contrast to those approaches without inherent error rate analysis (Fig. 4c, d), which have few or no screened drugs that pass the BH-FDR threshold at the same level. Therefore, the dpGSEA screened drug results provide a richer and more reliable ranking of drugs for the clinician. In addition, it should be noted that this result is achieved by dpGSEA despite the fact that the GSEA-defined FDR procedure is inherently more conservative (less inductive of downward bias) than the BH-defined FDR procedures [32], especially in cases of lower α values (Additional file 2: Figure S1). Finally, note the statistically significant findings of screened drugs (highlighted in green in Fig. 4) that pass the designated FDR significance threshold (0.05) and are unique to dpGSEA’s novel TCS statistic.
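For readers who want to reproduce the BH side of this comparison, the standard Benjamini–Hochberg adjustment can be sketched as follows. The function name and the toy p values are illustrative assumptions; the procedure itself is the usual step-up adjustment and is not specific to dpGSEA.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p values in the same order as the input.

    For the p value of rank r (1-based, ascending) among m tests, the
    adjusted value is the minimum of p * m / r over all ranks >= r,
    capped at 1.0.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so the running minimum enforces
    # monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p values (invented, for illustration only).
toy_p = [0.01, 0.04, 0.03, 0.20]
toy_adjusted = benjamini_hochberg(toy_p)
passing = [p_adj <= 0.05 for p_adj in toy_adjusted]
```

In this toy case only the smallest p value survives a 0.05 threshold after adjustment, even though three of the four nominal p values are below 0.05.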
The distributions of scores and significance can be found in Additional file 3: Figure S2 and Additional file 4: Figure S3 which show trends between both normalized scores and their respective transformed p values along with leading edge gene set sizes. Each plot shows one completed run of dpGSEA with each drug and their respective scores and significance shown. We can see that, as expected, scores trend positively with significance, and that TCS significance tends to favor smaller sets of driver genes (R = 0.72) while ES does not (R = 0.03) in Additional file 4: Figures S3C and S3D. Furthermore, Additional file 4: Figure S3 shows a comparison between the positively identified fluvastatin perturbation for various proto-matrices. As a-priori signature sizes increase, we see fluvastatin migrate from lack of statistical significance to close and beyond TCS significance at p < 0.05 and ultimately to ES significance at p < 0.05. This may suggest that TCS is more capable of detecting enrichment for smaller gene set sizes.
Discussion
The accurate portrayal of disease-gene and drug-gene complementary expression was the impetus for the development of dpGSEA. There are two features of dpGSEA that underscore its novelty in comparison to GSEA and other approaches, namely our indicator function denoting complementary disease-gene and drug-gene expression and the utilization of drug-derived gene set priors that include drug-gene modulation information. GSEA, in its current state, is not capable of producing results that can be interpreted with directionality of modulation by gene set priors for enrichment. Indeed, MSigDB gene sets, for example, only contain gene membership information. In cases where enrichment does take modulation of expression within a gene set into account, such as those defined in the C6 and C7 collections, the representation of a single biologically-defined gene set is dichotomized into up and down regulated groups [16]. This is less than ideal, as interpretation of enrichment must be contextualized with two scores and two significance levels, making biological interpretations difficult in cases where the two sets of estimates are not congruent. Further, our results differ from those generated by traditional GSEA, as shown by notable changes in rank of paclitaxel perturbated cell lines in Fig. 3, suggesting that the signature reversion principle plays a role when it comes to directionally influenced enrichment. In addition, as in GSEA but unlike other methods, we report FDR results in dpGSEA analyses that reflect a combination of both sensitivity and specificity metrics. However, generating more specific accuracy metrics such as specificity and sensitivity would require a simulation study of joint true drug perturbation and true DE (i.e. where the truth is known for both) against which our candidate drug results could be compared.
When compared to other drug screening approaches shown in Fig. 1b, we uniquely use both degree of modulation, as represented by DE significance, and directionality for enrichment. Methods such as gene2drug and DSEA require less conventional inputs, which allows for application beyond transcriptomics but requires users to query with a single gene, a set of pathways, or a set of drugs without considering directionality of modulation [20, 21]. Although these approaches are versatile, dpGSEA takes advantage of the statistics generated in a DE experiment, making it uniquely positioned for tackling transcriptomic drug screening. CMAP-native, L1000cds, and sscMAP approaches consider directionality, but not DE significance, and instead use ordered lists or sets [22–24]. Furthermore, we retain important aspects pivotal in GSEA’s success, such as score normalization and true FDR analysis, in our approach [22, 24]. When comparing results between dpGSEA, CMAP native, and gene2drug, we see that our approach provides a greater number of drug screens that pass error correction. Our intrinsically less conservative GSEA-defined measurement of error is more appropriate for drug screening compared to the BH procedure. In our and other test cases, the BH procedure shows strong bias towards exclusion of possible positive drug screens, as shown in Fig. 4c, d, especially in cases of screens with high statistical significance, where the bias is most substantial (Additional file 2: Figure S1). The BH procedure, and others like it, is insufficient for characterizing false discovery in these drug screening approaches, which calls for an inherent method such as the one we have applied. Furthermore, for exploratory screenings, a less conservative error analysis that maintains strict statistical rigor is ideal. As a result, dpGSEA is fundamentally different from the aforementioned approaches, and we believe it can be an effective tool for drug screening for transcriptomic DE experiments.
Furthermore, our novel statistic, TCS, serves as an alternative to the traditional ES by emphasizing gene rank within a DE experiment rather than statistical significance. It provides another valid means of screening and, as shown in Fig. 4, elucidates a substantial number of otherwise ignored, but possibly important and effective, drug screens. This gives future studies another avenue for justifying the exploration of a specific drug or gene target of interest if ES significance is not met.
When testing dpGSEA we were able to positively identify drug perturbations of paclitaxel, parbendazole, doxorubicin, rosiglitazone, and fluvastatin in GEPNTs, but we want to emphasize that dpGSEA’s primary purpose is discovery screening rather than identification. Our identification testing is a proof-of-concept for how our approach, in theory, can effectively apply the signature reversion principle in enrichment and detect drug perturbation signals for an external data set. We believe our true use-case test of dpGSEA on INR versus IR DE, where mitochondrial and immunological associated drugs were found, is more revealing of the dpGSEA approach’s capabilities.
Our scores, analogous to traditional GSEA scores, are rigorously generated while adjusted for false discovery to ensure the best possible accuracy. With respect to analytical studies based on DE analysis, i.e. all transcriptomic enrichment approaches, inferences made by dpGSEA will rely upon the validity of the prior DE results generated for the first stage of the dpGSEA framework. In line with this point, a recent study supports the importance of ranking statistics in GSEA. As the authors state, “An important parameter, which could affect the final result, is the choice of a metric for the ranking of genes. Applying a default ranking metric may lead to poor results.” [33] Hence, the important features of our approach include: (1) proto-matrices to capture more information, (2) a more accurate Local Test Statistic such as the Empirical Bayes Moderated Statistic estimated in implementations of Limma or edgeR packages, and (3) error rate control procedures such as FDR selection.
Conclusions
We contend that our disease-gene and drug-gene complementary expression underpins the novel basis for dpGSEA, as well as the robust statistics controlled by multiple testing correction and the leading-edge driver genes generated by our approach. dpGSEA is an approach that uniquely enriches on drug-defined gene sets while considering directionality of gene modulation, and we recommend dpGSEA as an exploratory tool to screen for possible drug targeting molecules.
Acknowledgements
We thank Dr. Scott Williams from Case Western Reserve University for his valuable input and manuscript critique. We also thank Dr. Rong Xu, Michael Cartwright and the Case Applied Functional Genomics Core staff for their helpful discussions.
Abbreviations
- BH
Benjamini–Hochberg
- CMAP
Connectivity map project
- DAVID
Database for Annotation, Visualization and Integrated Discovery
- DE
Differential expression
- DEGs
Differentially expressed genes
- dpGSEA
Drug perturbation gene set enrichment analysis
- ES
Enrichment score
- FDR
False Discovery Rate
- GEO
Gene expression omnibus
- GEPNTs
Gastroenteropancreatic neuroendocrine tumor cells
- GO
Gene Ontology
- GSEA
Gene set enrichment analysis
- IR
Immune responders
- INR
Immune nonresponders
- LINCS
Library of integrated network-based cellular signatures
- Tregs
T regulatory cells
- TCS
Target compatibility score
Authors’ contributions
MF designed the method, performed the computational analysis and validation, and wrote the manuscript. BR provided bioinformatic analysis. CMC provided design and biological validation. JD provided critical methods development, interpretation and manuscript writing. MJC was responsible for the overall project initiation and interpretation, manuscript writing, and funding of the project. All authors contributed to the manuscript and read and approved the final manuscript.
Funding
This study was supported by Grants from the National Institutes of Health (T32HL007567, effort for MF; P30AI036219, effort for BR, CMC and MJC; and P50AR070590 Sub-Project ID 6891, PI: MJC). The content of this study is the responsibility of the authors and does not necessarily represent the official views of the NIH.
Availability of data and materials
Data are available in a GitHub repository, https://github.com/sxf296/drug_targeting.
Ethics approval and consent to participate
No ethics approval was required for this study.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Contributor Information
Jean-Eudes Dazard, Email: jxd101@case.edu.
Mark J. Cameron, Email: mark.cameron@case.edu
Supplementary Information
The online version contains supplementary material available at 10.1186/s12859-020-03929-0.
References
- 1. Dugger SA, Platt A, Goldstein DB. Drug development in the era of precision medicine. Nat Rev Drug Discov. 2018;17(3):183–196. doi: 10.1038/nrd.2017.226.
- 2. Pushpakom S, et al. Drug repurposing: progress, challenges and recommendations. Nat Rev Drug Discov. 2019;18(1):41–58. doi: 10.1038/nrd.2018.168.
- 3. Breckenridge A, Jacob R. Overcoming the legal and regulatory barriers to drug repurposing. Nat Rev Drug Discov. 2019;18(1):1–2. doi: 10.1038/nrd.2018.92.
- 4. Chen Y, Xu R. Drug repurposing for glioblastoma based on molecular subtypes. J Biomed Inform. 2016;64:131–138. doi: 10.1016/j.jbi.2016.09.019.
- 5. Keiser MJ, et al. Predicting new molecular targets for known drugs. Nature. 2009;462(7270):175–181. doi: 10.1038/nature08506.
- 6. Xu R, Wang Q. Large-scale extraction of accurate drug-disease treatment pairs from biomedical literature for drug repurposing. BMC Bioinform. 2013;14:181. doi: 10.1186/1471-2105-14-181.
- 7. Andronis C, et al. Literature mining, ontologies and information visualization for drug repurposing. Brief Bioinform. 2011;12(4):357–368. doi: 10.1093/bib/bbr005.
- 8. Dudley JT, Deshpande T, Butte AJ. Exploiting drug-disease relationships for computational drug repositioning. Brief Bioinform. 2011;12(4):303–311. doi: 10.1093/bib/bbr013.
- 9. Wagner A, et al. Drugs that reverse disease transcriptomic signatures are more effective in a mouse model of dyslipidemia. Mol Syst Biol. 2015;11(3):791. doi: 10.15252/msb.20145486.
- 10. Kunkel SD, et al. mRNA expression signatures of human skeletal muscle atrophy identify a natural compound that increases muscle mass. Cell Metab. 2011;13(6):627–638. doi: 10.1016/j.cmet.2011.03.020.
- 11. Shin E, et al. Drug signature-based finding of additional clinical use of LC28-0126 for neutrophilic bronchial asthma. Sci Rep. 2015;5:17784. doi: 10.1038/srep17784.
- 12. Fourati S, et al. Integrated systems approach defines the antiviral pathways conferring protection by the RV144 HIV vaccine. Nat Commun. 2019;10(1):863. doi: 10.1038/s41467-019-08854-2.
- 13. Mudd JC, et al. Hallmarks of primate lentiviral immunodeficiency infection recapitulate loss of innate lymphoid cells. Nat Commun. 2018;9(1):3967. doi: 10.1038/s41467-018-05528-3.
- 14. Veazey RS, et al. Prevention of SHIV transmission by topical IFN-beta treatment. Mucosal Immunol. 2016;9(6):1528–1536. doi: 10.1038/mi.2015.146.
- 15. da Huang W, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009;37(1):1–13. doi: 10.1093/nar/gkn923.
- 16. Subramanian A, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–15550. doi: 10.1073/pnas.0506580102.
- 17. Hanzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinform. 2013;14:7. doi: 10.1186/1471-2105-14-7.
- 18. da Huang W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009;4(1):44–57. doi: 10.1038/nprot.2008.211.
- 19. Mi H, et al. PANTHER version 14: more genomes, a new PANTHER GO-slim and improvements in enrichment analysis tools. Nucleic Acids Res. 2019;47(D1):D419–D426. doi: 10.1093/nar/gky1038.
- 20. Napolitano F, et al. gene2drug: a computational tool for pathway-based rational drug repositioning. Bioinformatics. 2018;34(9):1498–1505. doi: 10.1093/bioinformatics/btx800.
- 21.Napolitano F, et al. Drug-set enrichment analysis: a novel tool to investigate drug mode of action. Bioinformatics. 2016;32(2):235–241. doi: 10.1093/bioinformatics/btv536. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Zhang SD, Gant TW. sscMap: an extensible Java application for connecting small-molecule drugs using gene-expression signatures. BMC Bioinform. 2009;10:236. doi: 10.1186/1471-2105-10-236. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Duan Q, et al. L1000CDS(2): LINCS L1000 characteristic direction signatures search engine. NPJ Syst Biol Appl. 2016;2:1–12. doi: 10.1038/npjsba.2016.15. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Lamb J, et al. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science. 2006;313(5795):1929–1935. doi: 10.1126/science.1132939. [DOI] [PubMed] [Google Scholar]
- 25. Subramanian A, et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell. 2017;171(6):1437–1452. doi: 10.1016/j.cell.2017.10.049.
- 26. Ritchie ME, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47. doi: 10.1093/nar/gkv007.
- 27. Alvarez MJ, et al. A precision oncology approach to the pharmacological targeting of mechanistic dependencies in neuroendocrine tumors. Nat Genet. 2018;50(7):979–989. doi: 10.1038/s41588-018-0138-4.
- 28. Younes SA, et al. Cycling CD4+ T cells in HIV-infected immune nonresponders have mitochondrial dysfunction. J Clin Invest. 2018;128(11):5083–5094. doi: 10.1172/JCI120245.
- 29. Yamaguchi K, et al. Evidence for mitochondrial localization of a novel human sialidase (NEU4). Biochem J. 2005;390(Pt 1):85–93. doi: 10.1042/BJ20050017.
- 30. Hata K, et al. Limited inhibitory effects of oseltamivir and zanamivir on human sialidases. Antimicrob Agents Chemother. 2008;52(10):3484–3491. doi: 10.1128/AAC.00344-08.
- 31. Wang Y, et al. Ibutilide protects against cardiomyocytes injury via inhibiting endoplasmic reticulum and mitochondrial stress pathways. Heart Vessels. 2017;32(2):208–215. doi: 10.1007/s00380-016-0891-1.
- 32. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B. 1995;57(1):289–300.
- 33. Zyla J, et al. Ranking metrics in gene set enrichment analysis: do they matter? BMC Bioinform. 2017;18(1):256. doi: 10.1186/s12859-017-1674-0.
Data Availability Statement
Data are available in a GitHub repository, https://github.com/sxf296/drug_targeting.