ABSTRACT
Long non-coding RNAs (lncRNAs) are implicated in the pathogenesis of various diseases. Multiple studies have demonstrated that small molecule drugs can modify lncRNA expression, which suggests a promising therapeutic strategy for human diseases. Here, we constructed D-lnc, a comprehensive query and analytical platform, to dissect the influence of drugs on lncRNA expression. First, we manually curated the experimentally validated regulations of lncRNA expression by drugs and recorded 7,825 entries between 59 drugs and 7,538 lncRNAs across five species from nearly 1,000 published papers. Second, we comprehensively screened the Connectivity Map (cMap) and the Gene Expression Omnibus (GEO) databases to obtain drug-perturbed gene expression profiles. Through probe re-annotation of microarray data, we identified 19,946 putative associations between 1,279 drugs and 129 lncRNAs in cMap and 36,210 putative associations between 115 drugs and 2,360 lncRNAs in GEO. Finally, we developed an online analytical platform that predicts potential acting drugs or modified lncRNAs from a user-supplied lncRNA sequence or drug structure by computing the similarities of lncRNA sequences or drug structures. In summary, D-lnc provides a comprehensive platform to detect the modification of lncRNA expression by drugs, which would facilitate the development of lncRNA-targeted therapeutics. D-lnc is freely available at http://www.jianglab.cn/D-lnc/.
KEYWORDS: Drug, lncRNA, lncRNA-targeted therapy, gene expression profile, probe re-annotation
1. Introduction
Long non-coding RNA (lncRNA) is one type of functionally important non-coding RNA with length longer than 200 nucleotides and is closely associated with human complex diseases, such as cancers, heart failure and nervous system diseases. For instance, the lncRNA MALAT1, which is highly expressed in neurons, drives the pathogenesis of Parkinson’s disease [1]. Targeted therapy is the basis for precision medicine which is designed to treat disease by modifying unique molecular abnormalities [2]. Recently, many studies have demonstrated that small molecule drugs can regulate lncRNA expression, which indicates that targeting lncRNAs with small molecules is a novel therapy for human diseases. For example, aspirin can induce expression of lncRNA OLA1P2, thus affecting STAT3 signalling pathway activity in human colorectal cancer, and the up-regulated OLA1P2 promotes the anti-metastatic activity of aspirin [3].
Until now, diverse lncRNA-related databases have been constructed. For example, LNCipedia [4] and lncRNAdb [5] provided public resources for lncRNA sequence and annotation. lncRNASNP maintained resources about single nucleotide polymorphisms (SNPs) in lncRNAs [6]. NcDR provided experimentally validated and computationally predicted non-coding RNAs involved in drug resistance [7]. Several databases recording the associations between lncRNAs and diseases were also developed, such as lncDisease [8] and Lnc2Cancer [9]. Nevertheless, there are no resources addressing the influence of drugs on lncRNA expression. Researchers and clinicians thus face a challenging task to treat complex diseases by lncRNA-targeted therapeutics with little collective information.
Here, we offered a comprehensive query and analytical platform D-lnc to manage the experimentally validated and computationally predicted modification of drugs on lncRNA expression, which would be helpful for the development of lncRNA-targeted therapeutics.
2. Materials and methods
D-lnc incorporates manually curated data resources and computationally predicted modifications of lncRNA expression by drugs, as well as an online analytical module (Figure 1). The database is freely available at http://www.jianglab.cn/D-lnc/, with the aim of facilitating its use by researchers working on lncRNA-targeted therapeutics. In the following sections, we present the blueprint of the modification of lncRNA expression by drugs, describe the resource, and discuss its utility.
Figure 1.

The content and the interface of D-lnc. The left panel describes the resources of database content, which includes the published papers, the Connectivity Map Database (cMap) and the Gene Expression Omnibus Database (GEO). The right panel is the user interface of D-lnc. The Search, Analysis, Download and Submit modules provide flexible ways to access the database.
2.1. Datasets
First, we extracted the associations between small molecule drugs and lncRNAs from PubMed using the keywords ‘drug and long non-coding RNA’, ‘small molecule and long non-coding RNA’, ‘drug and lncRNA’ and ‘small molecule and lncRNA’. Approximately 1,000 papers were obtained. After manual screening, we obtained 7,825 experimentally validated regulations of lncRNA expression by drugs, which involved 59 small molecules and 7,538 lncRNAs across five species (Homo sapiens, Mus musculus, Rattus norvegicus, zebrafish and Chironomus riparius) and were defined as the Validated dataset. To provide more informative entries, a variety of biological databases were integrated, such as Ensembl, the U.S. Food and Drug Administration (FDA), PubChem and DrugBank. Each entry contained detailed information about the small molecule, the lncRNA and their regulation, including the species, FDA approval status, experimental methods, tissues or conditions, lncRNA expression pattern (up-regulated or down-regulated), the evidence in the reference, the PubMed ID and so on.
Second, we retrieved 6,100 and 937 drug-perturbed gene expression datasets from the Connectivity Map (cMap) and the Gene Expression Omnibus (GEO) databases, respectively, which were measured on five microarray platforms (Affymetrix Human Exon, U95Av, U133A, U133B and U133 plus 2.0 arrays). We then re-annotated the probes to lncRNAs following the method of Du et al. [10]. lncRNAs that were differentially expressed (DE) after drug treatment relative to before treatment were identified as drug-modified lncRNAs, using a two-fold change (FC) threshold. In the end, we derived 19,946 entries including 1,279 drugs and 129 lncRNAs from the cMap dataset, and obtained 36,210 entries including 115 drugs and 2,360 lncRNAs from the GEO dataset.
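The two-fold change criterion can be illustrated with a minimal sketch. The function name, the inclusive cutoff, the use of linear-scale intensities and the example values are all assumptions made for illustration; the text specifies only that a two-fold change was used.

```python
import math


def differentially_expressed(treated, control, fold_change=2.0):
    """Return {lncRNA: 'up' | 'down'} for lncRNAs passing the fold-change cutoff.

    `treated` and `control` map lncRNA names to linear-scale expression
    values (an assumption; log-scale data would need a subtraction instead).
    """
    threshold = math.log2(fold_change)
    calls = {}
    for name, treated_value in treated.items():
        control_value = control.get(name)
        # Skip lncRNAs missing from the control or with non-positive values.
        if control_value is None or treated_value <= 0 or control_value <= 0:
            continue
        log2_fc = math.log2(treated_value / control_value)
        if log2_fc >= threshold:
            calls[name] = "up"
        elif log2_fc <= -threshold:
            calls[name] = "down"
    return calls


# Hypothetical intensities, not taken from cMap or GEO.
treated = {"H19": 400.0, "MALAT1": 90.0, "UCA1": 110.0}
control = {"H19": 100.0, "MALAT1": 200.0, "UCA1": 100.0}
result = differentially_expressed(treated, control)
print(result)
```

In this toy input, only the lncRNAs whose expression changes at least two-fold in either direction are reported, each with its direction of change.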
2.2. Search module
We used the three datasets obtained above, named the Validated dataset, the cMap dataset and the GEO dataset, to build the Search module. We separated the Search module into two parts: ‘Search Validated Dataset’ and ‘Search Predicted Datasets’. In the ‘Search Validated Dataset’ part, a drop-down box containing the related drugs is provided. Users can select a drug of interest to retrieve the associated lncRNAs, or input the name of an lncRNA of interest to retrieve the associated drugs. In the ‘Search Predicted Datasets’ part, users can search for a desired lncRNA or small molecule in the two predicted (cMap and GEO) datasets, with two options for input: lncRNA name and small molecule name. Here, the small molecule name can be either the drug name or one of its synonyms.
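A two-way lookup of this kind, including synonym resolution for small molecule names, might be sketched as follows. This is an illustration only and not the D-lnc implementation (which uses MySQL and JSP); the function names and the in-memory indexes are hypothetical, and the example pairs are limited to drug and lncRNA names mentioned in this paper.

```python
from collections import defaultdict

# (drug, lncRNA) pairs restricted to examples named in the text.
ASSOCIATIONS = [
    ("cisplatin", "H19"),
    ("methotrexate", "H19"),
    ("paclitaxel", "H19"),
    ("cisplatin", "UCA1"),
]
# Synonym -> canonical drug name (abbreviations as given in the text).
SYNONYMS = {"MTX": "methotrexate", "PTX": "paclitaxel"}


def build_indexes(associations):
    """Build case-insensitive drug -> lncRNAs and lncRNA -> drugs indexes."""
    by_drug, by_lncrna = defaultdict(set), defaultdict(set)
    for drug, lncrna in associations:
        by_drug[drug.lower()].add(lncrna)
        by_lncrna[lncrna.lower()].add(drug)
    return by_drug, by_lncrna


def canonical_drug(name, synonyms=SYNONYMS):
    """Resolve a drug name or synonym to its lower-case canonical name."""
    lookup = {key.lower(): value for key, value in synonyms.items()}
    key = name.strip().lower()
    return lookup.get(key, key)


BY_DRUG, BY_LNCRNA = build_indexes(ASSOCIATIONS)


def lncrnas_for_drug(name):
    return sorted(BY_DRUG.get(canonical_drug(name), set()))


def drugs_for_lncrna(name):
    return sorted(BY_LNCRNA.get(name.strip().lower(), set()))


print(lncrnas_for_drug("MTX"))
print(drugs_for_lncrna("H19"))
```

Normalising case and resolving synonyms before the lookup means that a query by abbreviation and a query by full drug name return the same records.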
2.3. Analysis module
We first tested the hypotheses that lncRNAs with similar sequences tend to be regulated by the same drugs (Fisher’s exact test, p-value = 3.239 × 10⁻¹⁵) and that drugs with similar structures tend to regulate the same lncRNAs (Fisher’s exact test, p-value = 4.587 × 10⁻³). Here, the BLAST program was used to identify lncRNAs with similar sequences; lncRNAs with BLAST E-value < 10 were considered to have similar sequences. The Tanimoto coefficient was used to calculate the structural similarities between drugs in the Validated dataset; drugs with Tanimoto Identity (Sim) > 0.6 were considered to have similar structures. On this basis, an online Analysis module was constructed to predict the potential influence of drugs on lncRNA expression from a user-supplied lncRNA sequence or drug structure. To predict the drugs that might modify an input lncRNA, the BLAST program was used to compute the sequence similarities between the input lncRNA and the lncRNAs in the Validated dataset. The lncRNAs with BLAST E-value < 10 were retained, and the drugs associated with these lncRNAs in the Validated dataset were regarded as putative drugs that could modify the input lncRNA. Similarly, the Tanimoto coefficient was used to calculate the structural similarities between the user-supplied drug and the drugs in the Validated dataset. The drugs with Tanimoto Identity (Sim) > 0.6 were retained, and the lncRNAs modified by these drugs were predicted as the lncRNAs that might be regulated by the drug of interest.
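The structure-based prediction step can be sketched as follows. This is a simplified illustration under stated assumptions: each drug is represented as a set of integer feature identifiers (a hypothetical fingerprint), whereas D-lnc computes similarities with the R package Rchemcpp from MOL structures. The function names, drug names and lncRNA names are placeholders; only the 0.6 cutoff comes from the text.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient of two feature sets: |A & B| / |A | B|."""
    a, b = set(features_a), set(features_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict_lncrnas(query_features, drug_features, drug_to_lncrnas, cutoff=0.6):
    """Return (similar drugs with scores, lncRNAs associated with them)."""
    hits = {}
    for drug, features in drug_features.items():
        similarity = tanimoto(query_features, features)
        if similarity > cutoff:  # strict inequality, as in Sim > 0.6
            hits[drug] = similarity
    predicted = set()
    for drug in hits:
        predicted |= drug_to_lncrnas.get(drug, set())
    return hits, predicted


# Hypothetical fingerprints and validated associations.
drug_features = {"drug_A": {1, 2, 3, 4, 5}, "drug_B": {1, 2, 9}}
drug_to_lncrnas = {"drug_A": {"lnc_1", "lnc_2"}, "drug_B": {"lnc_3"}}
query = {1, 2, 3, 4, 6}

hits, predicted = predict_lncrnas(query, drug_features, drug_to_lncrnas)
print(hits, predicted)
```

The sequence-based direction follows the same pattern, with BLAST E-values in place of Tanimoto scores and the drug and lncRNA roles exchanged.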
2.4. Database organization
D-lnc was developed using Struts 1.3.10 and Java Server Pages (JSP) and runs under Ubuntu 14.04.5. The database works well with major web browsers (e.g. Internet Explorer, Mozilla Firefox, Chrome, Safari). MySQL v5.6.10 was used for data storage and ran on an Apache web server v6.0.48. The dynamic HTML pages were implemented using JSP and JavaScript, and the dataset tables using the jQuery plugin DataTables v1.9.4 (http://datatables.net/). The database used Java graphics and dynamic tables. BLAST v2.7.1 was used to compute the sequence similarities between the input lncRNA and the lncRNAs in the Validated dataset, and the R package Rchemcpp, running under R v3.4.4, was used to compute the structural similarities between the input drug and the drugs in the Validated dataset.
3. Results
3.1. Database statistics and analysis
According to our pipeline (Figure 1), we retrieved and identified the modifications of lncRNA expression by drugs from three data sources. The outcome of this pipeline is freely accessible in D-lnc.
Validated dataset: including 7,825 associations between 59 drugs and 7,538 lncRNAs.
cMap dataset: including 19,946 associations between 1,279 drugs and 129 lncRNAs.
GEO dataset: including 36,210 associations between 115 drugs and 2,360 lncRNAs.
Since the acquisition of the drug and lncRNA associations involved different experimental conditions, we assessed how consistent or specific associations were across these conditions. To get a consistent and systematic conclusion, we analysed the results of the cMap dataset, which comprised five cell lines (MCF7, PC3, HL60, SKMEL5 and ssMCF7) and several concentrations of perturbagen (drug doses). We then investigated whether the associations were cell line/drug dose-specific or commonly identified in multiple cell lines/drug doses.
For the cell line analysis, we found 570 drugs that had been tested in at least three cell lines, and obtained the associated lncRNAs in each cell line. We then counted the number of drug-lncRNA associations that were identified in only one cell line or in multiple cell lines. As shown in Figure 2a, most drug-lncRNA associations were cell line-specific, while only a few associations were present in two or three cell lines. For the drug dose analysis, we first extracted the drugs with at least three doses in a chosen cell line. Then, we obtained the drug-lncRNA associations for these drugs at each dose. The number of drug-lncRNA associations present at one or multiple drug doses is shown in Figure 2b. Similarly, most drug-lncRNA associations were present only at a specific drug dose, while only 40 associations were commonly identified at two or three doses. These results indicated that the modification of lncRNA expression by drugs depended on the cell line and the drug dose.
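The counting behind this kind of analysis can be sketched as follows. The sketch assumes that the records have already been restricted to drugs tested in at least three cell lines; the function name and the example records are hypothetical, and only the cell line names come from the text. The same function applies to doses if the third field holds a dose instead of a cell line.

```python
from collections import Counter, defaultdict


def specificity_counts(records):
    """Count associations by the number of distinct conditions they occur in.

    `records` is an iterable of (drug, lncRNA, condition) tuples, where the
    condition is a cell line (or a dose). Returns a Counter mapping
    "number of conditions" -> "number of drug-lncRNA associations".
    """
    conditions = defaultdict(set)
    for drug, lncrna, condition in records:
        conditions[(drug, lncrna)].add(condition)
    return Counter(len(seen) for seen in conditions.values())


# Hypothetical records; the repeated last entry mimics the same association
# seen twice in one cell line (for example, at two doses).
records = [
    ("drug_A", "lnc_1", "MCF7"),
    ("drug_A", "lnc_1", "PC3"),
    ("drug_A", "lnc_2", "MCF7"),
    ("drug_B", "lnc_1", "HL60"),
    ("drug_B", "lnc_3", "HL60"),
    ("drug_B", "lnc_3", "HL60"),
]
counts = specificity_counts(records)
print(dict(counts))
```

Collecting conditions in a set ensures that repeated observations within one cell line are counted once, so each association contributes to exactly one bar of a plot such as Figure 2a.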
Figure 2.

Systematic characteristics of drug modification on lncRNA expression. (A) Number of drug-lncRNA associations present in one or multiple cell lines. (B) Number of drug-lncRNA associations present at one or multiple drug doses. (C) Distribution of the number of drugs that modified different numbers of lncRNAs. (D) Distribution of the number of lncRNAs that were modified by different numbers of drugs. (E) Percentile distribution of DE lncRNAs in all DE genes.
In addition, we calculated the distribution of the number of drugs that modified different numbers of lncRNAs (Figure 2c), and the distribution of the number of lncRNAs that were modified by different numbers of drugs (Figure 2d). The results showed that most drugs modified only a few lncRNAs, and most lncRNAs were modified by only a small number of drugs.
We further examined whether drugs showed a preference between lncRNAs and other genes when modifying expression. For each instance in cMap, we first extracted all DE genes and ranked them by the absolute value of log2FC. Then, the percentile of each DE lncRNA among all the ranked DE genes was calculated. Next, the percentile distribution of all DE lncRNAs was illustrated by a density plot (Figure 2e). Visually, the percentile distribution approximated a uniform distribution, which indicated that drugs had no preference between lncRNAs and other genes when modifying expression.
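The percentile calculation for a single instance can be sketched as follows. The percentile definition used here (rank position divided by the number of DE genes, with the largest absolute log2FC ranked first) is an assumption, since the text does not give the exact formula; the function name, gene names and values are hypothetical.

```python
def lncrna_percentiles(log2_fc, lncrna_names):
    """Rank DE genes by |log2FC| (largest first) and return the percentile
    (100 * rank / number of DE genes) of each DE lncRNA."""
    ranked = sorted(log2_fc, key=lambda gene: abs(log2_fc[gene]), reverse=True)
    total = len(ranked)
    return {
        gene: 100.0 * (position + 1) / total
        for position, gene in enumerate(ranked)
        if gene in lncrna_names
    }


# Hypothetical DE genes for one instance (log2 fold changes).
log2_fc = {
    "GENE_A": 3.0,
    "lnc_1": -2.5,
    "GENE_B": 1.8,
    "GENE_C": -1.2,
    "lnc_2": 1.0,
}
percentiles = lncrna_percentiles(log2_fc, {"lnc_1", "lnc_2"})
print(percentiles)
```

Pooling these percentiles over all instances gives the distribution that would be drawn as a density plot; an approximately flat density corresponds to DE lncRNAs being spread evenly through the ranked DE genes.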
3.2. Utilization
D-lnc furnished a user-friendly interface and offered two main modules, the Search module (Figure 3) and the Analysis module (Figure 4).
Figure 3.

Case study and workflow of using the ‘Search Validated Dataset’ part of the Search module. (i) Search by lncRNA: (a) The interface of the Search module with an example of lncRNA H19. (b) The search results of lncRNA H19, including both the basic information of the associated small molecules and the expression pattern of lncRNA H19. (c) The detailed information of the association between cisplatin and lncRNA H19. (ii) Search by Small Molecule: (d) The interface of the Search module with an example of small molecule Imatinib. (e) The search results of small molecule Imatinib. (f) The detailed information about the association between cisplatin and lncRNA UCA1.
Figure 4.

Case study and workflow of using the Analysis module. (i) Analysis by lncRNA: (a) The interface of the Analysis module with an example of lncRNA H19. The accepted sequence format is FASTA format. (b) The analysis results of lncRNA H19, including the lncRNAs that have a similar sequence to lncRNA H19. (c) When clicking the magnifying glass icon, the small molecules that potentially modify the expression of lncRNA H19 are presented. (ii) Analysis by Small Molecule: (d) The interface of the Analysis module with an example of small molecule tamoxifen. The accepted structure format is MOL format. (e) The analysis results of tamoxifen, including the small molecules that have a similar structure to tamoxifen. (f) When clicking the magnifying glass icon, the lncRNAs that may be modified by tamoxifen are presented.
The Search module allows users to obtain the associations related to an lncRNA or drug of interest. For example, the following steps retrieve the experimentally validated small molecules that modify the expression of H19 in Homo sapiens: (i) select ‘Homo sapiens’ from the ‘Species’ pull-down menu in the ‘Search Validated Dataset’ part, (ii) input ‘H19’ in the ‘Search by lncRNA’ textbox, (iii) press the ‘Search’ button (Figure 3a). Six associated small molecules and their PubChem IDs (CIDs), as well as synonyms, are presented, including cisplatin, methotrexate (MTX), paclitaxel (PTX) and so on. The expression patterns of lncRNA H19 (up-regulated or down-regulated) under the associated small molecules are also provided (Figure 3b). Clicking the ‘more’ hyperlink presents detailed information about the entry, such as the lncRNA Ensembl ID, the small molecule’s DrugBank ID, the experimental detection method, the reference information and so on (Figure 3c). In addition, a drug of interest, such as Imatinib, can be selected from the pull-down box of ‘Search by Drug’, and the associated lncRNAs can be retrieved from the Validated dataset (Figure 3d-f). Users can also search for a desired lncRNA or small molecule in the two predicted (cMap and GEO) datasets from the ‘Search Predicted Datasets’ part, which offers two options for input: lncRNA name and small molecule name. The Search module supports fuzzy keyword searching to return the closest matching records.
The Analysis module predicts putative drug-lncRNA associations based on an input lncRNA sequence (FASTA format) or drug structure (MOL format). For example, to predict which small molecules may modify the expression of lncRNA H19, users can proceed as follows: (i) select ‘lncRNA’ from the ‘Analysis By’ pull-down menu, (ii) input the sequence (FASTA format) of lncRNA H19 into the ‘Analysis Content’ textbox, (iii) press the ‘Analysis’ button (Figure 4a). The 29 lncRNAs that have high sequence similarity with lncRNA H19 (BLAST E-value < 10) are returned, together with the BLAST E-value, Score and Identity (Figure 4b). Furthermore, clicking the table row of a returned lncRNA presents the detailed results of the sequence alignment (Figure 4b). Clicking the magnifying glass icon in a table row shows the drugs associated with that lncRNA in the Validated dataset, which are deemed potential modifying drugs of the input lncRNA (Figure 4c). Users can also predict the lncRNAs potentially associated with a small molecule by selecting ‘small molecule’ from the ‘Analysis By’ pull-down menu and inputting the drug structure (MOL format) into ‘Analysis Content’ (Figure 4d). The small molecules that have high structural similarity with the input drug (Tanimoto Identity (Sim) > 0.6) are returned. Furthermore, the detailed structures of the input small molecule and the hit small molecule are presented together when clicking the table row of a returned small molecule (Figure 4e). Clicking the magnifying glass icon in a table row shows the lncRNAs associated with the returned small molecule in the Validated dataset, and these lncRNAs are deemed potentially associated with the input small molecule (Figure 4f).
In addition, D-lnc also provides a ‘Submission’ page for users to submit new validated data for updating the database. If the records are approved by our review committee, they will be available in D-lnc. Finally, all data can be downloaded freely on the ‘Download’ page.
4. Discussion and concluding remarks
Increasing evidence suggests that lncRNAs could serve as novel treatment targets, which promises a meaningful therapy for diseases [11,12]. With the increasingly abundant information about lncRNAs and the drugs that modify them, it is necessary to archive the associations between lncRNAs and drugs for further analysis. In the present study, we provided a comprehensive platform, D-lnc, to query and analyse the influence of drugs on lncRNA expression, which would facilitate the development of lncRNA-targeted therapeutics. In the future, we will continuously curate and update the reference data. As more research data become available, D-lnc will incorporate more comprehensive information and become more powerful.
5. Availability and requirements
D-lnc is free for all users and can be accessed through the portal (http://www.jianglab.cn/D-lnc/). Users can access the database using any of the mainstream web browsers, including Firefox, Safari and Chrome.
Funding Statement
This work was supported by the National Natural Science Foundation of China [61571169, 61872183, 61801150] and the Fundamental Research Funds for the Central Universities [NE2018101].
Disclosure statement
No potential conflict of interest was reported by the authors.
References
- [1]. Zhang QS, Wang ZH, Zhang JL, et al. Beta-asarone protects against MPTP-induced Parkinson’s disease via regulating long non-coding RNA MALAT1 and inhibiting alpha-synuclein protein expression. Biomed Pharmacother. 2016;83:153–159.
- [2]. Tsimberidou AM. Targeted therapy in cancer. Cancer Chemother Pharmacol. 2015;76:1113–1132.
- [3]. Guo H, Liu J, Ben Q, et al. The aspirin-induced long non-coding RNA OLA1P2 blocks phosphorylated STAT3 homodimer formation. Genome Biol. 2016;17:24.
- [4]. Volders PJ, Verheggen K, Menschaert G, et al. An update on LNCipedia: a database for annotated human lncRNA sequences. Nucleic Acids Res. 2015;43:4363–4364.
- [5]. Quek XC, Thomson DW, Maag JL, et al. lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs. Nucleic Acids Res. 2015;43:D168–73.
- [6]. Miao YR, Liu W, Zhang Q, et al. lncRNASNP2: an updated database of functional SNPs and mutations in human and mouse lncRNAs. Nucleic Acids Res. 2018;46:D276–D80.
- [7]. Dai E, Yang F, Wang J, et al. ncDR: a comprehensive resource of non-coding RNAs involved in drug resistance. Bioinformatics. 2017;33:4010–4011.
- [8]. Wang J, Ma R, Ma W, et al. LncDisease: a sequence based bioinformatics tool for predicting lncRNA-disease associations. Nucleic Acids Res. 2016;44:e90.
- [9]. Ning S, Zhang J, Wang P, et al. Lnc2Cancer: a manually curated database of experimentally supported lncRNAs associated with various human cancers. Nucleic Acids Res. 2016;44:D980–5.
- [10]. Du Z, Fei T, Verhaak RG, et al. Integrative genomic analyses reveal clinically relevant long noncoding RNAs in human cancer. Nat Struct Mol Biol. 2013;20:908–913.
- [11]. De Leeneer K, Claes K. Non coding RNA Molecules as potential biomarkers in breast cancer. Adv Exp Med Biol. 2015;867:263–275.
- [12]. Tripathi MK, Doxtater K, Keramatnia F, et al. Role of lncRNAs in ovarian cancer: defining new biomarkers for therapeutic purposes. Drug Discov Today. 2018;23:1635–1643.
