2024 Jul 11;2024:baae051. doi: 10.1093/database/baae051

Table 2.

Data class, specific data, statistics and details of data sources used to generate DrugRepoBank

| Data class | Specific data | Statistics | Details | Data source | Ref |
| --- | --- | --- | --- | --- | --- |
| Drug | Drug name, clinical status and drug identifiers | 49 652 small-molecule drugs, including 2877 approved drugs | DrugBank ID, drug name, drug clinical status and drug identifiers, including CAS Number, InChIKey, InChI, Formula and SMILES | DrugBank | (18) |
| | Drug chemical structures | 29 733 structures | We use the PubChem Compound ID of drugs to search the drug chemical structures in MolView | PubChem | (19) |
| Drug–target interaction | Drug protein targets | 880 945 drug–target interactions | Each drug–target pair records the mechanism of action (such as agonist or antagonist) and activity (Kd, Ki or IC50) | DrugBank and TTD | (18, 20, 21) |
| Drug-disease association | Drug indications and clinical status | 28 978 drug-indication associations, including 3620 ‘approved’ | Drug disease indications are encoded by ICD-11. The clinical status includes Phase 1, Phase 1/2, Phase 2, Phase 3, Approved, Terminated, Investigated, Discontinued in Phase 1, Discontinued in Phase 2, Discontinued in Phase 3, Patented, Withdrawn from market, Preclinical, Discontinued in Preregistration and Clinical trial | TTD | (20, 21) |
| Drug-side effect association | Drug-side effect associations | 109 698 drug-side effect associations | Information on marketed medicines and their recorded adverse drug reactions | SIDER | (22) |
| Drug-pathway association | Drug-induced pathways | 243 pathways and 3888 drug-pathway associations | | KEGG | (23) |
| Target | Target name, sequence, target identifiers and other target information | 4221 targets, including 620 successful targets | Target name, target gene name, target type, synonyms, biochemical class, EC Number and sequence | TTD | (20, 21) |
| | Target 3D structure | 7082 structures of 2226 targets | We use the PDB ID to search target 3D structures in https://molstar.org | PDB | (29) |
| Target-disease association | Target indications and clinical status | 11 268 target-indication associations, including 2679 ‘approved’ | Target disease indications are encoded by ICD-11. The clinical status includes Phase 1, Phase 1/2, Phase 2, Phase 3, Approved, Terminated, Investigated, Discontinued in Phase 1, Discontinued in Phase 2, Discontinued in Phase 3, Patented, Withdrawn from market, Preclinical, Discontinued in Preregistration and Clinical trial | TTD | (20, 21) |
| Target-pathway association | Target-involved KEGG pathways | 387 pathways and 8528 target-pathway associations | We provide target-involved KEGG pathways for Homo sapiens | KEGG | (23) |
| | Target-involved WikiPathways pathways | 516 pathways and 8149 target-pathway associations | | WikiPathways | (30) |
| | Target-involved PathWhiz pathways | 98 pathways and 467 target-pathway associations | | PathWhiz | (31) |
| | Target-involved Reactome pathways | 577 pathways and 4332 target-pathway associations | | Reactome | (32) |
| | Target-involved NetPath pathways | 25 pathways and 1106 target-pathway associations | | NetPath | (33) |
| | Target-involved PANTHER pathways | 124 pathways and 1786 target-pathway associations | | PANTHER | (34) |
| Pathway | GO terms and annotations | 6700 GO terms, including 446 CC, 1151 MF and 5103 BP terms, and a total of 250 734 protein-GO term associations | GO terms across the categories of cellular component (CC), molecular function (MF) and biological process (BP) | Enrichr | (35) |
| Disease signature | Disease (cancer) signatures | 25 types of cancer | Breast Cancer (BRCA), Bladder Cancer (BLCA), Cervical Cancer (CESC), Bile Duct Cancer (CHOL), Colon Cancer (COAD), Colon and Rectal Cancer (COADREAD), Esophageal Cancer (ESCA), Head and Neck Cancer (HNSC), Kidney Chromophobe (KICH), Kidney Clear Cell Carcinoma (KIRC), Kidney Papillary Cell Carcinoma (KIRP), Liver Cancer (LIHC), Lung Adenocarcinoma (LUAD), Lung Cancer (LUNG), Lung Squamous Cell Carcinoma (LUSC), Pancreatic Cancer (PAAD), Pheochromocytoma & Paraganglioma (PCPG), Prostate Cancer (PRAD), Rectal Cancer (READ), Sarcoma (SARC), Melanoma (SKCM), Stomach Cancer (STAD), Thyroid Cancer (THCA), Thymoma (THYM) and Endometrioid Cancer (UCEC) | TCGA | (36) |
| Drug signature | Drug signature | 473 647 replicate-consensus signatures | We downloaded the level 5 data of L1000 (GCTx format) from the Gene Expression Omnibus (accession number: GSE92742), which contains 473 647 replicate-consensus signatures (RCSs) generated by the official data pre-processing pipeline. The level 5 data of L1000 have been normalized, and the LINCS team suggests using them directly without extra processing. Each RCS holds the moderated z-score values of 12 328 genes for one profile | L1000 | (37) |
| Literature | Drug repositioning-related literature | 169 experimentally validated repositioned drugs from 134 valid articles | We extract important information such as old targets, new direct targets, new indirect targets, old diseases, new diseases, experimental evidence and supporting sentences from these articles | PubMed | (38) |
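
As a rough illustration of how the drug-indication rows described in the table might be handled downstream, the sketch below models one association with its ICD-11 code and clinical status, and filters for 'Approved' entries. The class name, field names, helper function and example records are hypothetical placeholders chosen for illustration; they are not the DrugRepoBank schema or its data.

```python
from dataclasses import dataclass

# Clinical status vocabulary as listed in the table for TTD-derived associations.
CLINICAL_STATUSES = {
    "Phase 1", "Phase 1/2", "Phase 2", "Phase 3", "Approved", "Terminated",
    "Investigated", "Discontinued in Phase 1", "Discontinued in Phase 2",
    "Discontinued in Phase 3", "Patented", "Withdrawn from market",
    "Preclinical", "Discontinued in Preregistration", "Clinical trial",
}


@dataclass(frozen=True)
class DrugIndication:
    """One drug-indication association (hypothetical field names)."""
    drug_id: str          # placeholder identifier, e.g. a DrugBank ID
    icd11_code: str       # disease indication encoded by ICD-11
    clinical_status: str  # must be one of CLINICAL_STATUSES

    def __post_init__(self):
        # Reject statuses outside the vocabulary given in the table.
        if self.clinical_status not in CLINICAL_STATUSES:
            raise ValueError(f"unknown clinical status: {self.clinical_status!r}")


def approved_only(records):
    """Return only the associations whose clinical status is 'Approved'."""
    return [r for r in records if r.clinical_status == "Approved"]


# Placeholder records; the IDs and codes are invented for the example.
records = [
    DrugIndication("DRUG_A", "ICD11_PLACEHOLDER_1", "Approved"),
    DrugIndication("DRUG_B", "ICD11_PLACEHOLDER_2", "Phase 2"),
    DrugIndication("DRUG_C", "ICD11_PLACEHOLDER_1", "Terminated"),
]

approved = approved_only(records)
print([r.drug_id for r in approved])
```

Validating the status on construction keeps misspelled or unexpected values from silently slipping past a filter such as `approved_only`.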