Table 2.
Data class, specific data, statistics and details of data sources used to generate DrugRepoBank
| Data class | Specific data | Statistics | Details | Data source | Ref | 
|---|---|---|---|---|---|
| Drug | Drug name, clinical status and drug identifiers | 49 652 small-molecule drugs including 2877 approved drugs | DrugBank ID, drug name, Drug clinical status, drug identifiers including CAS Number, InChIKey, InChI, Formula and SMILES | DrugBank | (18) | 
| | Drug chemical structures | 29 733 structures | We use the PubChem Compound ID of drugs to search the drug chemical structures in MolView | PubChem | (19) |
| Drug–target interaction | Drug protein targets | 880 945 drug–target interactions | Each drug–target pair records the mechanism of action (e.g. agonist or antagonist) and activity (Kd, Ki or IC50) | DrugBank and TTD | (18, 20, 21) | 
| Drug-disease association | Drug indications and clinical status | 28 978 drug-indication associations, including 3620 ‘approved’ | Drug disease indications are encoded by ICD-11. The clinical status includes Phase 1, Phase 1/2, Phase 2, Phase 3, Approved, Terminated, Investigated, Discontinued in Phase 1, Discontinued in Phase 2, Discontinued in Phase 3, Patented, Withdrawn from market, Preclinical, Discontinued in Preregistration and Clinical trial | TTD | (20, 21) | 
| Drug-side effect association | Drug-side effect associations | 109 698 drug-side effect associations | Information on marketed medicines and their recorded adverse drug reactions | SIDER | (22) | 
| Drug-pathway association | Drug-induced pathways | 243 pathways and 3888 drug-pathway associations | | KEGG | (23) |
| Target | Target name, sequence, target identifiers and other target information | 4221 targets, including 620 successful targets | Target name, target gene name, target type, synonyms, biochemical class, EC Number and sequence | TTD | (20, 21) | 
| | Target 3D structure | 7082 structures of 2226 targets | We use PDB ID to search target 3D structures in https://molstar.org. | PDB | (29) |
| Target-disease association | Target indications and clinical status | 11 268 target-indication associations, including 2679 ‘approved’ | Target disease indications are encoded by ICD-11. The clinical status includes Phase 1, Phase 1/2, Phase 2, Phase 3, Approved, Terminated, Investigated, Discontinued in Phase 1, Discontinued in Phase 2, Discontinued in Phase 3, Patented, Withdrawn from market, Preclinical, Discontinued in Preregistration and Clinical trial | TTD | (20, 21) | 
| Target-pathway association | Target-involved KEGG pathways | 387 pathways and 8528 target-pathway associations | We provide target-involved KEGG pathways for Homo sapiens | KEGG | (23) | 
| | Target-involved WiKi pathways | 516 pathways and 8149 target-pathway associations | | WiKiPathways | (30) |
| | Target-involved PathWhiz pathways | 98 pathways and 467 target-pathway associations | | PathWhizPathway | (31) |
| | Target-involved Reactome pathways | 577 pathways and 4332 target-pathway associations | | REACTOME | (32) |
| | Target-involved NetPath pathways | 25 pathways and 1106 target-pathway associations | | NetPath | (33) |
| | Target-involved PANTHER pathways | 124 pathways and 1786 target-pathway associations | | PANTHER | (34) |
| Pathway | GO terms and annotations | 6700 GO terms, including 446 CC, 1151 MF and 5103 BP terms, and a total of 250 734 protein-GO term associations | GO terms across the categories of cellular component (CC), molecular function (MF) and biological process (BP) | Enrichr | (35) | 
| Disease signature | Disease (cancer) signatures | 25 types of cancer | Breast Cancer (BRCA), Bladder Cancer (BLCA), Cervical Cancer (CESC), Bile Duct Cancer (CHOL), Colon Cancer (COAD), Colon and Rectal Cancer (COADREAD), Esophageal Cancer (ESCA), Head and Neck Cancer (HNSC), Kidney Chromophobe (KICH), Kidney Clear Cell Carcinoma (KIRC), Kidney Papillary Cell Carcinoma (KIRP), Liver Cancer (LIHC), Lung Adenocarcinoma (LUAD), Lung Cancer (LUNG), Lung Squamous Cell Carcinoma (LUSC), Pancreatic Cancer (PAAD), Pheochromocytoma & Paraganglioma (PCPG), Prostate Cancer (PRAD), Rectal Cancer (READ), Sarcoma (SARC), Melanoma (SKCM), Stomach Cancer (STAD), Thyroid Cancer (THCA), Thymoma (THYM) and Endometrioid Cancer (UCEC) | TCGA | (36) | 
| Drug signature | Drug signature | 473 647 replicate-consensus signatures | We downloaded the level 5 data of L1000 (GCTx format) from the Gene Expression Omnibus (accession number: GSE92742), which contains 473 647 replicate-consensus signatures (RCSs) generated by the official data pre-processing pipeline. The level 5 data of L1000 have been normalized, and the LINCS team suggests their direct use without extra processing. Each RCS represents the moderated z-score value of 12 328 genes for one profile | L1000 | (37) | 
| Literature | Drug repositioning-related literature | 169 experimentally validated repositioned drugs from 134 valid articles | We extract important information such as old targets, new direct targets, new indirect targets, old diseases, new diseases, experimental evidence and supporting sentences from these articles | PubMed | (38) |
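
The counts reported in Table 2 can be cross-checked with simple arithmetic. The following is a minimal sketch in Python, not part of the DrugRepoBank pipeline: the variable and function names are illustrative assumptions, and the numbers are copied from the table. Note that the summed pathway figures are raw totals across the six sources; overlaps between sources are not removed.

```python
# Counts copied from Table 2; all names below are illustrative, not DrugRepoBank identifiers.
GO_TERMS = {"CC": 446, "MF": 1151, "BP": 5103}

# (number of pathways, number of target-pathway associations) per source.
TARGET_PATHWAY_SOURCES = {
    "KEGG": (387, 8528),
    "WikiPathways": (516, 8149),
    "PathWhiz": (98, 467),
    "Reactome": (577, 4332),
    "NetPath": (25, 1106),
    "PANTHER": (124, 1786),
}


def go_term_total(counts):
    """Sum the per-category GO term counts."""
    return sum(counts.values())


def pathway_totals(sources):
    """Return raw (pathways, associations) totals across sources, without deduplication."""
    pathways = sum(p for p, _ in sources.values())
    associations = sum(a for _, a in sources.values())
    return pathways, associations


if __name__ == "__main__":
    print("GO terms:", go_term_total(GO_TERMS))
    print("Target pathways / associations:", pathway_totals(TARGET_PATHWAY_SOURCES))
```

The GO category counts sum to the 6700 GO terms stated in the table, which confirms that the three categories are reported without overlap.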