2023 May 31;8(5):e10553. doi: 10.1002/btm2.10553


Introduction of multiple datasets.

| Dataset name | Dataset scope | Major use |
| --- | --- | --- |
| Medical image datasets | | |
| TCIA | Over 1.8 million multi‐modal images from 35,000+ subjects, 170+ collections | Cancer detection, diagnosis, and treatment |
| MURA | 14,000+ musculoskeletal x‐rays | Classification of normal and abnormal bone images |
| ISIC | 23,000+ images of skin lesions | Skin lesion detection |
| ChestX‐ray8 | 100,000+ chest x‐rays | Classification of eight common thoracic diseases |
| BraTS | MRI images from 393 cases of glioma | Brain tumor segmentation and recognition |
| COVID19‐CT | 1,000+ chest CT images of patients with confirmed COVID‐19 diagnosis | COVID‐19 detection and diagnosis |
| Electronic health record (EHR) datasets | | |
| MIMIC‐III | 40,000+ patients with demographic, clinical, and outcome data | Patient outcome prediction and disease risk assessment |
| eICU | 200,000+ ICU patient records | Patient survival prediction |
| UK Biobank | 500,000+ individuals with demographic, lifestyle, and health data | Develop methods for disease prevention, diagnosis, and treatment |
| Omics datasets | | |
| TCGA | 11,000+ patients with cancer across 33 different cancer types | Identify potential targets for new therapies and develop predictive models for patient outcomes |
| PDB | 170,000+ protein structures from organisms | Predict protein structure and design new drugs and therapeutic agents |
| KEGG | 22,000+ human genes, 600+ diseases, and associated molecular pathways | Explore functional relationships between genes, proteins, and other molecules |
| HMDB | 114,000+ metabolites' structures, functions, and associated diseases | Identify potential biomarkers for diagnosis and treatment of various conditions |

Abbreviations: BraTS, Brain Tumor Segmentation Challenge; eICU, eICU Collaborative Research Database; HMDB, Human Metabolome Database; ISIC, International Skin Imaging Collaboration; KEGG, Kyoto Encyclopedia of Genes and Genomes; MIMIC‐III, Medical Information Mart for Intensive Care III; MURA, Stanford's Musculoskeletal Radiographs; PDB, Protein Data Bank; TCGA, The Cancer Genome Atlas; TCIA, The Cancer Imaging Archive; UK Biobank, UK Biobank Imaging Study.