Table 1.
Dataset | Disease/phenotype | Data type | Ethnicity composition<sup>a</sup> | Reference(s) | URL |
---|---|---|---|---|---|
ADNI-3 | Alzheimer’s disease | Genotype and image | Caucasian 86%, other 14% | 118, 119 | https://adni.loni.usc.edu/ |
GENIE | Cancers | Genomic variation | White 87%, Black or African American 6%, Asian 5%, other 2% | 120 | https://www.aacr.org/professionals/research/aacr-project-genie/ |
GTEx (v8) | Gene expression in normal tissues | Genotype and transcriptome | White 85%, African American 13%, Asian 1%, unknown 1% | 121, 122 | https://gtexportal.org/ |
GWAS | Various | Genotype and phenotype | European 88%, Asian 8%, African, African American or Afro-Caribbean 2%, Hispanic or Latin American 1%, other/mixed 1% | 16 | https://gwasdiversitymonitor.com/ |
Million Veteran Program | Various | Genotype and electronic health record | European 70%, African 19%, admixed American 9%, Asian 2% | 4, 123 | https://www.mvp.va.gov/pwa/ |
MIMIC-IV | Various | Electronic health record | White 77%, Black or African American 10%, Asian 3%, Hispanic/Latino 4%, other 6% | 62, 124, 125 | https://doi.org/10.13026/07hj-2a80 |
SHHS | Cardiovascular diseases related to sleep-disordered breathing | Electronic health record | White 86%, Black 9%, other 5% | 126, 127 | https://sleepdata.org/datasets/shhs |
TARGET | Pediatric cancers | Multiomics | White 80%, Black or African American 13%, Asian 5%, other 2% | NA<sup>b</sup> | https://ocg.cancer.gov/programs/target |
TCGA | Cancers | Multiomics | European ancestry 82%, African ancestry 6%, East Asian ancestry 6%, admixed ancestry 4%, other 2% | 33 | https://cancergenome.nih.gov/ |
UK Biobank | Various | Genotype, genome sequence, and electronic health record | European ancestry 95%, African ancestry 2%, Central/South American ancestry 2%, East Asian ancestry 1% | 128 | https://www.ukbiobank.ac.uk/ |
<sup>a</sup> The original terms from the information sources are used. The percentages were calculated using the patients with known race/ethnicity/ancestry information (as of August 2022).
<sup>b</sup> Ethnicity composition numbers for TARGET were derived from the NCI Genomic Data Commons (https://portal.gdc.cancer.gov/).
Abbreviations: ADNI, The Alzheimer’s Disease Neuroimaging Initiative; GENIE, Genomics Evidence Neoplasia Information Exchange; GTEx, The Genotype-Tissue Expression Project; GWAS, genome-wide association studies; MIMIC-IV, Medical Information Mart for Intensive Care, version IV; NA, not available; NCI, National Cancer Institute; SHHS, Sleep Heart Health Study; TARGET, Therapeutically Applicable Research to Generate Effective Treatments; TCGA, The Cancer Genome Atlas.
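
The table notes state that percentages were calculated using only patients with known race/ethnicity/ancestry information. The sketch below shows one way such a calculation could be done. The function name `composition_percentages`, the `UNKNOWN_LABELS` set, and the example records are hypothetical illustrations and are not taken from any of the listed datasets or their processing pipelines.

```python
from collections import Counter

# Hypothetical labels treated as "unknown"; each real source uses its own terms.
UNKNOWN_LABELS = {"unknown", "not reported", ""}


def composition_percentages(labels):
    """Return {group: percent}, computed over records with a known label only."""
    known = [
        label
        for label in labels
        if label is not None and label.strip().lower() not in UNKNOWN_LABELS
    ]
    if not known:
        # No usable records: return an empty composition instead of dividing by zero.
        return {}
    counts = Counter(known)
    total = len(known)
    return {group: 100.0 * n / total for group, n in counts.items()}


# Made-up illustrative records, not drawn from any dataset in Table 1.
example = ["White"] * 8 + ["Asian"] * 2 + ["Unknown"] * 5 + [None]
result = composition_percentages(example)
print(result)
```

Because records without a known label are dropped from the denominator, the reported shares describe only the annotated subset of each cohort, which may differ from the full cohort when missingness is uneven across groups.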