Table 1.
Data type | Large-scale research efforts | Utility and advantages | Major caveats
---|---|---|---
Genetic variation | Many GWAS consortia, 1000 Genomes, gnomAD and UK Biobank | Unbiased source of genetic basis of disease and direct inference of causality | At least one step removed from the phenotype |
Epigenetics | ENCODE and Roadmap Epigenomics Project | Functional impact and typically easy to infer causality | Not applicable for all phenotypes |
Gene expression | GTEx and GEUVADIS | Inexpensive assay for an intermediate step towards the phenotype | Not applicable for all phenotypes |
Proteomics and metabolomics | CPTAC, EDRN and Common Fund | Likely to be very close to the phenotype | Expensive and difficult to scale (proteomics) |
Microbiome | Human Microbiome Project | Likely to be very close to the phenotype and measures a combination of genetic and environmental influences | Combination of genetic and environmental influences makes it difficult to infer the direction of causality |
In this table, ‘phenotype’ refers to an organismal phenotype. CPTAC, Clinical Proteomic Tumour Analysis Consortium; EDRN, Early Detection Research Network; ENCODE, Encyclopedia of DNA Elements; GEUVADIS, Genetic European Variation in Health and Disease; gnomAD, Genome Aggregation Database; GTEx, Genotype–Tissue Expression; GWAS, genome-wide association study.