Biobanks Linked to Electronic Health Records Accelerate Genomic Discovery

Dana C Crawford; John R Sedor

doi:10.1681/ASN.2021060836

editorial

J Am Soc Nephrol. 2021 Aug;32(8):1828–1829. doi: 10.1681/ASN.2021060836

PMCID: PMC8455265 PMID: 34244324

Genome-wide association studies (GWAS), now celebrating a decade and a half of successful genomic discovery, are the main study design applied at the population scale to identify individual genetic variants, genes and genic regions, and pathways associated with complex human diseases and traits. In basic terms, a GWAS surveys the genome one genetic variant at a time for differences in genotype or allele frequencies when participants with the disease or outcome of interest (cases) are compared with those who do not have the disease or outcome of interest (controls).¹ Because a typical GWAS genotypes more than one million genetic variants directly, and now surveys more than 29 million through imputation, stringent statistical significance thresholds are applied to better ensure that an identified genotype-phenotype association is real and replicable. This study design and statistical approach have been, and continue to be, applied to a wide variety of common and now rare human diseases and traits, including CKD, nephrotic syndrome, and creatinine, to name a few related to kidney outcomes and function.
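The variant-by-variant comparison and the multiple-testing correction described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular study: it assumes a simple allelic chi-square test on a 2x2 table of allele counts and a Bonferroni correction, whereas real analyses typically use regression models adjusted for ancestry and other covariates. The function names and the variant identifiers are hypothetical.

```python
import math


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (chi2, p_value). Counts are alleles, not people, so each
    genotyped participant contributes two alleles.
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_case = case_alt + case_ref
    row_ctrl = ctrl_alt + ctrl_ref
    col_alt = case_alt + ctrl_alt
    col_ref = case_ref + ctrl_ref
    # A monomorphic variant or an empty group carries no information.
    if 0 in (row_case, row_ctrl, col_alt, col_ref):
        return 0.0, 1.0
    chi2 = (n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
            / (row_case * row_ctrl * col_alt * col_ref))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests


def genome_scan(allele_counts, n_tests, alpha=0.05):
    """Test each variant in turn and keep those passing the corrected threshold.

    allele_counts maps a variant ID to (case_alt, case_ref, ctrl_alt, ctrl_ref).
    """
    threshold = bonferroni_threshold(alpha, n_tests)
    hits = []
    for variant_id, counts in allele_counts.items():
        chi2, p_value = allelic_chi2(*counts)
        if p_value < threshold:
            hits.append((variant_id, chi2, p_value))
    return hits


# Hypothetical allele counts for two made-up variants.
example_counts = {
    "variant_A": (600, 400, 400, 600),  # allele frequency differs by group
    "variant_B": (300, 700, 300, 700),  # identical frequency in both groups
}
example_hits = genome_scan(example_counts, n_tests=1_000_000)
```

Under these assumptions, correcting a 0.05 error rate for one million tests gives a per-variant threshold of 5 × 10⁻⁸, which shows why only large frequency differences or very large samples yield a reportable association.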

Although highly successful, as evidenced by the number of identified genotype-phenotype associations available and searchable in the European Molecular Biology Lab-European Bioinformatics Institute (EMBL-EBI) GWAS Catalog, GWAS have several notable limitations, ranging from failure to include diverse populations to limits in statistical power driven by sample size, small genetic effect sizes, and multiple statistical tests. GWAS are also, by design, constrained to the study of one phenotype at a time. Whether constricted by the bounds of the study design and data collection or artificially restricted by consortium agreements, the study of one phenotype at a time represents a missed opportunity to study the outcome of interest in its real-world setting of correlated comorbidities and risk factors, all detectable through genomic discovery as possible pleiotropy using phenome-wide association studies (PheWAS).²

The ability to pivot from GWAS to PheWAS in a tour de force of genomic discovery is exemplified by the study in this issue of JASN by Khan et al. and the electronic Medical Records & Genomics (eMERGE) network.³ The eMERGE network, now in its fourth cycle, is a National Human Genome Research Institute–supported consortium of biobanks linked to electronic health records (EHRs). Established in 2007, the eMERGE network began as a proof-of-concept series of studies designed to demonstrate that research-grade variables could be mined from existing clinical data and used for genomic discovery.⁴ Each of the five original study sites developed rules-based algorithms based on International Classification of Diseases codes, laboratory values, and other structured and unstructured data available in the EHRs to identify cases and controls for their respective primary outcomes of interest. Although the linked DNA samples were genotyped for the study site-specific GWAS, the depth of clinical data available at each study site allows for multiple GWAS of different phenotypes across the network as well as a PheWAS. The later iterations of the eMERGE network have since worked to address major barriers in the translation of genomic findings into clinical care, such as the development of clinical decision support that incorporates patient-specific genetic variation, genome sequence annotation for clinically actionable variants, and the potential clinical utility of polygenic risk scores.⁵

The GWAS and subsequent PheWAS of circulating levels of plasma C3 and C4 presented by Khan et al. showcase many of the strengths of having access to individual-level clinical and genomic data at scale. First, the combined dataset of >100,000 participants with genome-wide data imputed to a common reference panel enabled the efficient informatic mining of laboratory values to extract plasma C3 and C4 levels for the GWAS. The GWAS was the first for these traits to include African-descent populations, a testament to the ability of individual biobanks affiliated with the eMERGE network to contribute needed diversification of GWAS. Overall, Khan et al. identified independent common genetic variants that explained a proportion of the trait variability for both C3 and C4 levels, albeit with different characteristics. That is, the GWAS-identified single nucleotide polymorphisms for C3 levels explained only 2% of the variability attributed to heritability (genetics), suggesting additional associated loci can be discovered with more diverse and larger sample sizes. In contrast, the C4 GWAS supplemented by C4 copy number variant (CNV) imputation suggests the majority of phenotypic variation can be explained in and around the C4 locus itself.

Second, the breadth and depth of phenotypic data linked to genotypic data enabled the genome scan for C3/C4-associated variants and an immediate phenome scan for their pleiotropic partners. Although PheWAS is not limited to biobanks linked to EHRs, the clinically collected data leveraged by the eMERGE network are potentially real-world representations of phenotypic relationships that are not easily surveyed or recapitulated in traditional epidemiologic study designs. Khan et al. performed the PheWAS both with individual C3 and C4 GWAS-identified variants and with polygenic risk scores (PRS) constructed from the C3 and C4 GWAS. Like most PheWAS, the present study confirmed published associations (e.g., C3 GWAS-identified CFH variant and age-related macular degeneration), identified suspected genotype-phenotype relationships (e.g., an inverse relationship between C3 and C4 PRS and systemic lupus erythematosus and its complications), and highlighted potentially novel pleiotropic relationships (e.g., C3 PRS and obesity). Also, like other PheWAS, extensive follow-up will be required to better understand the potential impact of the known and unknown clinical biases on the observed signals of pleiotropy.
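For readers less familiar with the mechanics, a PRS-based phenome scan can be sketched as follows. This is a simplified illustration under stated assumptions and not the pipeline used by Khan et al.: the score is taken to be a weighted sum of allele dosages, and a difference-in-means z-test with a Bonferroni correction across phenotypes stands in for the covariate-adjusted regression models a real PheWAS would normally use. All function names, participant identifiers, variant identifiers, and phenotype codes are hypothetical.

```python
import math
from statistics import mean, variance


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies) over the scored variants.

    weights maps variant ID to its per-allele effect estimate from a GWAS;
    variants missing from a participant's dosages count as zero copies.
    """
    return sum(w * dosages.get(variant, 0.0) for variant, w in weights.items())


def mean_difference_test(case_scores, control_scores):
    """Two-sample z statistic with a two-sided normal-approximation p-value."""
    se = math.sqrt(variance(case_scores) / len(case_scores)
                   + variance(control_scores) / len(control_scores))
    if se == 0.0:
        return 0.0, 1.0  # no variation in the score, nothing to test
    z = (mean(case_scores) - mean(control_scores)) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


def phewas(scores, phenotypes, alpha=0.05):
    """Scan every phenotype for association with one polygenic score.

    scores maps participant ID to PRS; phenotypes maps a phenotype code to
    the set of participant IDs who are cases for that code.
    Returns {code: (z, p_value, passes_corrected_threshold)}.
    """
    threshold = alpha / len(phenotypes)  # Bonferroni across phenotypes
    results = {}
    for code, cases in phenotypes.items():
        case_scores = [s for pid, s in scores.items() if pid in cases]
        control_scores = [s for pid, s in scores.items() if pid not in cases]
        if len(case_scores) < 2 or len(control_scores) < 2:
            continue  # too few participants to estimate a variance
        z, p_value = mean_difference_test(case_scores, control_scores)
        results[code] = (z, p_value, p_value < threshold)
    return results
```

The sign of the z statistic indicates direction: a negative value means cases carry lower scores on average than controls, the pattern that would correspond to an inverse relationship between a score and a diagnosis.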

The GWAS-PheWAS pivot made possible by biobanks linked to EHRs accelerates genomic discovery and generates new hypotheses that are testable in independent studies. In the present study, Khan et al. speculate that low C4 gene dosage, as represented by the imputed copy number variant, may increase susceptibility to kidney-related outcomes triggered by the classical complement pathway of the human immune system. Large biobanks with accessible individual-level genome and clinical phenome data such as the UK Biobank,⁶ the Million Veteran Program,⁷ and, in the near future, All of Us,⁸ all but guarantee that additional hypotheses will be generated for outcomes not available or easily studied using traditional epidemiologic designs. Further, efforts such as the Kidney Precision Medicine Project,⁹ which aims to generate a kidney tissue atlas for acute kidney injury and chronic kidney diseases, will provide the functional data necessary to interpret the statistical associations from genomic discovery in proper disease and organ-specific context.

Disclosures

J.R. Sedor reports consultancy agreements with Goldfinch Bio, Maze, and Sanofi Genzyme; research funding from Calliditas, Goldfinch Bio, and Novartis for clinical trials; honoraria from Chugai Pharmaceutical Co (Next Generation Kidney Research Meeting, Tokyo), Drexel University, Einstein, Maze, NKF Arizona, and University of Maryland; patents and inventions with APOL1 transgenic mice licensed to Sanofi Genzyme, and invention disclosure for machine learning analysis of kidney biopsies; Kidney Foundation of Ohio (kidney patient organization for direct aid) Board of Directors, NephCure Kidney International; and reports other interests/relationships with editorial boards: Seminars in Nephrology, JASN, American Journal of Nephrology; and reports being an ISN member. D.C. Crawford reports consultancy agreements with Merck Research Labs; honoraria from National Institutes of Health, Merck Research Labs, Icahn School of Medicine at Mount Sinai, American Society of Microbiology, American Diabetes Association, University of Florida, and University of Cincinnati; and scientific advisor or membership with PLoS One, Frontiers in Genetics, and American Society of Human Genetics.

Funding

None.

Footnotes

Published online ahead of print. Publication date available at www.jasn.org.

See related original article, “Medical records-based genetic studies of the complement system” on pages 2031–2047.

References

1. Bush WS, Moore JH: Chapter 11: Genome-wide association studies. PLOS Comput Biol 8: e1002822, 2012
2. Bush WS, Oetjens MT, Crawford DC: Unravelling the human genome-phenome relationship using phenome-wide association studies. Nat Rev Genet 17: 129–145, 2016
3. Khan A, Shang N, Petukhova L, Zhang J, Shen Y, Hebbring SJ, et al.: Medical records-based genetic studies of the complement system. J Am Soc Nephrol 32: 2031–2047, 2021
4. Crawford DC, Crosslin DR, Tromp G, Kullo IJ, Kuivaniemi H, Hayes MG, et al.: eMERGEing progress in genomics-the first seven years. Front Genet 5: 184, 2014
5. eMERGE Consortium: Lessons learned from the eMERGE Network: balancing genomics in discovery and practice. Human Genetics and Genomics Advances 2: 100018, 2021
6. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al.: The UK Biobank resource with deep phenotyping and genomic data. Nature 562: 203–209, 2018
7. Gaziano JM, Concato J, Brophy M, Fiore L, Pyarajan S, Breeling J, et al.: Million Veteran Program: A mega-biobank to study genetic influences on health and disease. J Clin Epidemiol 70: 214–223, 2016
8. All of Us Research Program Investigators; Denny JC, Rutter JL, Goldstein DB, Philippakis A, Smoller JW, Jenkins G, Dishman E: The “All of Us” Research Program. N Engl J Med 381: 668–676, 2019
9. de Boer IH, Alpers CE, Azeloglu EU, Balis UGJ, Barasch JM, Barisoni L, et al.; Kidney Precision Medicine Project: Rationale and design of the Kidney Precision Medicine Project. Kidney Int 99: 498–510, 2021
