Br J Dermatol. 2023 Nov 4;190(4):566–568. doi: 10.1093/bjd/ljad439

Digital biobanks are underutilized in dermatology and create opportunities to reduce the burden of skin disease

Ghislaine Jumonville 1, David Hong 2, Atlas Khan 3, Andrew DeWan 4, Suzanne M Leal 5,6, Chunhua Weng 7, Lynn Petukhova 8,9
PMCID: PMC10941321  PMID: 37936310

Abstract

Digital biobanks that integrate genetic data with health data captured by digital sources are used routinely to discover genes, biomarkers, gene–environment interactions and pharmacogenetic relationships across many clinical areas. There remain many opportunities in dermatology to further use biobank data to increase our knowledge about the genetic architecture of skin disease, to resolve disease mechanisms that can be modulated by medical interventions and to discover genetically derived disease relationships that inform on drug repurposing and adverse events. Such knowledge promises to reduce the global burden of skin disease and facilitates the development of tailored medical care.


At the turn of this century deCODE genetics (https://www.decode.com/) proposed an ambitious new paradigm in human genetic research that entailed genotyping an entire population that had existing digitized health data. In 2007, the Wellcome Trust Case Control Consortium provided empirical evidence that such a dataset could be successfully repurposed to study a multitude of health outcomes.1 Today, digital biobanks that integrate genetic data with health data captured by digital sources [e.g. electronic health records (EHRs), patient surveys and multimodal imaging] have identified thousands of genetic associations for hundreds of health outcomes.

Digital biobanks are routinely used to discover genes, biomarkers, gene–environment interactions and pharmacogenetic relationships. They are generating knowledge that allows us to prioritize drug targets and improve disease classifications, facilitating the development of tailored medical care. The success of these resources is evidenced by the rapid proliferation of digital biobanks across the globe, which cumulatively capture data for millions of research participants.2

These resources have been used extensively to study cardiopulmonary and metabolic outcomes, but much less so for skin disease research.2 Thus, there remain many opportunities in dermatology to further use biobank data to increase our knowledge about the genetic architecture of skin disease, to resolve disease mechanisms that can be modulated by medical interventions and to discover genetically derived disease relationships that inform on drug repurposing and adverse events (Figure 1).

Figure 1.

Biobank utility for gene discovery and clinical translation. Digital biobanks are large data repositories that link EHRs, patient surveys and genetic data. The construction of biobanks was incentivized by precision medicine discoveries of clinically relevant disease subtypes. Recognizing that subtype analyses require large cohorts, governments began investing in the construction of digital biobanks. Cohorts can be efficiently constructed for genetic studies of common or rare variants from phenotype data captured by EHRs or patient surveys; or for PheWAS from genetic data. Created with images from BioRender.com.

EHRs, electronic health records; GWAS, genome-wide association studies; PheWAS, phenome-wide association studies.

Biobanks have facilitated the discovery of thousands of common risk variants with genome-wide association studies (GWAS), generating knowledge that is improving patient care. GWAS test millions of genetic variants for association with one health outcome, and are an efficient tool for identifying clinically relevant pathways and cell types.3 They also led to the important discovery that many variants implicated by GWAS are pleiotropic, influencing risk for multiple diseases. Pleiotropy creates new opportunities for drug repurposing and offers insight into off-label treatment effects.4 Finally, the development of polygenic risk scores (PRS) from GWAS helps to identify individuals with high risk for disease and to resolve aetiological heterogeneity.
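The GWAS design and the construction of a PRS described above can be illustrated with a minimal sketch. It assumes a simple allelic chi-square test on allele counts, whereas real analyses typically use regression models with covariates; the variant names, counts and weights are hypothetical, and the threshold of 5e-8 is the conventional genome-wide significance level rather than a value taken from this article.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (chi-square, p-value) for a 2x2 table of allele counts (1 df)."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    margins = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
               * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    if margins == 0:
        return 0.0, 1.0  # an empty row or column carries no information
    chi2 = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / margins
    # Survival function of a chi-square distribution with 1 degree of freedom
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def gwas_scan(allele_counts, alpha=GENOME_WIDE_ALPHA):
    """Test every variant against ONE outcome; return variant IDs below alpha."""
    return [vid for vid, counts in allele_counts.items()
            if allelic_test(*counts)[1] < alpha]


def polygenic_risk_score(dosages, weights):
    """Sum of risk-allele counts (0, 1 or 2) weighted by per-variant effects."""
    return sum(w * dosages.get(vid, 0) for vid, w in weights.items())


# Hypothetical allele counts: (case_alt, case_ref, ctrl_alt, ctrl_ref)
allele_counts = {
    "variant_1": (700, 300, 500, 500),
    "variant_2": (60, 40, 40, 60),
    "variant_3": (50, 50, 50, 50),
}
hits = gwas_scan(allele_counts)

# Hypothetical effect sizes (e.g. log odds ratios) for two variants
weights = {"variant_1": 0.5, "variant_2": 0.25}
score = polygenic_risk_score({"variant_1": 2, "variant_2": 1}, weights)
```

In this toy example only the first variant passes the genome-wide threshold, which shows why large biobank cohorts are needed: the second variant has a real difference in allele frequency but too few samples to reach significance.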

In addition to facilitating the discovery of common disease risk variants with GWAS, biobanks are improving our knowledge about a large network of clinically relevant pathways that link different diseases by conducting phenome-wide association studies (PheWAS). PheWAS flip the study design used for GWAS, testing thousands of phenotypes for association with one genetic outcome (either a variant or a PRS). Thus, PheWAS require cohorts that have been extensively annotated with multiple disease outcomes, as occurs naturally in EHRs, which accumulate longitudinal data from clinical encounters. PheWAS provide rigorous statistical tests of pleiotropy, strengthening evidence for drug repurposing and safety profiles.5,6
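The reversal of the GWAS design can be sketched in the same way. The example below is illustrative only: it assumes a carrier versus non-carrier chi-square test for each phenotype and a Bonferroni correction for the number of phenotypes tested, and the phenotype labels and counts are hypothetical.

```python
import math


def chi2_test(a, b, c, d):
    """Return the 1-df chi-square p-value for a 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 1.0  # an empty row or column carries no information
    chi2 = n * (a * d - b * c) ** 2 / margins
    return math.erfc(math.sqrt(chi2 / 2))


def phewas_scan(phenotype_tables, family_alpha=0.05):
    """Test ONE variant against many phenotypes with Bonferroni correction."""
    threshold = family_alpha / len(phenotype_tables)
    return [name for name, table in phenotype_tables.items()
            if chi2_test(*table) < threshold]


# Hypothetical counts per phenotype:
# (carriers with, carriers without, non-carriers with, non-carriers without)
phenotype_tables = {
    "phenotype_A": (60, 40, 40, 60),
    "phenotype_B": (50, 50, 50, 50),
    "phenotype_C": (55, 45, 45, 55),
}
associated = phewas_scan(phenotype_tables)
```

Because the multiple-testing burden grows with the number of phenotypes rather than the number of variants, the significance threshold here is set by the size of the phenome that was tested.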

A more recent innovation in biobanks has been the integration of sequencing data to facilitate investigations of rare genetic variants which cannot be studied with GWAS. Rather, burden tests aggregate rare variants by the genes into which they fall. These gene-level association tests identify individual genes with an excess ‘burden’ of rare variants among cases relative to controls. In contrast, each common risk variant identified with GWAS typically implicates multiple genes located over relatively large regions of the genome due to linkage disequilibrium (LD). LD creates challenges in distinguishing the genes that contribute to risk from the genes that happen to be in the same location. Thus, these two genetic methods provide different information that, when integrated together, improve our ability to translate genetic evidence into new biologic and clinical knowledge.
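A gene-level burden test can be sketched as follows. This is a simplified illustration under stated assumptions: each person is collapsed to a carrier or non-carrier of any rare variant in the gene, and the excess of carriers among cases is assessed with a one-sided Fisher's exact test. Published burden tests often weight variants and adjust for covariates; the gene names, variant IDs and samples here are hypothetical.

```python
from math import comb


def burden_table(samples, variant_to_gene, gene):
    """Collapse rare variants by gene into a 2x2 carrier table.

    Returns (case carriers, case non-carriers,
             control carriers, control non-carriers).
    """
    a = b = c = d = 0
    for is_case, variants in samples:
        carrier = any(variant_to_gene.get(v) == gene for v in variants)
        if is_case and carrier:
            a += 1
        elif is_case:
            b += 1
        elif carrier:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_one_sided(a, b, c, d):
    """P(at least `a` carriers among cases) under the hypergeometric null."""
    n_cases, n_ctrl, carriers = a + b, c + d, a + c
    total = comb(n_cases + n_ctrl, carriers)
    upper = min(n_cases, carriers)
    tail = sum(comb(n_cases, k) * comb(n_ctrl, carriers - k)
               for k in range(a, upper + 1))
    return tail / total


# Hypothetical mapping of rare variants to the genes into which they fall
variant_to_gene = {"v1": "GENE_X", "v2": "GENE_X", "v3": "GENE_Y"}

# Hypothetical samples: (is_case, set of rare variants carried)
samples = [
    (True, {"v1"}), (True, {"v2"}), (True, {"v1", "v2"}), (True, set()),
    (False, {"v3"}), (False, set()), (False, set()), (False, set()),
]

table_x = burden_table(samples, variant_to_gene, "GENE_X")
p_x = fisher_one_sided(*table_x)
p_y = fisher_one_sided(*burden_table(samples, variant_to_gene, "GENE_Y"))
```

Note that the test statistic is attached to a gene rather than to a genomic region, which is why burden results are easier to interpret than a GWAS locus that spans several genes in LD.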

Several limitations may hinder the use of these resources to study skin disease. First, the algorithms that identify cases and controls typically rely on International Classification of Diseases (ICD) diagnosis codes, and their sensitivity and specificity can be limited.7 Second, healthcare utilization patterns may influence who is represented in these studies and the quality and completeness of their data.8 Further methodological development is needed to overcome these challenges. A lack of ancestral diversity in digital biobanks also limits their utility.2
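The sensitivity and specificity of a code-based case definition can be estimated by comparing it with a reference standard such as manual chart review. The sketch below assumes a hypothetical rule that labels anyone with the ICD-10 code L73.2 (hidradenitis suppurativa) as a case; the records and reference labels are invented for illustration.

```python
def is_case_by_code(codes, target_prefixes=("L73.2",)):
    """Hypothetical phenotyping rule: any diagnosis code with a target prefix."""
    return any(code.startswith(p) for code in codes for p in target_prefixes)


def sensitivity_specificity(records):
    """Compare the code-based rule with reference labels (e.g. chart review)."""
    tp = fp = fn = tn = 0
    for codes, truly_affected in records:
        predicted = is_case_by_code(codes)
        if predicted and truly_affected:
            tp += 1
        elif predicted:
            fp += 1
        elif truly_affected:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Invented records: (set of ICD-10 codes in the EHR, reference-standard label)
records = [
    ({"L73.2"}, True),           # correctly captured case
    ({"L73.2", "L40.0"}, True),  # correctly captured case
    ({"L02.4"}, True),           # true case recorded under another code
    ({"L73.2"}, False),          # code present but diagnosis not confirmed
    ({"L40.0"}, False),
    ({"L70.0"}, False),
    (set(), False),
]
sensitivity, specificity = sensitivity_specificity(records)
```

Missed cases reduce statistical power, and false positives dilute effect estimates, so both quantities matter when a cohort is assembled from diagnosis codes alone.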

Nonetheless, there remain many opportunities for further leveraging of digital biobanks to reduce the burden of skin disease. Some prevalent skin diseases, such as rosacea and lichen planus, have not been studied extensively with GWAS. Others, such as hidradenitis suppurativa and alopecia areata, lack sufficient numbers of risk loci for the development of robust PRS. Psoriasis and vitiligo studies have not yet used biobanks to further increase cohort sizes. Importantly, both PheWAS and exome studies are lacking for many skin diseases. Integrating GWAS and PheWAS can better prioritize drug repurposing studies. Integrating knowledge from burden tests and PRS developed from GWAS can improve our ability to stratify patients into clinically relevant disease subtypes. In addition, there are salient and unrealized opportunities for integrating skin imaging data into digital biobanks.9 Conducting such studies in digital biobanks is cost-effective and time-efficient and promises to improve the quality of care delivered to patients with skin disease.

Acknowledgements

We thank Dr Jeffrey Cohen for providing thoughtful feedback on an earlier version of this manuscript.

Contributor Information

Ghislaine Jumonville, Department of Epidemiology (Mailman School of Public Health).

David Hong, Columbia College.

Atlas Khan, Departments of Medicine.

Andrew DeWan, Department of Chronic Disease Epidemiology, Center for Perinatal, Pediatric and Environmental Epidemiology, Yale University, New Haven, CT, USA.

Suzanne M Leal, Center for Statistical Genetics (Gertrude H. Sergievsky Center, Taub Institute for Alzheimer’s Disease and the Aging Brain); Department of Neurology, Columbia University, NY, USA.

Chunhua Weng, Biomedical Informatics.

Lynn Petukhova, Department of Epidemiology (Mailman School of Public Health); Dermatology (all in the Vagelos College of Physicians & Surgeons).

Funding sources

L.P. receives funding and support from NCATS (KL2TR001874 and UL1TR001873), NIAMS (K01AR075111 and R01AR080796) and Columbia University’s Precision Medicine Resource, Precision Medicine Initiative, the Herbert Irving Comprehensive Cancer Center and the Data Science Institute. C.W. receives funding from R01LM012895 and U01HG008680.

Conflicts of interest

None to declare.

Data availability

The data underlying this article will be shared on reasonable request to the corresponding author.

Ethics statement

Not applicable.

References

  1. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007; 447:661–78. doi: 10.1038/nature05911.
  2. Zhou W, Kanai M, Wu K-HH et al. Global biobank meta-analysis initiative: powering genetic discovery across human disease. Cell Genom 2022; 2:100192. doi: 10.1016/j.xgen.2022.100192.
  3. Abdellaoui A, Yengo L, Verweij KJH, Visscher PM. 15 years of GWAS discovery: realizing the promise. Am J Hum Genet 2023; 110:179–94. doi: 10.1016/j.ajhg.2022.12.011.
  4. Plenge RM, Scolnick EM, Altshuler D. Validating therapeutic targets through human genetics. Nat Rev Drug Discov 2013; 12:581–94.
  5. Rastegar-Mojarad M, Ye Z, Kolesar JM et al. Opportunities for drug repositioning from phenome-wide association studies. Nat Biotechnol 2015; 33:342–5.
  6. Carss KJ, Deaton AM, Del Rio-Espinola A et al. Using human genetics to improve safety assessment of therapeutics. Nat Rev Drug Discov 2023; 22:145–62.
  7. Li L, Chase HS, Patel CO et al. Comparing ICD9-encoded diagnoses and NLP-processed discharge summaries for clinical trials pre-screening: a case study. Proceedings of the American Medical Informatics Association Annual Symposium 2008; 2008:404.
  8. Ta CN, Weng C. Detecting systemic data quality issues in electronic health records. Stud Health Technol Inform 2019; 264:383.
  9. Tkaczyk E. Innovations and developments in dermatologic non-invasive optical imaging and potential clinical applications. Acta Derm Venereol 2017; 97(Suppl. 218):5–13. doi: 10.2340/00015555-2717.


Articles from The British Journal of Dermatology are provided here courtesy of Oxford University Press
