2021 Apr 16;190(10):1977–1992. doi: 10.1093/aje/kwab115

© The Author(s) 2021. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

Overview of the data harmonization process used by the Trans-Omics for Precision Medicine (TOPMed) Data Coordinating Center (DCC). A) Existing study data in diverse formats are curated by the database of Genotypes and Phenotypes (dbGaP), including accessioning and conversion to a consistent file format. B) Formatted data and associated metadata (e.g., variable descriptions) are stored in a TOPMed DCC relational database. C) The harmonized phenotype variable is defined, and metadata for multiple studies are searched to identify candidate phenotypic variables that potentially can be harmonized together to produce the desired harmonized variable (harmonization steps 1 and 2). D) Analytical tools that interact with the DCC database are used for quality control (QC) of study variables, implementation of harmonization algorithms, and documentation; harmonized results are added to the same DCC database as that shown in step B (harmonization steps 3–5). E) Files containing a multistudy, harmonized data set and associated documentation are produced. F) Data, metadata, and documentation are submitted to a National Institutes of Health (NIH) repository for controlled access by the scientific community, while documentation files in JavaScript Object Notation format containing software code and provenance tracking are submitted to a publicly available GitHub repository.
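
As a rough illustration of steps C through F, the following Python sketch searches variable metadata for candidate variables, applies a per-study conversion to produce one harmonized variable, and writes a JSON documentation record that tracks provenance. It is not the TOPMed DCC software: the study names, variable names, data values, and every function here (`find_candidates`, `harmonize`, `build_documentation`) are hypothetical, and in-memory dictionaries stand in for the relational database.

```python
import json
import re

# Hypothetical variable metadata, standing in for the database in step B.
VARIABLE_METADATA = [
    {"study": "StudyA", "variable": "HT_CM",
     "description": "Standing height in centimeters"},
    {"study": "StudyA", "variable": "SBP",
     "description": "Systolic blood pressure (mmHg)"},
    {"study": "StudyB", "variable": "height_in",
     "description": "Height at baseline, inches"},
]

# Hypothetical study data keyed by (study, variable), then by subject ID.
STUDY_DATA = {
    ("StudyA", "HT_CM"): {"A1": 170.0, "A2": 182.5},
    ("StudyB", "height_in"): {"B1": 65.0},
}

# Assumed per-variable rules for converting to the target unit (cm).
# Each entry pairs a human-readable description with the function itself.
CONVERSIONS = {
    ("StudyA", "HT_CM"): ("identity", lambda x: x),
    ("StudyB", "height_in"): ("inches * 2.54", lambda x: x * 2.54),
}


def find_candidates(metadata, pattern):
    """Step C: return metadata rows whose description matches a regex."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [row for row in metadata if regex.search(row["description"])]


def harmonize(candidates, data, conversions):
    """Step D: convert each candidate variable and stack the results."""
    rows = []
    for cand in candidates:
        key = (cand["study"], cand["variable"])
        _, convert = conversions[key]
        for subject_id, value in sorted(data[key].items()):
            rows.append({
                "study": cand["study"],
                "subject_id": subject_id,
                "height_cm": round(convert(value), 1),
            })
    return rows


def build_documentation(name, candidates, conversions):
    """Steps E and F: record component variables and rules as JSON."""
    doc = {
        "harmonized_variable": name,
        "components": [
            {
                "study": c["study"],
                "variable": c["variable"],
                "rule": conversions[(c["study"], c["variable"])][0],
            }
            for c in candidates
        ],
    }
    return json.dumps(doc, indent=2)


candidates = find_candidates(VARIABLE_METADATA, r"height")
harmonized = harmonize(candidates, STUDY_DATA, CONVERSIONS)
documentation = build_documentation("height_cm", candidates, CONVERSIONS)
```

Keeping the conversion rule next to its description means the JSON record can state how each component variable was transformed, which is the kind of provenance the caption attributes to the documentation files.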