Figure - PMC



2020 Sep 2;29(R1):R33–R41. doi: 10.1093/hmg/ddaa192


© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com


Approaches to detect biases in EHR. (A) Consider the order and convergence of diagnoses across the lifespan. Here, we consider three individuals (dark blue, light blue and grey) with identical variables within the EHR (single-dashed lines represent identical diagnoses, etc.). From a retrospective, ‘snapshot’ perspective, all three individuals appear identical; however, the order and persistence of their diagnoses clearly differ. (B) Leverage known biology. Here, we illustrate how polygenic risk scores (PRS) might be applied to test the accuracy of phenotype definitions. In this case, we consider controls (‘-’), schizophrenia cases (‘+’) and individuals with treatment-resistant schizophrenia (‘TRS’). If these definitions are accurately assigned, we expect increased PRS in cases versus controls and in TRS versus others (blue). If, however, some bias affects phenotype assignment, we may identify a group of individuals with divergent PRS (grey). (C) Here, we illustrate three EHR (EHR₁, EHR₂ and EHR₃) with overlapping but not identical sets of biases B and phenotypes [x₁…x_N]. Genotypic and phenotypic associations {SNPs, ICDx} that are present across multiple different EHR are more likely to represent true biological signal than biased inferences.
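The cross-EHR reasoning in panel (C) can be illustrated with a small sketch. It assumes that each EHR's results have already been reduced to a set of (SNP, ICD code) association pairs and that "present across multiple EHR" means "reported in at least `min_sources` of them". The function name, the `min_sources` threshold and the identifiers `rsA`, `ICD_X` and so on are hypothetical placeholders, not data or methods from the article. Replication only makes a signal *more likely* to be biological. A bias shared by every EHR would still pass this filter.

```python
from collections import Counter


def replicated_associations(ehr_results, min_sources=2):
    """Return associations reported in at least `min_sources` EHR datasets.

    ehr_results: hypothetical mapping from an EHR name to the set of
    (snp, icd_code) association pairs found in that EHR.
    """
    counts = Counter()
    for associations in ehr_results.values():
        # Count each association at most once per EHR.
        counts.update(set(associations))
    return {assoc for assoc, n in counts.items() if n >= min_sources}


# Illustrative placeholder results for three EHR (not real associations).
ehr_results = {
    "EHR1": {("rsA", "ICD_X"), ("rsB", "ICD_Y")},
    "EHR2": {("rsA", "ICD_X"), ("rsC", "ICD_Z")},
    "EHR3": {("rsA", "ICD_X"), ("rsB", "ICD_Y"), ("rsD", "ICD_W")},
}

print(sorted(replicated_associations(ehr_results)))
# [('rsA', 'ICD_X'), ('rsB', 'ICD_Y')]
```

Raising `min_sources` to the number of available EHR keeps only associations seen in every source. This is stricter, but it is also more exposed to any bias the sources happen to share.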