Figure - PMC


. 2019 Oct 24;105(5):974–986. doi: 10.1016/j.ajhg.2019.09.027


© 2019 American Society of Human Genetics.


HARLEE Workflow

ES VCF files are first annotated with Variant Effect Predictor (VEP), which flags one transcript per variant per gene. VEP also supplies consequence, SIFT and PolyPhen predictions, variant allele frequencies from multiple sources, domain information, and other annotations. The VEP output is then loaded into a data lake built on a Hadoop architecture. Population-, variant-, and gene-level annotations from a variety of sources are loaded separately, allowing for modular, on-demand annotation. Once samples and annotations have both been loaded into HARLEE, a series of SQL-like queries generates distinct gene lists. Bioinformatic filtering parameters based on the loaded annotations are tuned to optimize discovery density, a measure that takes into account the number of genes reported to OMIM as disease-associated over time, normalized against the number of remaining genes without OMIM disease annotations.
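
The query step can be pictured with a minimal sketch. It is not the HARLEE implementation: it uses an in-memory SQLite database as a stand-in for the Hadoop data lake, and the table name `variants`, its columns, the function `build_gene_list`, the allele-frequency cutoff, and the example rows are all hypothetical. Only the general idea comes from the caption: annotated variants are filtered with an SQL-like query to yield a distinct gene list.

```python
import sqlite3


def build_gene_list(variants, max_af=0.001,
                    consequences=("stop_gained", "frameshift_variant")):
    """Return the sorted, distinct genes with at least one passing variant.

    Hypothetical filter: keep variants whose allele frequency is at or
    below `max_af` and whose consequence is in `consequences`.
    """
    conn = sqlite3.connect(":memory:")  # stand-in for the data lake
    conn.execute(
        "CREATE TABLE variants ("
        "sample TEXT, gene TEXT, consequence TEXT, allele_freq REAL)"
    )
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?)", variants)

    # One placeholder per consequence keeps the query parameterized.
    placeholders = ",".join("?" for _ in consequences)
    query = (
        "SELECT DISTINCT gene FROM variants "
        f"WHERE allele_freq <= ? AND consequence IN ({placeholders}) "
        "ORDER BY gene"
    )
    rows = conn.execute(query, (max_af, *consequences)).fetchall()
    conn.close()
    return [gene for (gene,) in rows]


# Made-up rows for illustration only: (sample, gene, consequence, allele_freq)
EXAMPLE_VARIANTS = [
    ("S1", "GENE_A", "stop_gained", 0.0001),
    ("S1", "GENE_B", "missense_variant", 0.0002),
    ("S2", "GENE_C", "frameshift_variant", 0.05),
    ("S2", "GENE_A", "frameshift_variant", 0.0),
]

if __name__ == "__main__":
    print(build_gene_list(EXAMPLE_VARIANTS))
```

Loosening `max_af` or widening `consequences` changes which genes reach the list, which is the kind of parameter tuning the caption describes; the measure used to judge each setting (discovery density) is not modeled here.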