Author manuscript; available in PMC: 2019 Oct 1.
Published in final edited form as: Am J Med Genet B Neuropsychiatr Genet. 2017 May 30;177(7):601–612. doi: 10.1002/ajmg.b.32548

Figure 1:

Workflows for leveraging phenotypic data from EHRs. A) Extraction of clinical data into a research-ready database. Unstructured text can be transformed into a standardized, coded format through natural language processing (NLP). B) Stages in the development of a phenotyping algorithm for case–control analyses. (1) An enriched datamart of cases or controls for the target phenotype is constructed using structured data filters, followed by (2) selection of a subset for clinician chart review to establish gold-standard instances. (3) Potential predictors of case (or control) status are extracted from structured and text features in a subset of charts. (4) Using these selected features, a model is trained to predict the gold-standard cases/controls, and performance metrics are calculated to check whether the model reaches the desired performance. (5) The model is applied to the full datamart, and a chart review of a subset of predicted cases (or controls) is conducted to determine the positive predictive value (PPV) and negative predictive value (NPV). (6) If the desired performance is not achieved, the model can be adjusted until adequate performance (e.g., PPV > 0.90) is obtained.
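Steps (5) and (6) can be illustrated with a minimal sketch. It assumes the trained model from step (4) outputs a numeric score per patient and that the chart-reviewed subset supplies gold-standard labels. It also assumes that "adjusting the model" means moving the score cutoff, which is only one of several possible adjustments (retraining or changing features are others). The function names `ppv_npv` and `pick_threshold` and the 0.90 target are illustrative and hypothetical, not taken from the authors' pipeline.

```python
from typing import Optional, Sequence, Tuple


def ppv_npv(predicted: Sequence[bool], gold: Sequence[bool]) -> Tuple[float, float]:
    """Return (PPV, NPV) of algorithm calls against chart-review gold labels.

    PPV = TP / (TP + FP): share of algorithm-flagged cases confirmed on review.
    NPV = TN / (TN + FN): share of algorithm-flagged controls confirmed on review.
    Returns NaN for a value whose denominator is zero.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be the same length")
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)
    fp = sum(1 for p, g in pairs if p and not g)
    tn = sum(1 for p, g in pairs if not p and not g)
    fn = sum(1 for p, g in pairs if not p and g)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv


def pick_threshold(
    scores: Sequence[float], gold: Sequence[bool], target_ppv: float = 0.90
) -> Optional[float]:
    """Hypothetical step (6): lowest score cutoff whose PPV exceeds target_ppv.

    Lower cutoffs flag more patients as cases, so the lowest qualifying cutoff
    keeps as many cases as possible while meeting the PPV target.
    Returns None if no cutoff reaches the target.
    """
    for cutoff in sorted(set(scores)):
        predicted = [s >= cutoff for s in scores]
        ppv, _ = ppv_npv(predicted, gold)
        # NaN > target_ppv is False, so cutoffs that flag nobody are skipped.
        if ppv > target_ppv:
            return cutoff
    return None


# Toy chart-review subset (made-up numbers, for illustration only).
scores = [0.95, 0.90, 0.80, 0.70, 0.40, 0.20]
gold = [True, True, True, False, False, False]
cutoff = pick_threshold(scores, gold)
print(cutoff, ppv_npv([s >= cutoff for s in scores], gold))  # 0.8 (1.0, 1.0)
```

Choosing the lowest cutoff that clears the PPV target favors sensitivity once precision is adequate. In practice, the PPV from a small review sample is itself uncertain, so the reviewed subset should be large enough to support the estimate.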