Table 2.
Electronic Medical Records and Genomics: validated phenotypes, participating sites, and validation approach by site
Phenotype | EMR categories to define phenotype | Challenges |
---|---|---|
Cataract | ICD-9 codes, eye exam, problem list, text, and scanned documents | Not all sites had adequate detail in EMR. Optical character recognition required for scanned records was not available at all sites |
Dementia | ICD-9 codes, medications | Primary site had research-quality Alzheimer's diagnosis while others did not, compromising dementia as a phenotype. Some sites had a pharmacy database; others relied on NLP for pharmacy data |
Type 2 diabetes | ICD-9 codes, medications, laboratory tests | Difficulty handling repeated measures, differentiating type 1 from type 2 diabetes, abstracting medications from orders versus pharmacy versus NLP |
Diabetic retinopathy | ICD-9 codes, laboratory tests, eye exam, problem list, text | Detailed data from eye exams not available at all sites |
Resistant hypertension* | Systolic and diastolic blood pressure, medications, ICD-9 codes, free text, laboratory tests, ejection fraction | Difficulty with timing around blood pressure measures and handling repeated measures |
Peripheral arterial disease | ICD-9 and CPT-4 codes, text, vascular lab criteria (ankle brachial index) | Ankle brachial index not in retrievable format in all EMRs |
Primary hypothyroidism | ICD-9 and CPT-4 codes, medications, laboratory tests, text | Large number of exclusions posed challenges in developing chart review form. Person-level (lifetime) exclusion criteria were complicated by transience and time-frame limitations of the EMR (older records on paper) |
Low levels of high-density lipoprotein cholesterol and baseline lipid values | Laboratory tests, medications, ICD-9 codes | Difficulty in handling repeated measures |
Red blood cell indices | Laboratory tests, ICD-9 and CPT-4 codes, medications | Difficulty in handling repeated measures. Phenotype had a large number of exclusions |
White blood cell indices | Laboratory tests and location of draw (eg, hospital vs clinic), ICD-9, CPT-4, and HCPCS codes, medications | Difficulty in handling repeated measures |
Normal cardiac conduction (PR and QRS intervals) | Electronic ECG data, medications, NLP, ICD-9 and CPT codes, laboratory tests | Locating and mining electronic ECG data from vendor systems was difficult. Challenge asserting absence of heart disease (eg, excluding family history) or electrolyte abnormalities at the time of the ECG |
Height | Height measurements, ICD-9 codes, medications, laboratory tests | Difficulty determining the normal range and handling repeated measures |
All completed algorithms are available for download from http://PheKB.org.
The challenges discussed here are new observations that complement those in an earlier publication.17
*Genome-wide analysis not yet completed.
EMR, electronic medical record; HCPCS, Healthcare Common Procedure Coding System; NLP, natural language processing.