2013 Mar 26;20(e1):e147–e154. doi: 10.1136/amiajnl-2012-000896

Table 2.

Electronic Medical Records and Genomics: validated phenotypes, participating sites, and validation approach by site

| Phenotype | EMR data categories used to define the phenotype | Challenges |
| --- | --- | --- |
| Cataract | ICD-9 codes, eye exam, problem list, text, and scanned documents | Not all sites had adequate detail in the EMR. Optical character recognition, required for scanned records, was not available at all sites |
| Dementia | ICD-9 codes, medications | The primary site had research-quality Alzheimer's disease diagnoses while other sites did not, compromising dementia as a phenotype. Some sites had a pharmacy database; others relied on NLP for pharmacy data |
| Type 2 diabetes | ICD-9 codes, medications, laboratory tests | Difficulty handling repeated measures, differentiating type 1 from type 2 diabetes, and abstracting medications from orders vs pharmacy records vs NLP |
| Diabetic retinopathy | ICD-9 codes, laboratory tests, eye exam, problem list, text | Detailed eye exam data not available at all sites |
| Resistant hypertension* | Systolic and diastolic blood pressure, medications, ICD-9 codes, free text, laboratory tests, ejection fraction | Difficulty with the timing of blood pressure measurements and handling repeated measures |
| Peripheral arterial disease | ICD-9 and CPT-4 codes, text, vascular lab criteria (ankle brachial index) | Ankle brachial index not in a retrievable format in all EMRs |
| Primary hypothyroidism | ICD-9 and CPT-4 codes, medications, laboratory tests, text | The large number of exclusions made developing the chart review form difficult. Person-level (lifetime) exclusion criteria were complicated by the transience and time-frame limitations of the EMR (older records on paper) |
| Low levels of high-density lipoprotein cholesterol and baseline lipid values | Laboratory tests, medications, ICD-9 codes | Difficulty handling repeated measures |
| Red blood cell indices | Laboratory tests, ICD-9 and CPT-4 codes, medications | Difficulty handling repeated measures. Phenotype had a large number of exclusions |
| White blood cell indices | Laboratory tests and location of draw (eg, hospital vs clinic), ICD-9, CPT-4, and HCPCS codes, medications | Difficulty handling repeated measures |
| Normal cardiac conduction (PR and QRS intervals) | Electronic ECG data, medications, NLP, ICD-9 and CPT codes, laboratory tests | Locating and mining electronic ECG data from vendor systems was difficult. Challenge in asserting the absence of heart disease (eg, excluding family history mentions) or of electrolyte abnormalities at the time of the ECG |
| Height | Height measurements, ICD-9 codes, medications, laboratory tests | Difficulty determining the normal range and handling repeated measures |

All completed algorithms are available for download from http://PheKB.org.

The challenges discussed here are new observations that complement those in an earlier publication (reference 17).

*Genome-wide analysis not yet completed.

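CPT-4, Current Procedural Terminology, 4th edition; ECG, electrocardiogram; EMR, electronic medical record; HCPCS, Healthcare Common Procedure Coding System; ICD-9, International Classification of Diseases, 9th revision; NLP, natural language processing.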
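"Difficulty handling repeated measures" recurs across the laboratory-based phenotypes in the table (type 2 diabetes, lipids, red and white blood cell indices, height). The sketch below illustrates one common way to reduce repeated measurements to a single per-patient value: keep only measurements taken before a patient's treatment start (a "baseline") and take their median. It is a hypothetical illustration, not the eMERGE or PheKB algorithm. The `LabResult` record, its field names, the `treatment_start` mapping, and the choice of the median (rather than, say, the first or maximum value) are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass(frozen=True)
class LabResult:
    """One lab measurement. Hypothetical schema, not an eMERGE data model."""
    patient_id: str
    taken_on: date
    value: float


def baseline_median(results, treatment_start=None):
    """Collapse repeated measurements to one median value per patient.

    Only measurements taken strictly before the patient's (hypothetical)
    treatment start date are kept; patients with no start date keep all
    measurements. Patients left with no measurements are omitted.
    """
    treatment_start = treatment_start or {}
    kept = defaultdict(list)
    for r in results:
        start = treatment_start.get(r.patient_id)
        if start is None or r.taken_on < start:
            kept[r.patient_id].append(r.value)
    return {pid: median(vals) for pid, vals in kept.items()}


if __name__ == "__main__":
    results = [
        LabResult("p1", date(2010, 1, 5), 45.0),
        LabResult("p1", date(2010, 6, 1), 39.0),
        LabResult("p1", date(2011, 2, 3), 41.0),  # after p1's treatment start
        LabResult("p2", date(2010, 3, 9), 62.0),
    ]
    # Hypothetical: p1 started a lipid-altering drug on 2011-01-20.
    print(baseline_median(results, {"p1": date(2011, 1, 20)}))
    # prints {'p1': 42.0, 'p2': 62.0}
```

Using the median limits the influence of a single outlying value, but a real phenotype algorithm would also have to decide how to treat measurements taken during acute illness or at different draw locations (eg, hospital vs clinic), which this sketch ignores.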