. 2016 Jan 20;6(1):5. doi: 10.3390/jpm6010005

Table 1.

i2b2 method for defining disease phenotype algorithms.

| Step | Task | Team member |
|------|------|-------------|
| 1 | Randomly select 400 subjects with ICD-9 code | Programmer |
| 2 | Review charts, confirm diagnosis for Training Set | Domain expert |
| 3 | Create custom list of concepts relevant to disease | Domain expert |
| 4 | Extract EMR data to create codified variables | Programmer |
| 5 | Create custom list of NLP variables | Domain expert |
| 6 | Map variables to UMLS concept unique identifiers (CUIs) | Informatician |
| 7 | Extract CUIs from narrative text in EMR using NLP | Informatician |
| 8 | Run LASSO regression with codified + NLP variables predicting disease status in Training Set | Statistician |
| 9 | Set specificity at 97%, select predicted probability among Training Set to achieve >90% PPV | Statistician |
| 10 | Apply algorithm to remaining Biobank subjects (excluding Training Set) | Statistician |
| 11 | Randomly select 100 subjects for Validation Set | Programmer |
| 12 | Perform chart review in Test Set, define PPV | Domain expert |
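
The table does not say how the probability cutoff in step 9 or the PPV in step 12 is computed, so the following is only a minimal sketch of one plausible reading. It assumes each chart-reviewed subject has a predicted probability from the fitted model and a label of 1 (confirmed case) or 0 (confirmed non-case). The function names `cutoff_for_specificity` and `specificity_and_ppv`, and the rule that a subject is a predicted case when the probability is strictly above the cutoff, are illustrative assumptions and not part of the published method.

```python
import math


def cutoff_for_specificity(probs, labels, target_specificity=0.97):
    """Return the lowest probability cutoff whose specificity meets the target.

    Assumption: a subject is a predicted case when prob > cutoff, so
    specificity is the share of confirmed non-cases (label 0) with
    prob <= cutoff.
    """
    if not 0 < target_specificity <= 1:
        raise ValueError("target_specificity must be in (0, 1]")
    non_case_probs = sorted(p for p, y in zip(probs, labels) if y == 0)
    if not non_case_probs:
        raise ValueError("need at least one confirmed non-case")
    # Smallest number of non-cases that must sit at or below the cutoff.
    # The small tolerance guards against float round-off in the product.
    needed = math.ceil(target_specificity * len(non_case_probs) - 1e-9)
    needed = max(needed, 1)
    return non_case_probs[needed - 1]


def specificity_and_ppv(probs, labels, cutoff):
    """Compute specificity and PPV for the rule prob > cutoff."""
    tp = fp = tn = 0
    for p, y in zip(probs, labels):
        predicted_case = p > cutoff
        if predicted_case and y == 1:
            tp += 1  # predicted case, confirmed by chart review
        elif predicted_case:
            fp += 1  # predicted case, not confirmed
        elif y == 0:
            tn += 1  # predicted non-case, confirmed non-case
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return specificity, ppv
```

Under these assumptions, step 9 would amount to calling `cutoff_for_specificity` on the Training Set and checking that the PPV returned by `specificity_and_ppv` exceeds 0.90. Step 12 would reuse `specificity_and_ppv` with the same cutoff on the 100 chart-reviewed subjects.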