Table 1.
i2b2 method for defining disease phenotype algorithms.
Step | Task | Team Member |
---|---|---|
1 | Randomly select 400 subjects with ICD-9 code | Programmer |
2 | Review charts, confirm diagnosis for Training Set | Domain expert |
3 | Create custom list of concepts relevant to disease | Domain expert |
4 | Extract EMR data to create codified variables | Programmer |
5 | Create custom list of NLP variables | Domain expert |
6 | Map variables to UMLS concept unique identifiers (CUIs) | Informatician |
7 | Extract CUIs from narrative text in EMR using NLP | Informatician |
8 | Run LASSO regression with codified + NLP variables predicting disease status in Training Set | Statistician |
9 | Set specificity at 97%; select predicted-probability cutoff in Training Set to achieve >90% PPV | Statistician |
10 | Apply algorithm to remaining Biobank subjects (excluding Training Set) | Statistician |
11 | Randomly select 100 subjects for Validation Set | Programmer |
12 | Perform chart review in Validation Set, define PPV | Domain expert |
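
The cutoff selection in step 9 and the PPV calculation in step 12 can be illustrated with a minimal sketch. It assumes the fitted model from step 8 has already produced a predicted probability for each chart-reviewed subject. The function names (`cutoff_at_specificity`, `ppv`) and the toy data are hypothetical and are not taken from the i2b2 software or from the study data.

```python
import math


def cutoff_at_specificity(probs, labels, target_specificity=0.97):
    """Return the smallest cutoff whose specificity meets the target.

    Hypothetical helper: a subject is classified as a case when
    prob >= cutoff; labels are 1 (confirmed case) or 0 (non-case)
    from chart review.
    """
    negatives = [p for p, y in zip(probs, labels) if y == 0]
    if not negatives:
        raise ValueError("need at least one chart-review-negative subject")
    # Specificity never decreases as the cutoff rises, so the first
    # candidate that meets the target is the smallest such cutoff.
    # An infinite cutoff (nobody classified as a case) is the fallback.
    for cutoff in sorted(set(probs)) + [math.inf]:
        true_negatives = sum(1 for p in negatives if p < cutoff)
        if true_negatives / len(negatives) >= target_specificity:
            return cutoff


def ppv(probs, labels, cutoff):
    """Positive predictive value: confirmed cases among subjects flagged."""
    flagged = [y for p, y in zip(probs, labels) if p >= cutoff]
    if not flagged:
        return math.nan  # undefined when no subject is flagged
    return sum(flagged) / len(flagged)


# Toy data (made up for illustration): predicted probabilities and
# chart-review labels for six subjects.
toy_probs = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9]
toy_labels = [0, 0, 0, 0, 1, 1]

toy_cutoff = cutoff_at_specificity(toy_probs, toy_labels, 0.97)
toy_ppv = ppv(toy_probs, toy_labels, toy_cutoff)
```

In practice the cutoff would be chosen on the Training Set and the PPV then re-estimated by chart review on the separately sampled subjects, since a PPV computed on the same data used to pick the cutoff tends to be optimistic.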