1 | Randomly select 400 subjects with ICD-9 code | Programmer
2 | Review charts, confirm diagnosis for Training Set | Domain expert
3 | Create custom list of concepts relevant to disease | Domain expert
4 | Extract EMR data to create codified variables | Programmer
5 | Create custom list of NLP variables | Domain expert
6 | Map variables to UMLS concept unique identifiers (CUIs) | Informatician
7 | Extract CUIs from narrative text in EMR using NLP | Informatician
8 | Run LASSO regression with codified + NLP variables predicting disease status in Training Set | Statistician
9 | Set specificity at 97%; select the predicted-probability threshold in the Training Set that achieves >90% PPV | Statistician
10 | Apply algorithm to remaining Biobank subjects (excluding Training Set) | Statistician
11 | Randomly select 100 subjects for Validation Set | Programmer
12 | Perform chart review in Validation Set, define PPV | Domain expert
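
The threshold selection in step 9 and the PPV calculation in step 12 can be illustrated with a minimal sketch. It assumes each subject already has a predicted probability from the regression in step 8 and a chart-review label (1 = confirmed case, 0 = non-case). The function names and the toy data are hypothetical, not taken from the source, and the toy example uses a specificity target of 0.80 rather than 97% only because it contains five non-cases.

```python
def specificity(probs, labels, cutoff):
    """Fraction of chart-confirmed non-cases scoring below the cutoff."""
    negatives = [p for p, y in zip(probs, labels) if y == 0]
    if not negatives:
        raise ValueError("need at least one confirmed non-case")
    return sum(p < cutoff for p in negatives) / len(negatives)


def ppv(probs, labels, cutoff):
    """Fraction of subjects flagged by the algorithm (prob >= cutoff)
    that chart review confirms as cases."""
    flagged = [y for p, y in zip(probs, labels) if p >= cutoff]
    if not flagged:
        raise ValueError("no subject reaches the cutoff")
    return sum(flagged) / len(flagged)


def cutoff_for_specificity(probs, labels, target):
    """Lowest observed probability that meets the target specificity.

    Choosing the lowest qualifying cutoff keeps sensitivity as high as
    possible while holding specificity at or above the target.
    """
    for candidate in sorted(set(probs)):
        if specificity(probs, labels, candidate) >= target:
            return candidate
    raise ValueError("no observed cutoff reaches the target specificity")


# Hypothetical toy training set: predicted probabilities and chart-review labels.
train_probs = [0.05, 0.10, 0.20, 0.30, 0.60, 0.40, 0.70, 0.80, 0.90, 0.95]
train_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Step 9 analogue: fix specificity, read off the cutoff, then check PPV.
cutoff = cutoff_for_specificity(train_probs, train_labels, target=0.80)
train_ppv = ppv(train_probs, train_labels, cutoff)

print(f"cutoff={cutoff:.2f}, training PPV={train_ppv:.3f}")
```

In this toy data the cutoff lands at 0.40: four of the five non-cases score below it, and five of the six flagged subjects are confirmed cases. The same `ppv` function could be applied to the chart-reviewed subjects of step 12, using the cutoff fixed in the Training Set rather than re-estimating it.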