Table 3.
Cat. ID | Customization Task Category | Customization Task Example | Estimated time consumption (median) | Score | Count |
---|---|---|---|---|---|
Knowledge aspect | |||||
C1 | Map source vocabulary to target vocabulary (among different terminologies or different versions of the same terminology) | Vocabulary mapping to get non-ingredient RxCUI from ingredient RxCUI, since medication may also be coded using brand drug codes; vocabulary mapping to get ICD10 procedure codes from provided ICD9 procedure codes, considering the data contains ICD10 coded data | <1day | K 0 | 185 |
C2 | Map free text to target vocabulary | Vocabulary mapping to get RxCUI from provided medication names | <1week | K 1 | 88 |
C3 | Define an operational definition of a specific EHR data element | Find “provider specialty” that links to procedure to check eye exam from ophthalmology department | <1week | K 1 | 138 |
C4 | Define an operational definition of a non-EHR data element event | Define “Continuous enrollment/contact” for implementation | <1day - <1week | K 2 | 20 |
C5 | Locate the data source for a group of data | Identify where to find “carotid imaging study” | <1day | K 1 | 89 |
C6 | Retrieve data attribute representation and contextual knowledge through exploring structured data | Find lab unit, categorical range for urine protein tests | <1hour | K 1 | 61 |
C7 | Retrieve data attribute representation and contextual knowledge through exploring unstructured data | Explore radiology reports to validate the local use of the “intravenous contrast” keywords provided and their occurrence prevalence | <1week | K 2 | 18 |
C8 | Acquire knowledge of unstructured clinical data from domain expert and through programming | Find relevant “note types” and “service groups” whose clinical notes may contain PAD information | <1week and <1month | K 2 | 14 |
Interpretation aspect | |||||
C9 | Understand a phenotype algorithm pseudocode clause | Understand whether “ever” in “Taking ARBs (angiotensin receptor blockers) ever” means both structured and unstructured medication lists should be used | <1day | I 0–2 | 46 |
Programming aspect | |||||
C10 | Compile machine-readable input file | Compile ICD codes or code groups provided in a PDF pseudocode appendix into a machine-readable file | <1day | P 0 | 258 |
C11 | Pre-process data by simple programming | Programmatically check whether relevant pathology reports exist in the clinical data warehouse | <1week | P 1 | 11 |
C12 | Search keywords from unstructured data | Find at least 2 unique DSM-IV social interaction terms from notes | <1week | P 1 | 56 |
C13 | Search keywords with modifier from unstructured data | Identify non-negated diverticulosis terms (e.g., diverticulitis, diverticula) from relevant radiology reports | <1week | P 2 | 16 |
C14 | Extract information from unstructured data using advanced NLP implementation | Extract heart rate from ECG report | <1month | P 2 | 16 |
C15 | Extract information from NLP tool processed documents | Search heart disease concepts (UMLS CUIs) from MedLEE parsed ECG report | <1day | P 1 | 11 |
C16 | Configure, install and execute NLP tools | Install cTAKES for parsing clinical notes | <1week | P 2 | 23 |
C17 | Populate NLP search results for SQL query | To exclude patients with cancer using ICD-9 codes and keywords, keyword search results from unstructured data need to be imported into the database | <1week | P 1 | 37 |
C18 | Implement complex SQL query | Extrapolate height at serum creatinine measurement time from the preceding and following height measurements based on a formula | <1day and <1week | P 1 | 41 |
Other (not specific to the phenotype) | |||||
C19 | Check the availability and completeness of the needed data element | Potentially unavailable information of medication administration route, which is required for glaucoma phenotyping | <1week | | 22 |
C20 | Implement another existing phenotype | Use existing eMERGE T2DM algorithm to check if a patient has type 2 diabetes | <1week | | 3 |
Total (1153) |