2022 Jan 28;22:23. doi: 10.1186/s12911-022-01759-z

Table 1.

Counts of vagueness and under-specification in narrative phenotype algorithms

| Code | Category | Sub-category | Description | Total instances | Phenotype count, n (%) |
|---|---|---|---|---|---|
| 1.1 | Definition of variable | Attributes of variable | Under-specification in the attributes (min, max, etc.) of a variable | 47 | 13 (68.4%) |
| 1.1.1.a | Time point | Temporal entity | Under-specification of the time anchor or point of reference for a criterion | 22 | 11 (57.9%) |
| 1.1.1.b | Time point | Temporal interval | Under-specification of the time range in which to look for a criterion (diagnosis, medication, lab, etc.) | 6 | 5 (26.3%) |
| 1.1.2.a | Threshold | Missing threshold | Vagueness or under-specification of the threshold for a criterion in the phenotype algorithm | 2 | 2 (10.5%) |
| 1.1.2.b | Threshold | Quantifying qualitative terms | Vagueness or under-specification in a qualitative term describing a criterion (e.g., chronic, young, old, severe, negative, positive) that lacks quantitative values | 1 | 1 (5.3%) |
| 1.1.2.c | Threshold | Units | The units associated with a numeric value (e.g., mg/dL) are not specified | 2 | 1 (5.3%) |
| 1.2 | Definition of variable | Alternatives to missing data | Request for instructions when data elements are not available | 6 | 5 (26.3%) |
| 1.3 | Definition of variable | Code/acronym/term definition | Under-specification regarding acronyms, variables, or codes. This could relate to: (1) local and unique codes; (2) the coding/terminology system (including use of base codes); (3) vague terminology/codes | 28 | 11 (57.9%) |
| 1.4 | Definition of variable | Location in EHR | Under-specification of how or where certain criteria/variables should be obtained within the EHR | 10 | 6 (31.6%) |
| 2.1 | Data dictionary | Data delivery | Under-specification of how the data dictionaries should be structured and how they should be delivered to sites | 3 | 2 (10.5%) |
| 2.2 | Data dictionary | Information inclusion | Under-specification of which results should be included in the data dictionary | 31 | 10 (52.6%) |
| 2.3 | Data dictionary | Results presentation and formatting | Under-specification of how the results in the data dictionary should be formatted. This may include numeric formatting (e.g., number of decimal places) or granularity of units (e.g., date of birth vs. age) | 27 | 8 (42.1%) |
| 3.1 | Logic | Discordant logic | Discrepancy between the written description and the flowchart or the procedures in the flowchart | 17 | 8 (42.1%) |
| 3.2 | Logic | Missing rationale or context | Under-specification of the rationale and/or context needed to apply the phenotype appropriately | 11 | 8 (42.1%) |
| 3.3 | Logic | Population criteria | Vagueness and under-specification in how the criteria differ between the case and control or other cohort definitions | 20 | 11 (57.9%) |

A total of 304 instances were found across 253 comments (a single comment could exhibit more than one category). Sub-codes are more specific and are considered distinct from their higher-level code. Total instances denote the aggregate count of unique instances of under-specification found across all phenotypes.
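The counting scheme in the note above (one comment can carry several codes; sub-codes are tallied separately from their parent code; the phenotype count records how many phenotypes show a code at least once) can be sketched as a small tally. This is a minimal sketch, not the authors' analysis code: the `Comment` record, `tally`, and `percent` are hypothetical names, and the denominator of 19 phenotypes is an assumption inferred from the table's percentages (e.g., 13/19 = 68.4%), not a figure stated in this excerpt.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    # Hypothetical record: one reviewer comment on one phenotype algorithm,
    # annotated with every under-specification code it exhibits.
    phenotype: str
    codes: frozenset


def percent(count, total):
    # Share of phenotypes, rounded to one decimal place as in the table.
    return round(100 * count / total, 1)


def tally(comments, n_phenotypes):
    """Return {code: (total_instances, phenotype_count, percent)}.

    Codes are counted exactly as annotated: a sub-code such as "1.1.1.a"
    is NOT rolled up into its parent "1.1".
    """
    instances = defaultdict(int)
    phenotypes = defaultdict(set)
    for comment in comments:
        for code in comment.codes:  # one comment may contribute several instances
            instances[code] += 1
            phenotypes[code].add(comment.phenotype)
    return {
        code: (instances[code], len(phenotypes[code]),
               percent(len(phenotypes[code]), n_phenotypes))
        for code in instances
    }


# Illustrative (made-up) comments; n_phenotypes=19 is the inferred assumption.
comments = [
    Comment("P1", frozenset({"1.1.1.a", "2.2"})),
    Comment("P1", frozenset({"1.1.1.a"})),
    Comment("P2", frozenset({"3.1"})),
]
result = tally(comments, n_phenotypes=19)
print(result["1.1.1.a"])                     # (2, 1, 5.3)
print(sum(len(c.codes) for c in comments))   # 4 instances from 3 comments
```

Because each comment contributes one instance per code it carries, the total number of instances can exceed the number of comments, which is how 304 instances arise from 253 comments.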