2020 Aug 27;36(2):165–178. doi: 10.1007/s10654-020-00677-6

Table 2.

ETL stage 2: Defining the exposed group in cohort design or cases in case–control design

Defining code entities

In electronic health records, all diagnoses, symptoms, treatments, and physical and laboratory measurements are entered into the system using a clinical coding system. For example, the Vision and EMIS systems use Read codes to record all diagnoses and symptoms, and HES records use OPCS and ICD10 codes for the same purpose.

In this section we introduce the code entity, which encapsulates the clinical codes that represent a diagnosis, symptom, treatment, or physical or laboratory measurement, together with properties that describe its use and characteristics. The properties of a code entity change slightly depending on the ETL stage in which it is used.

Code entity for exposure
# | Variable | Data type | Example
1 | Name | Text | Type2Diabetes
2 | Criteria | Categorical with the following levels: Inclusion criteria; Exclusion criteria | Inclusion criteria
3 | Exposure type | Categorical with the following levels. For inclusion criteria: Incident only; Incident or prevalent; First record after cohort entry. For exclusion criteria: Exclude if ever recorded; Exclude if recorded before index date | For inclusion: Incident only. For exclusion: Exclude if ever recorded
4 | Definition | Delimited text | Read codes for Type 2 diabetes: C10F.11, C10F.00. ICD10 code for Type 2 diabetes: E110
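
The code entity described in the table above could be represented as a small data structure. The following is a minimal sketch, not the system's actual implementation: the class names `Criteria`, `ExposureType`, and `CodeEntity`, the helper `parse_definition`, and the choice of a comma as the delimiter are all assumptions made for illustration. Only the variable names, levels, and example values come from the table.

```python
from dataclasses import dataclass
from enum import Enum


class Criteria(Enum):
    INCLUSION = "Inclusion criteria"
    EXCLUSION = "Exclusion criteria"


class ExposureType(Enum):
    INCIDENT_ONLY = "Incident only"
    INCIDENT_OR_PREVALENT = "Incident or prevalent"
    FIRST_AFTER_COHORT_ENTRY = "First record after cohort entry"
    EXCLUDE_IF_EVER = "Exclude if ever recorded"
    EXCLUDE_IF_BEFORE_INDEX = "Exclude if recorded before index date"


# Exposure types permitted under each criteria level, as listed in the table.
ALLOWED = {
    Criteria.INCLUSION: {
        ExposureType.INCIDENT_ONLY,
        ExposureType.INCIDENT_OR_PREVALENT,
        ExposureType.FIRST_AFTER_COHORT_ENTRY,
    },
    Criteria.EXCLUSION: {
        ExposureType.EXCLUDE_IF_EVER,
        ExposureType.EXCLUDE_IF_BEFORE_INDEX,
    },
}


def parse_definition(text, delimiter=","):
    """Split delimited text into a set of clinical codes (delimiter is assumed)."""
    return frozenset(code.strip() for code in text.split(delimiter) if code.strip())


@dataclass(frozen=True)
class CodeEntity:
    name: str
    criteria: Criteria
    exposure_type: ExposureType
    definition: frozenset  # clinical codes, e.g. Read or ICD10 codes

    def __post_init__(self):
        # Reject combinations the table does not allow,
        # e.g. an inclusion entity with an exclusion exposure type.
        if self.exposure_type not in ALLOWED[self.criteria]:
            raise ValueError(
                f"{self.exposure_type.value!r} is not valid for {self.criteria.value!r}"
            )


# The example column of the table, expressed with the hypothetical classes above.
type2_diabetes = CodeEntity(
    name="Type2Diabetes",
    criteria=Criteria.INCLUSION,
    exposure_type=ExposureType.INCIDENT_ONLY,
    definition=parse_definition("C10F.11, C10F.00"),
)
```

Validating the criteria and exposure type together at construction time means an inconsistent entity is caught before any patient records are processed.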

ETL stage 2 user inputs:
# | Variable name | Data type | Example
1 | Code entity | Code entity of exposure | (1) Name: Type2Diabetes; Criteria: Inclusion criteria; Exposure type: Incident only; Definition: {C10F.11, C10F.00}. (2) Name: Metformin; Criteria: Inclusion criteria; Exposure type: Incident only; Definition: {6.1.2.2}
2 | Combination logic of the exposure | Text, formatted using the suggested regular grammar | Type2Diabetes and Metformin
3 | Parsing mode | Categorical with the following levels: Strict; Loose | Strict
4 | Exclude patients if these outcomes occur before index | Code entity (list) | Ischemic heart disease; Stroke
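
The four user inputs above could be collected and checked before any transformation runs. The sketch below is illustrative only: `Stage2Inputs`, `entity_order`, and `validate` are hypothetical names, and because the regular grammar itself is not given in this table, the sketch assumes the simplest case in which entity names are joined only by the keyword `and`.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Stage2Inputs:
    code_entities: dict            # entity name -> set of clinical codes
    combination_logic: str         # e.g. "Type2Diabetes and Metformin"
    parsing_mode: str = "Strict"   # "Strict" or "Loose"
    outcomes_before_index: list = field(default_factory=list)


def entity_order(logic):
    """Return entity names in the order they appear in the combination logic.

    Assumption: names contain no spaces and are joined only by 'and'.
    """
    names = [p.strip() for p in re.split(r"\s+and\s+", logic.strip(), flags=re.IGNORECASE)]
    if any(not re.fullmatch(r"\w+", n) for n in names):
        raise ValueError(f"unsupported combination logic: {logic!r}")
    return names


def validate(inputs):
    """Check the parsing mode and that every referenced entity was supplied."""
    if inputs.parsing_mode not in ("Strict", "Loose"):
        raise ValueError(f"unknown parsing mode: {inputs.parsing_mode!r}")
    names = entity_order(inputs.combination_logic)
    missing = [n for n in names if n not in inputs.code_entities]
    if missing:
        raise ValueError(f"combination logic refers to undefined entities: {missing}")
    return names


# The example column of the table, using the hypothetical structure above.
inputs = Stage2Inputs(
    code_entities={
        "Type2Diabetes": {"C10F.11", "C10F.00"},
        "Metformin": {"6.1.2.2"},
    },
    combination_logic="Type2Diabetes and Metformin",
    parsing_mode="Strict",
    outcomes_before_index=["Ischemic heart disease", "Stroke"],
)
order = validate(inputs)
```

Keeping the order of names from the combination logic matters because the strict parsing mode compares that order against the order of events in the patient record.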

ETL stage 2 transformation logic
Transformation logic

Repeat the following for each eligible patient record carried over from the previous stage

  • For each code entity with inclusion criteria, if the exposure type is

    ○ Incident only: find the earliest event of the code entity before the patient end date and save its details if found; if that event is before the patient start date, exclude the patient and document the reason for exclusion

    ○ Incident or prevalent: find the earliest event of the code entity before the patient end date and save its details if found

    ○ First record after cohort entry: find the earliest event of the code entity after the patient start date and before the patient end date and save its details if found

  • Based on the supplied combination logic, the system's parser does the following each time it encounters a code entity for the patient

    ○ If all inclusion code entities are found for the patient, go to the next step; otherwise, if controls are required, mark the patient as a ‘potential control’, else discard the patient and document the reason

    ○ If the parsing mode is loose, set the patient’s index date to the latest event date among the code entities

    ○ Else, if the parsing mode is strict, set the patient’s index date to the latest event date among the code entities if and only if the code entities occurred in the same order as defined in the combination logic; otherwise discard the patient and document the reason for rejection

  • For each code entity with exclusion criteria, if the exposure type is

    ○ Exclude if ever recorded: search for the event described by the code entity; if it is found, exclude the patient and document the reason for rejection

    ○ Exclude if recorded before index date: search for the event described by the code entity before the patient’s index date; if it is found, exclude the patient and document the reason for rejection

  • Exclude patients if these outcomes occur before index

    ○ For each code entity in the list, check whether it occurs before the index date; if it does, exclude the patient and document the reason for rejection
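
The transformation steps above can be sketched for a single patient as follows. This is a simplified illustration under stated assumptions, not the system's implementation: the function names, the dictionary layout of patients and code entities, and the returned status strings are hypothetical. The sketch also assumes that inclusion entities are passed in the order given by the combination logic, that the start and end dates are inclusive bounds, that "before index date" means strictly before, and that events on the same day count as being in order.

```python
from datetime import date


def earliest(events, codes, on_or_after=None, on_or_before=None, before=None):
    """Earliest date of an event whose code is in `codes`, within optional bounds."""
    dates = [
        d for d, code in events
        if code in codes
        and (on_or_after is None or d >= on_or_after)
        and (on_or_before is None or d <= on_or_before)
        and (before is None or d < before)
    ]
    return min(dates) if dates else None


def define_exposure(patient, inclusion, exclusion, outcomes,
                    mode="Strict", controls_required=False):
    """Return (status, index_date, reason) for one eligible patient."""
    events, start, end = patient["events"], patient["start"], patient["end"]

    # Step 1: one event date per inclusion entity, in combination-logic order.
    found = []
    for ent in inclusion:
        if ent["type"] == "First record after cohort entry":
            d = earliest(events, ent["codes"], on_or_after=start, on_or_before=end)
        else:  # "Incident only" or "Incident or prevalent"
            d = earliest(events, ent["codes"], on_or_before=end)
            if ent["type"] == "Incident only" and d is not None and d < start:
                return ("excluded", None, f"{ent['name']} recorded before start date")
        found.append(d)

    # Step 2: all inclusion entities must be present, then set the index date.
    if any(d is None for d in found):
        if controls_required:
            return ("potential control", None, None)
        return ("excluded", None, "not all inclusion code entities found")
    if mode == "Strict" and found != sorted(found):
        return ("excluded", None, "code entities not in combination-logic order")
    index_date = max(found)

    # Step 3: exclusion code entities (ever recorded, or only before index).
    for ent in exclusion:
        limit = index_date if ent["type"] == "Exclude if recorded before index date" else None
        if earliest(events, ent["codes"], before=limit) is not None:
            return ("excluded", None, f"{ent['name']} matched exclusion criteria")

    # Step 4: outcomes that must not occur before the index date.
    for ent in outcomes:
        if earliest(events, ent["codes"], before=index_date) is not None:
            return ("excluded", None, f"{ent['name']} recorded before index date")

    return ("exposed", index_date, None)


t2d = {"name": "Type2Diabetes", "type": "Incident only", "codes": {"C10F.11", "C10F.00"}}
metformin = {"name": "Metformin", "type": "Incident only", "codes": {"6.1.2.2"}}
stroke = {"name": "Stroke", "codes": {"STROKE_CODE"}}  # placeholder, not a real clinical code

patient = {
    "start": date(2009, 1, 1),
    "end": date(2015, 12, 31),
    "events": [(date(2010, 3, 1), "C10F.00"), (date(2010, 6, 1), "6.1.2.2")],
}
result = define_exposure(patient, [t2d, metformin], [], [stroke])
```

In this example the diabetes code precedes the metformin record, so the strict parser accepts the patient and `result` carries the metformin date as the index date. Reversing the two events would lead to rejection in strict mode but acceptance in loose mode.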