. 2019 Aug 2;7(1):38. doi: 10.5334/egems.298

Table 2.

Level 1 DQ framework characteristics and validation requirements.

1. Data Warehouse Context Location Framework

Requirement Description | Explanation/Validation Requirements

1.1. Data Warehouse (DW) Location | The location of the data warehouse where the data are hosted
1.2. Number of Source Systems within DW | The number of source applications from which data are extracted into this DW
1.3. Data Processed Type | Whether the data being assessed are from the raw extraction, still in the separate source-system tables, or have been processed and combined into associated tables and fields across the source systems
1.4. Data Extraction Storage Type | How the extraction is handled: either all data are extracted each time, overwriting what is stored in the DW, or a complete extract is taken when first requested and only updated records are extracted from each required table in subsequent extractions
2. Source Database and Table Name Context Location Framework

Requirement Description | Validation Requirements

2.1. Source System Name | The name of the source database software being assessed
2.2. Source System Extraction Type | How the data were extracted from the source system, e.g. from a CSV file created by the source provider, or from a database through its tables or views
2.3. CSV/Text and Other Source File Additional Information | Where or by whom the CSV/text file was created, or what the other source of the data is, e.g. a pathology laboratory CSV file created by the lab’s IT department to defined requirements
2.4. Table/File Name | The name of the database table or CSV/text file from which the data being assessed have come
2.5. Location of the Table Context/Meaning | Either the table’s context/meaning written out, or the location of the file that contains this information
2.6. Location of Table Fields/Variables List | Either the table’s fields listed, or the location of the file that contains this information
3. Table Name Data Quality Framework: Conformance: Do Data Values Adhere to Specified Standards and Formats?

DQ Characteristic | Validation Requirements

3.1. Relational Conformance
3.1.1. Data values conform to relational constraints | The table within the data warehouse should be structured so that it contains easily identifiable fields/columns usable as a foreign key, so data are linkable in a usable and meaningful way
3.1.2. Unique (key) data values are not duplicated | The table must have a unique record ID that is not repeated without an easily identifiable reason, i.e. the record has been updated or deleted; a new record has been created for the same record ID; the old record is expired and the status of the new row is set to updated or deleted
3.1.3. Table from Source System has Created Date, Created By, Updated Date, Updated By and Record Status fields | The table contains the gold-standard fields that let auditors and secondary users of the data know whether the data within the record have been updated, and by whom, and whether the record was active, inactive or deleted at the time of data extraction
3.1.4. The Source System Gold-Standard Field Names and Associated Status Variable Codes and Meanings: Created Date, Created By, Updated Date, Updated By and Record Status fields | Lists the system field names held within the table for the Created Date, Created By, Updated Date, Updated By and Record Status fields, including the variable codes and associated meanings of the record status field, or the location where this information is held
3.1.5. Table from Data Warehouse has at least one of the following additional field types: Imported Date, Exported Date and Data Warehouse Import Status fields | As a gold standard, each table and record in the data warehouse should carry a date/time stamp of when it was exported from the source system, a date/time stamp of when it was imported into the data warehouse, and the status of each record, so that the latest data, or the data required at a set date and time, can be used as needed
3.1.6. The Data Warehouse Gold-Standard Field Names and Associated Status Variable Codes and Meanings: Imported Date, Exported Date and Data Warehouse Import Status fields | Lists the system field names held within the table for the Imported Date, Exported Date and Data Warehouse Import Status fields, including the variable codes and associated meanings of the Data Warehouse Import Status field, or the location where this information is held
If failed, why?
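The relational-conformance checks in 3.1.2–3.1.3 lend themselves to simple automated tests. The sketch below, with assumed field names (`record_id`, `record_status`, etc.), flags record IDs that repeat without a superseded-status explanation and reports which gold-standard audit fields a table is missing.

```python
# Assumed gold-standard audit fields from requirement 3.1.3.
REQUIRED_SOURCE_FIELDS = {"created_date", "created_by", "updated_date",
                          "updated_by", "record_status"}

def missing_gold_standard_fields(table_fields):
    """3.1.3: return the gold-standard audit fields absent from a table."""
    return REQUIRED_SOURCE_FIELDS - set(table_fields)

def unexplained_duplicate_ids(records):
    """3.1.2: return record IDs repeated without the earlier row being
    marked as updated, deleted or expired."""
    seen = {}   # record_id -> status of the most recent row seen
    bad = set()
    for rec in records:
        rid = rec["record_id"]
        if rid in seen and seen[rid] not in ("updated", "deleted", "expired"):
            bad.add(rid)
        seen[rid] = rec["record_status"]
    return bad
```

A table passes 3.1.2 when `unexplained_duplicate_ids` returns an empty set; any IDs returned would be documented under "If failed, why?".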
4. Table Name Data Quality Framework: Plausibility: Are Data Values Believable?

DQ Characteristic | Validation Requirements

4.1. Uniqueness Plausibility
4.1.1. Data values that identify a single object are not unnecessarily duplicated. | Data held within the table are not duplicated, with the exception of updated and deleted records for a specific record and patient, i.e. a patient’s postcode has changed from 3001 to 3124: a new record with the same record ID but an updated DW import status code exists within the table
If failed, why?
4.2. Temporal Plausibility
4.2.1. Observed or derived values conform to expected temporal properties. | Data held within the table are stored within the correct timeframes and expected event order, i.e. a patient’s appointment start date and time is before the end date and time of the same appointment
4.2.2. Sequences of values that represent state transitions conform to expected properties. | Data held within the table that record events requiring multiple entries have them in the expected sequence, with associated values based upon external and internal standards or regulations, i.e. the date of an initial immunization precedes the date of a booster immunization
4.2.3. Measures of data value density against a time-oriented denominator are expected based on internal knowledge. | Data held within the table show the expected fluctuations for time-oriented events based upon local and external knowledge, i.e. an increase in influenza immunizations during flu season
If failed, why?
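The temporal-plausibility checks in 4.2.1 and 4.2.2 can be sketched directly from the table's own examples. Field names (`start`, `end`, `dose_number`, `date`) are assumptions for illustration.

```python
from datetime import datetime

def appointment_interval_ok(appt):
    """4.2.1: an appointment's start date/time precedes its end date/time."""
    return appt["start"] < appt["end"]

def immunization_sequence_ok(doses):
    """4.2.2: doses ordered by sequence number occur in chronological
    order, e.g. the initial immunization precedes the booster."""
    ordered = sorted(doses, key=lambda d: d["dose_number"])
    dates = [d["date"] for d in ordered]
    return all(a <= b for a, b in zip(dates, dates[1:]))
```

Check 4.2.3 (density over time) is harder to automate generically, since the expected seasonal fluctuation is local knowledge; it is usually assessed by comparing per-period counts against an assessor-supplied baseline.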
5. Field Name Context Location Framework

Requirement Description | Validation Requirements

5.1. Field Name | The name of the field within the source database table being assessed
5.2. Location of the Field Context/Meaning | Either the field’s context/meaning written out, or the location of the file that contains this information
5.3. Field Variable Type and Length | The type and length of the field, i.e. char 60
5.4. Field Key Type | Indicates whether the field is a primary key, composite primary key or foreign key. This can be skipped if it is not identified as a key field
5.5. Field Input Type, i.e. look-up, text, date, integer/numeric | The allowable data input that the field will accept
5.6. Field Allowable Characters – if other than a look-up field | The ASCII characters that the field will allow to be entered, i.e. a phone number field will only allow numeric values with no spaces. This can be skipped if it is a look-up field
5.7. Field Available Variables – if a look-up | If the field is a look-up table, this lists either the table location and joining field of the look-up values (if there are more than ten options) or the variable values and corresponding descriptions. This can be skipped if it is not a look-up field
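The allowable-character rule in 5.6 can be recorded as a regular expression per field and tested mechanically. The patterns below are assumptions illustrating the table's phone-number example (digits only, no spaces); a real assessment would load them from the documented field specifications.

```python
import re

# Assumed per-field allowable-character patterns (requirement 5.6).
FIELD_PATTERNS = {
    "phone_number": re.compile(r"^\d+$"),    # numeric only, no spaces
    "postcode":     re.compile(r"^\d{4}$"),  # four digits
}

def invalid_values(field_name, values):
    """Return the stored values that violate the field's character rule."""
    pattern = FIELD_PATTERNS[field_name]
    return [v for v in values if not pattern.fullmatch(v)]
```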
6. Field Name Data Quality Framework: Conformance: Do Data Values Adhere to Specified Standards and Formats?

DQ Characteristic | Validation Requirements

6.1. Value Conformance
6.1.1. Data values conform to internal formatting constraints. | Data contained within the field must conform to the expected field-type requirements of the system and the external standards where the system is used, i.e. an Australian postcode must be an integer value of four digits
6.1.2. Data values conform to allowable values or ranges. | The data held within the field must contain only the expected values or ranges that the field allows, based upon what the system has been designed to use, i.e. sex can only allow one numeric value that is translatable or one alpha value that is translatable
If failed, why?
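As a sketch of the value-conformance checks in 6.1, using the table's own examples: an Australian postcode must be a four-digit integer (6.1.1), and sex must be one code from a fixed, translatable set (6.1.2). The coding scheme below is an assumption.

```python
# Assumed allowable sex codes (one translatable numeric or alpha value).
ALLOWED_SEX_CODES = {"1", "2", "3", "M", "F"}

def postcode_conforms(value):
    """6.1.1: Australian postcode is an integer value of four digits."""
    return value.isdigit() and len(value) == 4

def sex_conforms(value):
    """6.1.2: sex is a single allowable, translatable code."""
    return value in ALLOWED_SEX_CODES
```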
6.2. Computational Conformance
6.2.1. Computed values conform to computational or programming specifications. | Data held within the field conform to known calculation requirements and can be validated by manual calculation of the formulas, i.e. the body mass index calculated within the system yields the same result as a manual calculation with the same values
If failed, why?
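The computational-conformance check in 6.2.1 amounts to recomputing the stored derived value and comparing. A minimal sketch for the BMI example, with assumed field names and tolerance:

```python
def bmi_matches(record, tolerance=0.1):
    """6.2.1: recompute BMI (weight in kg / height in m squared) and
    compare it with the value the system stored."""
    expected = record["weight_kg"] / (record["height_m"] ** 2)
    return abs(expected - record["stored_bmi"]) <= tolerance
```

The tolerance absorbs rounding in the system's stored value; a mismatch beyond it indicates the system's calculation does not conform to the specification.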
7. Field Name Data Quality Framework: Completeness: Are Data Values Present?

DQ Characteristic | Validation Requirements

7.1. The absence of data values at a single moment in time agrees with local or common expectations. | Data held within the field are not missing or null/blank where local and external standard requirements expect a value, i.e. sex is always expected to have a value present; a work contact phone number can be null/blank, as not everyone has one
7.2. The absence of data values measured over time agrees with local or common expectations. | Data held within the field are null/blank until the action required to generate a value has occurred, within the timeframes expected by local and external standard requirements, i.e. a medical discharge time that is missing for three consecutive days is unexpected
If failed, why?
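The completeness check in 7.1 distinguishes fields that must always be populated (sex, in the table's example) from fields that may legitimately be null (work phone). A sketch under that assumption, with records as dicts:

```python
# Assumed list of always-mandatory fields (requirement 7.1).
MANDATORY_FIELDS = {"sex"}

def missing_mandatory(records):
    """Return (record index, field name) pairs where a mandatory field
    is missing, null or blank."""
    gaps = []
    for i, rec in enumerate(records):
        for field in MANDATORY_FIELDS:
            if rec.get(field) in (None, ""):
                gaps.append((i, field))
    return gaps
```

Check 7.2 is the same test applied per time period, so that absences persisting beyond the expected timeframe (e.g. a discharge time still blank days later) are surfaced.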
7.3. Atemporal Plausibility
7.3.1. Data values and distributions agree with an internal measurement or local knowledge. | The data stored within the field display the values that local and external standards would consider acceptable, i.e. height and weight values are positive (greater than 0)
7.3.2. Data values and distributions for independent measurements of the same fact are in agreement. | The data stored within the field agree with external standards and knowledge, i.e. the weight of an adult cannot be below 10
7.3.3. Logical constraints between values agree with local or common knowledge (includes “expected” missingness). | The data stored within the field display expected results based upon local and external knowledge, known facts and common sense, i.e. a patient identified as male does not have a pregnancy documented
7.3.4. Values of repeated measurement of the same fact show expected variability. | The data stored within the field, compared with data of a similar or the same requirement, display acceptable variability, i.e. sitting blood pressure readings are within similar ranges, such as 160/85 at Time 1 and 145/80 (rather than 85/160) at Time 2
If failed, why?
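Two of the atemporal-plausibility checks in 7.3 can be sketched from the table's examples: an adult weight below the plausible minimum (7.3.1/7.3.2), and the sex/pregnancy logical constraint (7.3.3). The minimum of 10 and the sex code `"M"` are assumptions taken from the examples above.

```python
def weight_plausible(weight_kg, minimum=10):
    """7.3.1/7.3.2: weight is positive and not below the assumed
    plausible adult minimum from the table's example."""
    return weight_kg >= minimum

def sex_pregnancy_consistent(record):
    """7.3.3: a patient identified as male does not have a pregnancy
    documented."""
    return not (record["sex"] == "M" and record.get("pregnant", False))
```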
8. Field Name Data Quality Framework Overall Results

8.1. Overall Pass/Fail of the data | Determines whether the data held within the field pass or fail, based upon the assessment of the above characteristics, i.e. whether the data have passed with good data quality or failed with bad data quality
8.2. Accuracy of the data held within the field (%) | The percentage of data held within the field that is accurate based upon local knowledge and standards, i.e. the % of patients who have a sex associated with them, with correct values for the context of the system
8.3. Completeness of the data held within the field (%) | The percentage of data held within the field that has a value present, based upon local knowledge and standards, i.e. the % of patients who have a sex associated with them
8.4. Data limitations of the data within the field in the data warehouse | Document the limitations of the data held within the field, based upon the context of the system the data were obtained from
8.5. Data interpretation issues of the data within the field in the data warehouse | Document how the data held within the field and table can be misinterpreted, i.e. the doctor associated with a patient in an imported patient record does not appear in the user table of the application the data were exported from
8.6. Data issues of the data within the field in the data warehouse | Document any issues the data can have, from local and internal knowledge of the applications, i.e. Medical Director™ allows a user to code a fever as a procedure
Other comments/feedback | Document any other relevant information
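Reading 8.2 and 8.3 together: completeness is the share of records with any value present, and accuracy the share whose value is also correct for the system's context. A sketch computing both over a field's values, with the validity predicate supplied by the assessor (the sex codes here are an assumption):

```python
def field_metrics(values, is_valid):
    """Return (completeness_pct, accuracy_pct) for a field's values:
    completeness = % of records with a non-null value (8.3),
    accuracy = % of records with a non-null AND valid value (8.2)."""
    total = len(values)
    present = [v for v in values if v not in (None, "")]
    valid = [v for v in present if is_valid(v)]
    return 100.0 * len(present) / total, 100.0 * len(valid) / total
```

The same calculation applied across all fields of a table gives the table-level figures in 9.2 and 9.3.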
9. Table Data Quality Framework Overall Results

9.1. Overall Pass/Fail of the data | Determines whether the data held within the table pass or fail, based upon the assessment of the above characteristics, i.e. whether the data have passed with good data quality or failed with bad data quality
9.2. Accuracy of the data held within the table (%) | The percentage of data held within the table that is accurate based upon local knowledge and standards, i.e. the % of patients who have a sex associated with them, with correct values for the context of the system
9.3. Completeness of the data held within the table (%) | The percentage of data held within the table that has a value present, based upon local knowledge and standards, i.e. the % of patients who have a sex associated with them
9.4. Data limitations of the data within the table in the data warehouse | Document the limitations of the data held within the table, based upon the context of the system the data were obtained from
9.5. Data interpretation issues of the data within the table in the data warehouse | Document how the data held within the table can be misinterpreted, i.e. the doctor associated with a patient in an imported patient record does not appear in the user table of the application the data were exported from
9.6. Data issues of the data within the table in the data warehouse | Document any issues the data can have, from local and internal knowledge of the applications, i.e. Medical Director™ allows a user to code a fever as a procedure
Other comments/feedback | Document any other relevant information