2021 Jun 15;9:642163. doi: 10.3389/fpubh.2021.642163

Table 1.

Procedures at each step for ensuring the adequacy of structured secondary health data for specific use in quantitative research.

Step Procedures
Step 1. Problem understanding
1. Assessment of the characteristics of secondary input data
  • Content: what the data represents in the real world, source of data, the context in which it was collected
  • Estimated volume: number of records and size of expected files
  • Expected data file format
2. Assessment of the characteristics of the research
  • Population and period under study
  • Inclusion and exclusion criteria for selection
  • Study design and analysis unit
  • Variables involved in the main research questions, objectives, and hypotheses
3. Assessment of the characteristics of the output data
  • Estimated output data volume: number of records or file size
  • The desired format for delivery of output data
4. Checking the availability of input data and variable dictionaries
5. Evaluation of the ethical aspects and technical feasibility of data adequacy for the research
Step 2. Resource planning
6. Sizing up human resources
7. Sizing up computational resources (hardware and software platform)
  • Volume and format of input data
  • Support for the operations required to adjust the input data
  • Estimated volume and format of output data
  • Performance and data volume limits for eligible computing resources
Step 3. Data understanding
8. Obtaining secondary data files and variable dictionaries
9. Understanding the variable dictionaries related to the input data and creating the research variables dictionary for each type of file
10. Inventory of data files: name and extension, size in bytes, and number of records
11. Assessment of the existence of a unique record identifier (primary key) in each data file
12. Inventory of the variables contained in the data files: name, type, and size
13. Exploratory data analysis for completeness
14. Elaboration of the data extraction plan for the research
Step 4. Data preparation
15. Execution of the data extraction plan
16. Exploratory data analysis to detect invalid content and assess the homogeneity of data entry
17. Data cleaning and transformation to generate research variables
18. Updating the research variables dictionary
Step 5. Data validation
19. Exploratory analysis of the transformed data for comparison with the original data
Step 6. Data distribution
20. Exporting the database to the specified format(s)
21. Reduction of the database to contain only the research variables
22. Distribution of the database and dictionary of research variables
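
To make the data understanding procedures more concrete, the sketch below illustrates parts of procedures 10 to 13 on a single delimited file: counting records, checking whether a candidate primary key is unique, inventorying each variable (name, inferred type, maximum size), and measuring completeness. It is a minimal illustration, not the procedure used by the authors. The function names (profile_csv, infer_type), the sample columns, and the set of tokens treated as missing are all assumptions made for this example; file name and size in bytes would normally be read from the file system and are not shown.

```python
import csv
import io

# Assumption: these tokens mark a missing value in the input file.
MISSING = {"", "NA", "NULL"}


def infer_type(values):
    """Return 'int', 'float', or 'str' for a list of non-missing strings."""
    if not values:
        return "empty"
    for name, cast in (("int", int), ("float", float)):
        try:
            for value in values:
                cast(value)
            return name
        except ValueError:
            continue
    return "str"


def profile_csv(handle, key):
    """Profile one delimited file read from an open text handle.

    Reports the number of records, whether `key` is a unique and fully
    filled identifier, and, for each variable, its inferred type,
    maximum size in characters, and completeness (share of filled rows).
    """
    reader = csv.DictReader(handle)
    fields = reader.fieldnames or []
    # Normalize every cell to a stripped string; short rows yield "".
    rows = [{f: (r.get(f) or "").strip() for f in fields} for r in reader]
    n = len(rows)
    keys = [r[key] for r in rows]
    report = {
        "records": n,
        "key_is_unique": len(set(keys)) == n
        and not any(k in MISSING for k in keys),
        "variables": {},
    }
    for name in fields:
        filled = [r[name] for r in rows if r[name] not in MISSING]
        report["variables"][name] = {
            "type": infer_type(filled),
            "max_size": max((len(v) for v in filled), default=0),
            "completeness": len(filled) / n if n else 0.0,
        }
    return report


# Hypothetical sample data: the repeated patient_id 3 and the blank
# cells are deliberate, so the checks have something to flag.
SAMPLE = """patient_id,birth_date,sex,weight_kg
1,1980-02-01,F,61.5
2,,M,NA
3,1975-11-30,,80
3,1990-07-15,F,72.3
"""

report = profile_csv(io.StringIO(SAMPLE), key="patient_id")
print("records:", report["records"])
print("key is unique:", report["key_is_unique"])
for name, info in report["variables"].items():
    print(name, info)
```

A report of this kind can feed the data extraction plan (procedure 14): variables with low completeness or an unexpected type are candidates for exclusion or for cleaning rules in the data preparation step, and a non-unique identifier signals that records must be deduplicated or keyed differently before linkage.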