. 2017 Sep 11;10:31. doi: 10.1186/s13040-017-0151-7

Table 2.

Methods and approaches that can enable the reproducibility of biomedical research findings using electronic health records

Method/approach: Recommendations
Scientific software engineering principles:
Create generic functions for common EHR data cleaning and preprocessing operations which can be shared with the community
Produce functions for defining study exposures, covariates and clinical outcomes across datasets that can be maintained across research groups and reused in many research studies
Create modules that logically group common EHR operations (e.g. study population definitions or data source manipulation) to improve code maintainability
Create tests for individual functions and modules to ensure the robustness and correctness of results
Track changes to analytical code and to phenotype definitions (built from controlled clinical terminology terms) using a source code revision control system
Use formal software engineering best practices to document workflows and data manipulation operations
Standardized analytical approaches:
Build and distribute libraries for common EHR data manipulation or statistical analysis, and include sufficient detail (e.g. command line arguments) for all tools used
Produce and annotate machine-readable EHR phenotyping algorithms that can be systematically curated and reused by the community
Use Digital Object Identifiers (DOIs) to turn research artifacts into shareable, citable resources, and cross-reference them from research outputs
Deposit research resources (e.g. algorithms, code) in open-access repositories or software scientific journals, and cross-reference them from research outputs
Consider using virtual machines to encapsulate the data, operating system, analytical software and algorithms used to generate a manuscript and, where applicable, make them available so that others can reproduce the analytical pipeline
Literate programming:
Use literate programming approaches and tools to encapsulate both the analytical logic and the programming code, so that the logic and the underlying processing code coexist in a single document
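As an illustration of two recommendations under scientific software engineering principles (shareable generic cleaning functions, and tests that check them), the following is a minimal Python sketch. The record layout (`Measurement`), the function name `clean_measurements`, the example code "bmi" and the plausibility bounds are hypothetical assumptions made for this example, not part of any published EHR library.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Measurement:
    # Hypothetical minimal layout for a single EHR measurement record.
    # frozen=True makes instances hashable, so they can go into a set.
    patient_id: str
    event_date: date
    code: str
    value: float


def clean_measurements(records, lower, upper):
    """Drop exact duplicate records and values outside [lower, upper].

    Hypothetical generic cleaning step: the plausibility bounds are
    passed in by the caller so the same function can be reused across
    studies and measurement types.
    """
    seen = set()
    cleaned = []
    for rec in records:
        if rec in seen:
            continue  # exact duplicate of a record already processed
        seen.add(rec)
        if lower <= rec.value <= upper:
            cleaned.append(rec)
    return cleaned


def test_clean_measurements_removes_duplicates_and_implausible_values():
    # Unit test for the cleaning function (pytest-style naming).
    d = date(2015, 1, 1)
    recs = [
        Measurement("p1", d, "bmi", 24.0),
        Measurement("p1", d, "bmi", 24.0),   # exact duplicate
        Measurement("p2", d, "bmi", 400.0),  # implausible value
    ]
    assert clean_measurements(recs, 10.0, 80.0) == [recs[0]]
```

The test function follows pytest's `test_` naming convention, so a test runner such as pytest can collect it, and it can also be called directly. Keeping the bounds as parameters rather than hard-coding them is what lets one function serve many studies.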
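The recommendations on reusable outcome definitions and machine-readable phenotyping algorithms can be sketched by keeping the code list in a plain data file, separate from the code that applies it. In this hypothetical Python example, the JSON fields (`name`, `version`, `terminology`, `code_prefixes`) and the function names are assumptions; the ICD-10 category E11 (type 2 diabetes mellitus) is used only as an illustration and is not a validated phenotype.

```python
import json
from datetime import date

# Hypothetical machine-readable phenotype definition. The code list is
# illustrative only (ICD-10 category E11) and is not a validated algorithm.
PHENOTYPE_JSON = """
{
  "name": "type_2_diabetes_example",
  "version": "0.1",
  "terminology": "ICD-10",
  "code_prefixes": ["E11"]
}
"""


def load_phenotype(text):
    """Parse a JSON phenotype definition into a dict."""
    return json.loads(text)


def first_event_dates(diagnoses, phenotype):
    """Return {patient_id: earliest matching date} for the phenotype.

    `diagnoses` is an iterable of (patient_id, event_date, code) tuples,
    a hypothetical layout chosen for illustration.
    """
    prefixes = tuple(phenotype["code_prefixes"])
    earliest = {}
    for patient_id, event_date, code in diagnoses:
        if code.startswith(prefixes):  # str.startswith accepts a tuple
            current = earliest.get(patient_id)
            if current is None or event_date < current:
                earliest[patient_id] = event_date
    return earliest


if __name__ == "__main__":
    diagnoses = [
        ("p1", date(2012, 5, 1), "E11.9"),
        ("p1", date(2010, 3, 2), "E11.2"),
        ("p2", date(2011, 1, 1), "I10"),
    ]
    pheno = load_phenotype(PHENOTYPE_JSON)
    print(first_event_dates(diagnoses, pheno))
    # prints {'p1': datetime.date(2010, 3, 2)}
```

Because the definition is plain text, it can be committed to a revision control system alongside the analysis code, so changes to the code list are tracked and can be compared between versions.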
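A study population definition, one of the operations the table suggests grouping into modules, might be sketched as a small function that other analysis modules import. The eligibility criteria (a minimum age and a minimum registration period before an index date), the `Patient` fields and the function names are hypothetical choices for illustration only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Patient:
    # Hypothetical patient-level fields used for illustration.
    patient_id: str
    birth_date: date
    registration_start: date


def age_on(birth_date, on):
    """Age in completed years on the date `on`."""
    had_birthday = (on.month, on.day) >= (birth_date.month, birth_date.day)
    return on.year - birth_date.year - (0 if had_birthday else 1)


def study_population(patients, index_date, min_age, min_prior_days):
    """Select patients eligible at `index_date`.

    Hypothetical criteria: aged at least `min_age` years and registered
    for at least `min_prior_days` days before the index date.
    """
    return [
        p for p in patients
        if age_on(p.birth_date, index_date) >= min_age
        and (index_date - p.registration_start).days >= min_prior_days
    ]
```

Putting the criteria behind one function means every analysis that imports it applies the same definition, and a change to the definition is made (and tested) in one place.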