Scientific software engineering principles |
Create generic functions for common EHR data cleaning and preprocessing operations which can be shared with the community |
|
Produce functions for defining study exposures, covariates and clinical outcomes across datasets which can be maintained across research groups and reused across many research studies |
|
Create modules for logically grouping common EHR operations e.g. study population definitions or datasource manipulation to enable code maintainability |
|
Create tests for individual functions and modules to ensure the robustness and correctness of results |
|
Track changes in analytical code and in phenotype definitions based on controlled clinical terminology terms by using a source code revision control system |
|
Use formal software engineering best-practices to document workflows and data manipulation operations |
Standardized analytical approaches |
Build and distribute libraries for common EHR data manipulation or statistical analysis and include sufficient detail (e.g. command line arguments) for all tools used |
|
Produce and annotate machine-readable EHR phenotyping algorithms that can be systematically curated and reused by the community |
|
Use Digital Object Identifiers (DOIs) for transforming research artifacts into shareable citable resources and cross-reference from research output |
|
Deposit research resources (e.g. algorithms, code) in open-access repositories or scientific software journals and cross-reference from research output |
|
Virtual machines can potentially be used to encapsulate the data, operating system, analytical software and algorithms used to generate a manuscript and, where applicable, can be made available so that others can reproduce the analytical pipeline |
Literate programming |
Encapsulate both logic and programming code using literate programming approaches and tools which ensure logic and underlying processing code coexist |
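
To make the first group of recommendations more concrete (generic, shareable preprocessing functions accompanied by tests), a minimal Python sketch is given here. It is an illustration under stated assumptions, not a reference implementation: the record layout, the function name `clean_measurements` and the plausibility bounds used in the test are all hypothetical and are not taken from any particular EHR data source or library.

```python
from datetime import date
from typing import List, Optional, TypedDict


class Measurement(TypedDict):
    """Hypothetical record layout for a single clinical measurement."""
    patient_id: str
    measured_on: date
    value: Optional[float]


def clean_measurements(
    records: List[Measurement], lower: float, upper: float
) -> List[Measurement]:
    """Generic cleaning step that can be reused across studies.

    Drops records with a missing value, records outside the plausible
    range [lower, upper], and repeat records for the same patient on the
    same day (the first record encountered is kept).
    """
    seen = set()
    cleaned: List[Measurement] = []
    for rec in records:
        value = rec["value"]
        if value is None or not (lower <= value <= upper):
            continue  # missing or implausible measurement
        key = (rec["patient_id"], rec["measured_on"])
        if key in seen:
            continue  # duplicate for this patient and day
        seen.add(key)
        cleaned.append(rec)
    return cleaned


def test_clean_measurements() -> None:
    """Unit test documenting the expected behaviour of the function."""
    records: List[Measurement] = [
        {"patient_id": "p1", "measured_on": date(2020, 1, 1), "value": 120.0},
        {"patient_id": "p1", "measured_on": date(2020, 1, 1), "value": 125.0},
        {"patient_id": "p2", "measured_on": date(2020, 1, 2), "value": None},
        {"patient_id": "p3", "measured_on": date(2020, 1, 3), "value": 900.0},
    ]
    # Bounds of 50 to 300 are illustrative only, not clinical guidance.
    cleaned = clean_measurements(records, lower=50.0, upper=300.0)
    assert [r["patient_id"] for r in cleaned] == ["p1"]
    assert cleaned[0]["value"] == 120.0


if __name__ == "__main__":
    test_clean_measurements()
```

Keeping the bounds as parameters rather than hard-coding them lets the same function serve different measurements and studies, and keeping the test next to the function in a version-controlled repository records the intended behaviour alongside the code.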