| Study stage | Recommendations for increasing FAIRness and harmonizability of shared data |
|---|---|
| Design | Choose a data format with common data model (CDM) compatibility. Choosing, at design time, a data format that adheres to formatting standards makes it easier to share the data under a common data model. Software that transforms data from one format to another through a common data model can reduce the cost of harmonization by automating the extract, transform, load (ETL) process required to integrate different datasets. Choose a standard vocabulary and measures. Collecting data under semantic standards allows for better reusability of the data, facilitating variable alignment and increasing ‘inferential equivalence’ between the studies to be harmonized. Considering the vocabulary standards of the field of study during the design phase improves the value of the shared data and increases the pool of studies that the research community can reuse. Choose standard procedures and protocols. Even under the same standard vocabulary, the same metrics can vary if the procedures, protocols, and tools for data collection differ. Adopting standard operating procedures (SOPs) common in a field of study can reduce deviations introduced during data collection. |
| Data Collection and Curation | Minimize variations in protocols, data entry, and data management tools. Document any changes made to data values, data formats, vocabularies, and protocols. This information helps identify potential inconsistencies during harmonization. Using a data version control system can reduce the documentation effort. |
| Data Sharing | Release documentation together with the data. Data dictionaries or codebooks are necessary for documenting the meaning of the data to promote reusability and downstream harmonization efforts. The use of standard vocabularies can simplify the creation of data dictionaries. If the study data are formatted for compatibility with a CDM, providing the documentation, along with any scripts if available, lowers the barrier to interoperability. Documentation should also include protocols, SOPs, and data collection tools, as well as any deviations from these. |
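
As an illustration of how a shared data dictionary can support an automated ETL step, the following minimal Python sketch maps one study's records onto a common set of variable names and units. It is only a sketch under stated assumptions: the variable names (`wt_lb`, `body_weight_kg`, and so on), the `FieldMapping` structure, and the `to_cdm` helper are hypothetical and do not correspond to any particular CDM or tool.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class FieldMapping:
    """One data dictionary entry: how a study variable maps to the target model."""
    target: str                    # variable name in the (hypothetical) common data model
    unit: str                      # unit or value type documented for the target variable
    convert: Callable[[Any], Any]  # conversion from the study's encoding to the target's


# Hypothetical data dictionary released alongside one study's data.
STUDY_A_DICTIONARY: Dict[str, FieldMapping] = {
    "wt_lb": FieldMapping("body_weight_kg", "kg",
                          lambda v: round(float(v) * 0.45359237, 2)),
    "sex": FieldMapping("sex", "category",
                        lambda v: {"M": "male", "F": "female"}.get(v, "unknown")),
    "age_wk": FieldMapping("age_weeks", "weeks", int),
}


def to_cdm(record: Dict[str, Any],
           dictionary: Dict[str, FieldMapping]) -> Tuple[Dict[str, Any], List[str]]:
    """Transform one source record; also report variables the dictionary does not cover."""
    harmonized: Dict[str, Any] = {}
    unmapped: List[str] = []
    for source_name, value in record.items():
        mapping = dictionary.get(source_name)
        if mapping is None:
            unmapped.append(source_name)  # undocumented variable: flag it, do not guess
            continue
        harmonized[mapping.target] = mapping.convert(value)
    return harmonized, unmapped


if __name__ == "__main__":
    raw = {"wt_lb": "10", "sex": "F", "age_wk": "12", "note": "recheck"}
    print(to_cdm(raw, STUDY_A_DICTIONARY))
```

Keeping the mapping as data rather than hard-coding it in the transformation means the dictionary itself can be released with the dataset, and variables it does not document are reported instead of being silently dropped.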