Table 2.
Methodology step | Technology-independent step | Design choice and implementation |
---|---|---|
Step 0: Preparation | Perform the design study phase in a health-environmental workflow, defining the strategy to answer a research question using empirical data. The design study phase also requires a clear definition of the health events relevant to the study, and permission to process each health event’s location and date to link it with environmental data. Another important element is the definition of potential queries to help explore the research question. | GAM. Include the KG approach to link environmental datasets and health events through potentially personal information (event’s location and time) as part of the data processing strategy and compliance. |
Step 1: Data collection | Gather the available environmental datasets relevant to the research question of the study. The datasets are expected to have spatial and temporal features, which are required for Step 3: Data linkage. | GAM. Include dataset metadata with at least the information related to the dataset descriptors (e.g. licence, title, version, temporal and spatial information and structure of the dataset), data provenance (e.g. distribution and download URL), data use, agents that downloaded the data (e.g. researcher, software and entity) and the definitions of the environmental variables, including the units and source of information; and geometry data for relevant study areas. |
Step 2: Semantic uplift | Design and execute rules on how to make the environmental datasets gathered in Step 1: Data collection interoperable. | W3C. Define the uplift mapping in the Relational database to RDF Mapping Language (R2RML), using the RDF Data Cube vocabulary (QB) for the data, and Geographic Query Language for RDF Data (GeoSPARQL), PROV Ontology (PROV-O), Data Catalogue Vocabulary (DCAT) and Open Digital Rights Language (ODRL) for the metadata (Supplementary Fig. 1). W3C. Uplift environmental datasets to RDF graphs. W3C. Store the RDF files resulting from the execution of the mapping and the semantic (meta)data vocabularies in a triplestore with GeoSPARQL support. |
Step 3: Data linkage | Define a query template that links the environmental datasets within an area relevant to an event location and selects a period of data before that event date. The query template has placeholders (or variables) for users’ input (Step 4: Data interaction) and should be designed to be generic enough to adapt to different data sources. | W3C. Link datasets and events using a SPARQL query template with GeoSPARQL (spatial) and xsd:dateTime (temporal) reasoning functionalities to establish new relationships adequate for each use case (Supplementary Fig. 2). |
Step 4: Data interaction | Design an initial User Interface (UI) to allow non-technical users to (i) input the minimum event data required to link with environmental datasets, (ii) specify the user’s relevant data linkage variables for the query template defined in Step 3: Data linkage, and (iii) execute the data linkage query and export the linked data and metadata generated as a data table for analysis, a graph for publication and an interactive report for exploration. | UER. Design a simple UI on top of the KG focused on the data linkage process that allows for the input of health events with minimum information for the spatiotemporal linkage, the selection of linkage options and the export of linked data and metadata as a data table (CSV), graph (RDF) and interactive report (HTML). |
Step 5: Usability evaluation | Evaluate the usability and potential usefulness of the UI solution defined in Step 4: Data interaction in achieving the user requirements. Conduct the evaluation iteratively, progressing from version to version until the user requirements are achieved. | GAM. Combine summative and formative conceptualisations of usability as evidence for achieving the expert requirements, using standard usability metrics when possible. |
The vocabularies and languages refer to W3C recommendations and standards using Semantic Web technologies.
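
To illustrate the query template described in Step 3 of Table 2, the following is a minimal Python sketch, not the implementation referenced in Supplementary Fig. 2. It fills a SPARQL template with an event location, a search radius and a lag period before the event date. The `ex:` namespace, the `ex:refArea` and `ex:refTime` properties, the function name `build_linkage_query` and all parameter names are hypothetical and chosen only for illustration. The spatial filter assumes a triplestore that supports the GeoSPARQL `geof:distance` function with metre units.

```python
from datetime import datetime, timedelta
from string import Template

# Query template with $placeholders for the user's input.
# ex:refArea and ex:refTime are hypothetical properties (assumptions).
QUERY_TEMPLATE = Template("""\
PREFIX qb:   <http://purl.org/linked-data/cube#>
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX uom:  <http://www.opengis.net/def/uom/OGC/1.0/>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
PREFIX ex:   <http://example.org/ns#>

SELECT ?obs ?area ?time WHERE {
  ?obs a qb:Observation ;
       ex:refArea ?area ;
       ex:refTime ?time .
  ?area geo:hasGeometry/geo:asWKT ?areaWkt .
  FILTER (geof:distance(?areaWkt, "$event_wkt"^^geo:wktLiteral, uom:metre) <= $radius_m)
  FILTER (?time >= "$window_start"^^xsd:dateTime && ?time <= "$event_time"^^xsd:dateTime)
}
""")


def build_linkage_query(lon, lat, event_time, radius_m, lag_days):
    """Fill the template for one event (longitude/latitude order, as in WKT)."""
    # The temporal window covers the lag_days before the event date.
    window_start = event_time - timedelta(days=lag_days)
    return QUERY_TEMPLATE.substitute(
        event_wkt=f"POINT({lon} {lat})",
        radius_m=radius_m,
        window_start=window_start.isoformat(),
        event_time=event_time.isoformat(),
    )


# Example: an event on 15 March 2020, 5 km radius, 30 days of prior data.
query = build_linkage_query(-6.26, 53.35, datetime(2020, 3, 15, 12, 0, 0), 5000, 30)
print(query)
```

Keeping the placeholders separate from the query text means the same template can be reused across events and data sources. A UI such as the one described in Step 4 would only need to collect the five input values, and should validate them before substitution.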