2020 Nov 5;8(11):e19612. doi: 10.2196/19612

Table 1.

Review of current approaches for each data pipeline stage.

| Pipeline stage | Current solutions | Domain expert role |
|---|---|---|
| Data curation | | |
| Data integration | Schema matching [138-143]; Interactive integration [144,145]; Webtables integration [146-151]; Machine learning [46-49] | Domain experts are needed to validate integration results and to interactively correct automated methods, which can then update their algorithms |
| Data discovery | | Domain expert feedback is needed to finalize the analysis data set |
| Data cleaning | | |
| Error fixes | | Domain expert input can be used to identify and fix errors |
| Augmentation | | Domain experts can augment missing data with domain-specific rules |
| Transformation | | Domain experts can restructure the data to make it semantically valid |
| Data analysis | | |
| Exploration | | Domain experts interact with summaries and outliers to draw insight |
| Explainable | | Domain experts inform the model design to ensure explainability |