Table 1.
Review of current approaches for each data pipeline stage.
| Pipeline stage | | Current solutions | Domain expert role |
|---|---|---|---|
| Data curation | Data integration | | Domain experts are needed to validate results of integration, and interactively correct automated methods, which can then update their algorithm |
| | Data discovery | | Domain expert feedback is needed to finalize the analysis data set |
| Data cleaning | Error fixes | | Domain expert input can be used to identify and fix errors |
| | Augmentation | | Domain experts can augment missing data with domain-specific rules |
| | Transformation | | Domain experts can restructure the data to make it semantically valid |
| Data analysis | Exploration | | Domain experts interact with summaries and outliers to draw insight |
| | Explainable | | Domain experts inform the model design to ensure explainability |