Table 1.
Data Science steps annotated in DASWOW
Label | Definition |
---|---|
helper_functions | Code that is not directly related to the data science activity at hand, but provides useful scripting functions (e.g., importing or configuring libraries). |
load_data | The process of loading a dataset of any type (e.g., .csv, .pkl) into a Jupyter notebook environment. |
data_preprocessing | The process of preparing the dataset(s) for the subsequent analysis. It includes tasks such as cleaning, instance selection, normalisation, data transformation, and feature selection. |
data_exploration | The process of inspecting the content and shape of a dataset to understand the nature and characteristics of the data. It may use visualisation techniques, but its purpose differs from that of result_visualization. |
modelling | The process of applying statistical models and learning-based algorithms to learn from sample data. |
evaluation | The process of assessing a model using one or more evaluation metrics, such as goodness of fit and accuracy. |
prediction | The process of applying a model trained on a set of data to other or newly arriving pieces of data to forecast new values. |
result_visualization | The process of obtaining a graphical representation (e.g., tables, plots, graphs) of one or more measurements. |
save_results | The process of serialising and storing the data. |
comment_only | Comment lines, including commented-out code. |
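
The label set in Table 1 can be sketched as a small Python vocabulary for validating annotations. This is an illustrative sketch only, not the DASWOW tooling: the names `Step` and `parse_labels`, the example cells, and the assumption that a cell may carry more than one label are all hypothetical.

```python
from enum import Enum


class Step(str, Enum):
    """The ten labels from Table 1 (class name is hypothetical)."""
    HELPER_FUNCTIONS = "helper_functions"
    LOAD_DATA = "load_data"
    DATA_PREPROCESSING = "data_preprocessing"
    DATA_EXPLORATION = "data_exploration"
    MODELLING = "modelling"
    EVALUATION = "evaluation"
    PREDICTION = "prediction"
    RESULT_VISUALIZATION = "result_visualization"
    SAVE_RESULTS = "save_results"
    COMMENT_ONLY = "comment_only"


def parse_labels(raw_labels):
    """Convert label strings to Step members.

    Raises ValueError if a string is not one of the Table 1 labels.
    """
    return {Step(label) for label in raw_labels}


# Hypothetical notebook cells paired with labels chosen by hand for illustration.
example_cells = [
    ("import pandas as pd", ["helper_functions"]),
    ("df = pd.read_csv('data.csv')", ["load_data"]),
    ("df = df.dropna()", ["data_preprocessing"]),
    ("df.describe()", ["data_exploration"]),
    ("# model.fit(X, y)", ["comment_only"]),
]

annotated = [(source, parse_labels(labels)) for source, labels in example_cells]
```

Using an enumeration rather than free-form strings means a misspelled label such as `data_preprocesing` fails immediately instead of silently creating an eleventh category.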