2022 Nov 19;28(1):7. doi: 10.1007/s10664-022-10229-z

Table 1.

Data Science steps annotated in DASWOW

Label	Definition
helper_functions	Code that is not directly related to the data science activity at hand, but provides useful scripting functions (e.g., importing or configuring libraries).
load_data	The process of loading a dataset of any type (e.g., .csv, .pkl) into a Jupyter notebook environment.
data_preprocessing	The process of preparing the dataset(s) for the subsequent analysis. It includes tasks such as cleaning, instance selection, normalisation, data transformation, and feature selection.
data_exploration	The process of inspecting the content and shape of a dataset to understand the nature and characteristics of the data. It may use visualisation techniques, but its purpose differs from that of result_visualization.
modelling	The process of applying statistical models and learning-based algorithms to learn from sample data.
evaluation	The process of assessing a model using one or more evaluation metrics, such as goodness of fit and accuracy.
prediction	The process of applying a model trained on a set of data to other or newly arriving pieces of data to forecast new values.
result_visualization	The process of obtaining a graphical representation (e.g., tables, plots, graphs) of one or more measurements.
save_results	The process of serialising and storing the data.
comment_only	Comment lines, including commented-out code.
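
To make the labels in Table 1 concrete, the following is a minimal, hypothetical sketch of a small analysis in which each fragment is preceded by a comment naming the label it would plausibly receive. It is not taken from DASWOW: the data, the variable names, and the assignment of labels to fragments are assumptions made for illustration, and only the Python standard library is used.

```python
# helper_functions: import the libraries used below
import csv
import io
import json

# load_data: read a dataset (an in-memory CSV stands in for a real file)
RAW_CSV = "x,y\n1,2.1\n2,3.9\n3,6.2\n4,\n5,9.8\n"
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# data_exploration: inspect the shape and columns of the dataset
n_rows, columns = len(rows), list(rows[0].keys())

# data_preprocessing: drop rows with a missing y value, convert to floats
clean = [(float(r["x"]), float(r["y"])) for r in rows if r["y"]]
xs = [x for x, _ in clean]
ys = [y for _, y in clean]

# modelling: fit a least-squares line y = slope * x + intercept
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in clean)
s_xx = sum((x - mean_x) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# evaluation: mean absolute error of the fitted line on the cleaned data
mae = sum(abs(y - (slope * x + intercept)) for x, y in clean) / len(clean)

# prediction: apply the fitted model to an input it was not fitted on
y_new = slope * 4.0 + intercept

# result_visualization: print a small table of the measurements
print(f"{'quantity':<12}{'value':>10}")
for name, value in [("slope", slope), ("intercept", intercept), ("mae", mae)]:
    print(f"{name:<12}{value:>10.4f}")

# save_results: serialise the results (to a string here, not to disk)
serialised = json.dumps({"slope": slope, "intercept": intercept, "mae": mae})

# comment_only: lines that hold nothing but comments or commented-out code
# print(rows)
```

In this sketch the distinction between data_exploration and result_visualization rests on purpose rather than technique: the first inspects the raw dataset, while the second presents measurements obtained from the fitted model.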