2022 Nov 19;28(1):7. doi: 10.1007/s10664-022-10229-z

Table 1.

Data Science steps annotated in DASWOW

Label Definition
helper_functions Code that is not directly related to the data science activity at hand, but provides useful scripting functions (e.g., importing or configuring libraries).
load_data The process of loading a dataset of any type (e.g., .csv, .pkl) into a Jupyter notebook environment.
data_preprocessing The process of preparing the dataset(s) for the subsequent analysis. It includes tasks such as cleaning, instance selection, normalisation, data transformation, and feature selection.
data_exploration The process of inspecting the content and shape of a dataset to understand the nature and characteristics of the data. It may use visualisation techniques, but its purpose differs from that of result_visualization.
modelling The process of applying statistical models and learning-based algorithms to learn from sample data.
evaluation The process of assessing a model using one or more evaluation metrics, such as goodness of fit and accuracy.
prediction The process of applying a model trained on a set of data to other or newly arriving pieces of data to forecast new values.
result_visualization The process of obtaining a graphical representation (e.g., tables, plots, graphs) of one or more measurements.
save_results The process of serialising and storing the data.
comment_only Lines consisting only of comments, including commented-out code.
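
To make the labels above concrete, the following is an illustrative sketch of a miniature pipeline in which each step is tagged with the label it would plausibly receive. It is not taken from DASWOW: the in-memory dataset `RAW_CSV`, the variable names, and the choice of a hand-written least-squares fit are all assumptions made for illustration, and only the Python standard library is used.

```python
# helper_functions: imports and configuration
import csv
import io
import json
from statistics import mean

# Hypothetical in-memory CSV standing in for a real dataset file.
RAW_CSV = "x,y\n1,2\n2,4\n3,6\n4,\n5,10\n"

# load_data: read the dataset into the environment
rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# data_exploration: inspect the shape and columns of the dataset
n_rows, columns = len(rows), list(rows[0].keys())

# data_preprocessing: drop rows with a missing target and cast to float
clean = [(float(r["x"]), float(r["y"])) for r in rows if r["y"] != ""]

# modelling: ordinary least-squares fit of y = slope * x + intercept
xs, ys = zip(*clean)
x_bar, y_bar = mean(xs), mean(ys)
slope = sum((x - x_bar) * (y - y_bar) for x, y in clean) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar

# evaluation: mean absolute error of the fit on the cleaned data
mae = mean(abs(y - (slope * x + intercept)) for x, y in clean)

# prediction: apply the fitted model to a new input
y_new = slope * 6.0 + intercept

# result_visualization: a plain-text table of the cleaned measurements
report = "\n".join(f"{x:>4.1f} {y:>5.1f}" for x, y in clean)

# save_results: serialise the fitted parameters (kept in memory here)
serialised = json.dumps({"slope": slope, "intercept": intercept, "mae": mae})

# comment_only: the line below is commented-out code
# print(report)
```

In a real notebook a single cell may mix several of these activities, so a cell could reasonably carry more than one label; the one-step-per-label layout here is only for readability.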