Figure 4.
Major phases of data science pipeline towards decision making and analysis. Data initially collected and integrated from many sources. Then they need to be pre-processed to filter uninformative or possibly misleading values (e.g. outliers or noise). Then existing models are used to explain data or extract relevant patterns describing data or predicting associations. Finally, results need to be interpreted and explained by domain experts. Each step of analysis may generate corrections or refinements that are applied to precedent steps.