Author manuscript; available in PMC: 2020 Jun 1.
Published in final edited form as: IEEE Trans Big Data. 2018 Mar 6;5(2):109–119. doi: 10.1109/TBDATA.2018.2811508

Fig. 2.

Illustration of the Spark stack and its components. Spark offers a functional programming API for manipulating Resilient Distributed Datasets (RDDs). An RDD represents a collection of items distributed across many compute nodes that can be processed in parallel. Spark Core is the computational engine responsible for scheduling, distributing, and monitoring applications, each of which consists of many computational tasks running on worker machines across a compute cluster.
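To make the RDD idea concrete, the following is a minimal stdlib-only Python analogy, not Spark itself: a dataset is fanned out across local worker processes and a function is applied to each element in parallel, mirroring what `RDD.map` does across a cluster. All names here are illustrative; in real Spark one would use `SparkContext.parallelize` and the RDD API.

```python
from multiprocessing import Pool

def square(x):
    # The task Spark would ship to each worker; applied independently
    # to every element of the distributed collection.
    return x * x

# A small "dataset"; Spark would shard these items across cluster
# nodes, whereas here they are only fanned out across local processes.
data = list(range(8))

with Pool(processes=4) as pool:
    # Parallel element-wise transformation, analogous to rdd.map(square)
    # followed by collect() to gather results back to the driver.
    results = pool.map(square, data)

print(results)
```

In Spark itself, the equivalent would be roughly `sc.parallelize(range(8)).map(lambda x: x * x).collect()`, with Spark Core handling the scheduling and monitoring that this sketch leaves to the operating system.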