2022 Nov 30;23:514. doi: 10.1186/s12859-022-05065-3

Table 1.

Examples of Python tools and frameworks for scalable data science

Name	Description	Website	References
bodo.ai	Native Python framework that improves performance with automated parallelization and compiler optimization	https://bodo.ai/	NA
Dask	Framework that parallelizes SPE data science operations with a familiar API	https://dask.org/	[22]
Fugue	Unified interface for distributed computing that runs pandas code on Spark and Dask without rewrites	https://github.com/fugue-project/	NA
Koalas	Project that simplifies the use of Spark distributed dataframes by adopting pandas’ DataFrame API	https://koalas.readthedocs.io/	NA
Modin	Library for interoperating with scalable ML frameworks	https://modin.readthedocs.io/	[99, 100]
RAPIDS	Framework for simplified GPU data science	https://rapids.ai/	[60]
Ray	Framework for scaling compute-intensive ML pipelines	https://www.ray.io/	[101]
Scalable Dataframe Compiler	Tool for compiling pandas operations on dataframes to facilitate parallelization	https://github.com/IntelPython/sdc	[102]
Vaex	Standalone tool for visualizing data and performing statistical calculations	https://vaex.io/	[103]
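
Several of the tools in Table 1 scale dataframe operations by partitioning the data, applying the same operation to each partition, and combining the partial results. The following is a minimal sketch of that pattern using only the Python standard library. It does not reproduce the API of any tool listed above; the function names (`split_into_partitions`, `partial_sum_count`, `partitioned_mean`) are hypothetical, and a thread pool is assumed only to keep the example self-contained, whereas the frameworks in the table typically distribute work across processes, GPUs, or cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(values, n_partitions):
    """Split a list into at most n_partitions contiguous chunks."""
    size = max(1, -(-len(values) // n_partitions))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_sum_count(chunk):
    """Per-partition work: return (sum, count) so results can be combined."""
    return sum(chunk), len(chunk)


def partitioned_mean(values, n_partitions=4):
    """Compute a mean by mapping over partitions and combining the partials."""
    if not values:
        raise ValueError("values must not be empty")
    partitions = split_into_partitions(values, n_partitions)
    # Assumption: threads stand in for the workers a real framework would use.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(partial_sum_count, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


data = list(range(1, 101))
result = partitioned_mean(data, n_partitions=4)
print(result)
```

Returning a (sum, count) pair from each partition, rather than a per-partition mean, keeps the combine step exact even when the partitions differ in size.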