BMC Bioinformatics. 2022 Nov 30;23:514. doi: 10.1186/s12859-022-05065-3

Table 1.

Examples of Python tools and frameworks for scalable data science

Name | Description | Website | References
bodo.ai | Native Python framework that improves performance with automated parallelization and compiler optimization | https://bodo.ai/ | NA
Dask | Framework that parallelizes SPE data science operations with a familiar API | https://dask.org/ | [22]
Fugue | Unified interface for distributed computing running Pandas code on Spark and Dask without any rewrites | https://github.com/fugue-project/ | NA
Koalas | Project that simplifies the use of Spark distributed dataframes by adopting pandas’ DataFrame API | https://koalas.readthedocs.io/ | NA
Modin | Library for interoperating with scalable ML frameworks | https://modin.readthedocs.io/ | [99, 100]
RAPIDS | Framework for simplified GPU data science | https://rapids.ai/ | [60]
Ray | Framework for scaling compute-intensive ML pipelines | https://www.ray.io/ | [101]
Scalable Dataframe Compiler | A tool for compiling pandas operations on dataframes to facilitate parallelization | https://github.com/IntelPython/sdc | [102]
Vaex | Standalone tool for visualizing data and performing statistical calculations | https://vaex.io/ | [103]