Nat Commun. 2021 Oct 4;12:5797. doi: 10.1038/s41467-021-25974-w

Table 1.

Data-processing platforms and their respective features for handling multimodal data.

Features ORCESTRA (Pachyderm) DNAnexus Databricks Lifebit
Create language-agnostic pipelines in the cloud
Large dataset support (TB in size)
Automatic pipeline triggering with updated data (out-of-the-box) X X X
Prevents recomputation of entire dataset with each new pipeline trigger X
Docker utilization
Every pipeline run and data source is versioned with a unique identifier a a
Parallelism support
Versioning system (e.g., GitHub) for pipelines and input data
Open access (free) X X X
Direct mounting of data (no copying into file system) X X
Automatic cost-efficiency implementation for instances (low-priority) X X X
No permanent resource allocation for a pipeline (memory/CPU) X

a Indicates partial support of the feature.

Each feature was tested against each platform using biomedical data as an input data source.