. 2021 Oct 4;12:5797. doi: 10.1038/s41467-021-25974-w

Table 1.

Data-processing platforms and their respective features for handling multimodal data.

Features	ORCESTRA (Pachyderm)	DNAnexus	Databricks	Lifebit
Create language-agnostic pipelines in the cloud	✓	✓	✓	✓
Large dataset support (TB in size)	✓	✓	✓	✓
Automatic pipeline triggering with updated data (out-of-the-box)	✓	X	X	X
Prevents recomputation of entire dataset with each new pipeline trigger	✓	✓	X	✓
Docker utilization	✓	✓	✓	✓
Every pipeline run and data sources are versioned with a unique identifier	✓	✓	^a	^a
Parallelism support	✓	✓	✓	✓
Versioning system (e.g., GitHub) for pipelines and input data	✓	✓	✓	✓
Open access (free)	✓	X	X	X
Direct mounting of data (no copying into file system)	X	X	✓	✓
Automatic cost-efficiency implementation for instances (low-priority)	X	X	X	✓
No permanent resource allocation for a pipeline (memory/CPU)	X	✓	✓	✓

^aIndicates partial support of the feature.

Each feature was tested against each platform using biomedical data as an input data source.