Table 4. Popular workflow management systems.
Comparison aspect | Toil ✠ [93] | Rabix [95] | Cromwell [14] |
---|---|---|---|
Nature | Execution engine | Execution engine | Execution engine |
Supports community-standard workflow language? | CWL, WDL | CWL | WDL # [97] |
User interface | CLI | GUI ⋆, CLI | CLI |
Programming paradigm [77] | Sequential † [13, 94] | Dataflow [13] | Dataflow |
Containerization support | Docker | Docker | Docker |
Scalability [80] | Petascale | Yes | Yes |
Checkpointing and caching | Yes | Yes | Yes |
Portability ¶ | LSF, Parasol, Apache Mesos, OpenStack, MS Azure, Google Cloud & Compute Engine | OpenStack, Google Cloud § | LSF, HTCondor, Google JES § |
Distributed execution | Spark | - | Spark |
Supported compute architecture | Homogeneous or heterogeneous | Homogeneous § | Homogeneous § |
Compute resource allocation | Allocated dynamically | Reserved a priori § | Reserved a priori |
✠ Of the three systems, only Toil supports object stores and data encryption, which can help ensure compliance with strict data security requirements.
# Work is ongoing to incorporate support for CWL into Cromwell.
⋆ Rabix composer (http://docs.rabix.io/rabix-composer-home) is a stand-alone GUI editor for CWL workflows.
† In Toil, child jobs are executed (in parallel) after their parent has completed, and follow-on jobs are run (also in parallel) after the successor jobs and their own child jobs have finished execution. This creates a directed acyclic graph (DAG) of jobs, similar to the dataflow model. However, unlike in the dataflow model, the order of execution depends on whether the parent job has finished and on the job's relation to other jobs, rather than on whether its input data are ready.
¶ All three workflow management systems can run on a single server, on clusters managed by PBS, Grid Engine or Slurm, and on AWS.
§ Work is ongoing to also provide support for the GA4GH TES job management system.
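The distinction drawn in the † footnote, where Toil orders jobs by their parent/child/follow-on relations while a dataflow engine orders them by data readiness, can be illustrated with a toy Python model. This is a minimal sketch under stated assumptions: `ToyJob`, `run_toil_style` and `dataflow_order` are hypothetical names and are not part of Toil's API; parallel execution of sibling jobs is simulated sequentially; and the dataflow side uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) as a stand-in for a dataflow engine.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Dict, List, Set


@dataclass
class ToyJob:
    """Hypothetical stand-in for a workflow job; NOT Toil's real Job class."""
    name: str
    children: List["ToyJob"] = field(default_factory=list)
    follow_ons: List["ToyJob"] = field(default_factory=list)

    def add_child(self, job: "ToyJob") -> "ToyJob":
        self.children.append(job)
        return job

    def add_follow_on(self, job: "ToyJob") -> "ToyJob":
        self.follow_ons.append(job)
        return job


def run_toil_style(job: ToyJob, log: List[str]) -> None:
    """Record the order in which jobs start under parent/child/follow-on rules.

    Sibling children (and sibling follow-ons) are independent and could run in
    parallel; this toy model runs them one after another for simplicity.
    """
    log.append(job.name)              # the parent runs first
    for child in job.children:        # children start only after the parent has finished
        run_toil_style(child, log)
    for follow_on in job.follow_ons:  # follow-ons wait until every child subtree is done
        run_toil_style(follow_on, log)


def dataflow_order(inputs: Dict[str, Set[str]]) -> List[str]:
    """Order steps purely by data readiness: a step may run once all its inputs exist."""
    return list(TopologicalSorter(inputs).static_order())


# Toil-style (toy): "merge" runs because it is a follow-on of "align",
# i.e. because of its position relative to other jobs.
root = ToyJob("align")
root.add_child(ToyJob("sort_chr1"))
root.add_child(ToyJob("sort_chr2"))
root.add_follow_on(ToyJob("merge"))
order: List[str] = []
run_toil_style(root, order)
print(order)  # ['align', 'sort_chr1', 'sort_chr2', 'merge']

# Dataflow-style (toy): "merge" runs because both sorted outputs it consumes are ready.
flow = dataflow_order({
    "sort_chr1": {"align"},
    "sort_chr2": {"align"},
    "merge": {"sort_chr1", "sort_chr2"},
})
print(flow)
```

Both toy runs yield a valid DAG order here; the difference is the trigger. In the Toil-style model, `merge` would wait for the child subtrees even if it consumed none of their outputs, whereas in the dataflow version only its declared inputs determine when it can start.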