Figure 3.
A workflow execution in BIOMERO passes through several automated layers of execution.
Execution starts with the OMERO script “SLURM Run Workflow”, which lets a user select a workflow and its parameters (defined in descriptor.json on GitHub). Once selected, the OMERO script exports the chosen input data (a dataset, a plate, or a set of images) as ZARR files and wraps them in a ZIP archive. The ZIP is transferred to a Slurm node, after which the script instructs Slurm to run the (preinstalled) workflow. This triggers a command in the Slurm cluster’s Linux shell that queues a new job based on a (preinstalled) job script and environment variables. The Slurm job script first defines the hardware configuration required for this workflow (GPU, CPU, etc.), as well as a maximum execution duration and the location of the log file. Next, it can trigger additional Slurm jobs to convert all ZARR data to TIFF files and await their completion, or this conversion can be performed beforehand by the BIOMERO script. Once the data are ready, the job starts a Singularity container for the workflow (preinstalled on Slurm from a Docker image) and awaits its completion. The container provides the environment (OS, libraries, etc.) defined in its Dockerfile that the workflow needs to run properly. When run, the container calls an entrypoint script, wrapper.py (also defined in its Dockerfile and on GitHub). This wrapper script, based on BIAFLOWS, reads and validates the required parameters (defined in descriptor.json on GitHub) from the command line. The wrapper also handles any required preprocessing (e.g., cutting images into smaller tiles) and transfers data to a temporary folder before starting a sub-process with the actual executable code (Fiji macro, MATLAB executable, Python or R script, CellProfiler pipeline, etc.). After execution finishes, the generated output files are moved to the “out” folder, the temporary folder is cleaned up, and Slurm is informed of a successful execution.
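The parameter handling described above can be sketched as follows. This is a minimal illustration, not BIOMERO's actual wrapper code: the descriptor content, the `parse_params` helper, and the parameter names (`diameter`, `use_gpu`) are all hypothetical, loosely following the BIAFLOWS/Boutiques descriptor style.

```python
import json

# Hypothetical, minimal descriptor in the BIAFLOWS/Boutiques style;
# real workflows define this in descriptor.json on GitHub.
DESCRIPTOR = json.loads("""
{
  "inputs": [
    {"id": "diameter", "type": "Number", "default-value": 30},
    {"id": "use_gpu", "type": "Boolean", "default-value": false}
  ]
}
""")

def parse_params(cli_args):
    """Validate command-line parameters against the descriptor,
    falling back to declared defaults (a sketch, not BIOMERO's code)."""
    params = {}
    for inp in DESCRIPTOR["inputs"]:
        raw = cli_args.get(inp["id"], inp["default-value"])
        if inp["type"] == "Number":
            params[inp["id"]] = float(raw)
        elif inp["type"] == "Boolean":
            params[inp["id"]] = str(raw).lower() in ("true", "1")
        else:
            params[inp["id"]] = raw
    return params

# Only `diameter` is supplied; `use_gpu` falls back to its default.
print(parse_params({"diameter": "17"}))  # {'diameter': 17.0, 'use_gpu': False}
```

Validating against a single machine-readable descriptor keeps the OMERO-side parameter form and the container-side wrapper in agreement without duplicating the parameter definitions.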
During this execution, the OMERO script polls the Slurm cluster for status updates and, once it finds the job completed, starts retrieving the output and log data. Once the data are back on the OMERO server, the script imports them into OMERO according to user-defined preferences (e.g., as images in a new dataset or as a ZIP attachment on a project). Finally, all intermediate data are cleaned up, and the OMERO.web UI is informed of a successful script execution (with appropriate logging messages).
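The polling step can be sketched as a simple loop over job states. This is a self-contained illustration, assuming a hypothetical `job_status` query (in practice this would go to Slurm, e.g. via `sacct` or `squeue`); here it is stubbed to simulate a job that completes after three polls.

```python
import time

def job_status(job_id, _state={"n": 0}):
    """Stand-in for querying Slurm for a job's state; this stub
    returns RUNNING twice, then COMPLETED (purely for illustration)."""
    _state["n"] += 1
    return "COMPLETED" if _state["n"] >= 3 else "RUNNING"

def poll_until_done(job_id, interval=0.01, timeout=5.0):
    """Poll the cluster until the job reaches a terminal state, then
    return that state so output and log data can be retrieved."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = job_status(job_id)
        if state in ("COMPLETED", "FAILED", "TIMEOUT", "CANCELLED"):
            return state
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish in time")

print(poll_until_done(12345))  # COMPLETED
```

Treating all terminal Slurm states uniformly lets the OMERO script decide afterwards whether to import results or only the log files.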
