Supplementary Vignette 2

Example workflow for H&E images

Here we demonstrate a typical workflow for preprocessing H&E images. The image used in this example is publicly available for download: http://openslide.cs.cmu.edu/download/openslide-testdata/Aperio/

a. Load the image
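
A minimal sketch of this step, assuming PathML's `HESlide` class is importable from `pathml.core` and that one of the Aperio test slides has been downloaded locally. The helper name `load_he_slide` and the file path in the usage comment are placeholders of ours, not part of the vignette.

```python
def load_he_slide(slide_path, name="example"):
    """Open an H&E whole-slide image as a PathML slide object.

    The PathML import is deferred to call time so that this definition
    can be loaded even where PathML is not installed.
    """
    # Assumed import path; check it against the PathML documentation.
    from pathml.core import HESlide

    return HESlide(slide_path, name=name)


# Hypothetical usage; the path is a placeholder for a downloaded test slide:
# wsi = load_he_slide("./data/CMU-1.svs")
```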

b. Define a preprocessing pipeline

Pipelines are created by composing a sequence of modular transformations. In this example, we apply a blur to reduce noise in the image, followed by tissue detection.
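
The composition described above might be sketched as follows, assuming the `Pipeline`, `BoxBlur`, and `TissueDetectionHE` classes from `pathml.preprocessing`. The helper name `build_pipeline` and the parameter values are illustrative choices of ours and should be tuned for the slide at hand.

```python
def build_pipeline(blur_kernel_size=15, min_region_size=500):
    """Compose a blur followed by H&E tissue detection.

    Transformations are applied in list order. The parameter values are
    illustrative assumptions, not recommended settings.
    """
    # Assumed import path; check it against the PathML documentation.
    from pathml.preprocessing import BoxBlur, Pipeline, TissueDetectionHE

    return Pipeline([
        # Step 1: smooth the tile to reduce pixel-level noise
        BoxBlur(kernel_size=blur_kernel_size),
        # Step 2: detect tissue and store the result as a mask named "tissue"
        TissueDetectionHE(mask_name="tissue", min_region_size=min_region_size),
    ])
```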

c. Run preprocessing

Now that we have constructed our pipeline, we are ready to run it on our whole-slide image (WSI). PathML supports distributed computing, speeding up processing by running tiles in parallel across many workers rather than processing each tile sequentially on a single worker. This is supported by Dask.distributed on the backend and is highly scalable for very large datasets.
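
To illustrate the difference between sequential and parallel tile processing, the following toy sketch maps a per-tile function over a list of tiles using Python's standard library. It is a stand-in for the idea only: it does not use Dask or PathML, the "tiles" are small nested lists, and `process_tile` is a hypothetical placeholder for a real pipeline. Threads are used purely to keep the sketch self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def process_tile(tile):
    """Placeholder for a per-tile pipeline: returns the mean pixel value."""
    pixels = [value for row in tile for value in row]
    return sum(pixels) / len(pixels)


def run_sequential(tiles):
    """Process tiles one after another on a single worker."""
    return [process_tile(tile) for tile in tiles]


def run_parallel(tiles, n_workers=4):
    """Distribute tiles over a pool of workers."""
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        # executor.map preserves input order, so results line up with tiles
        return list(executor.map(process_tile, tiles))


# Four toy 2x2 "tiles" with constant intensities 0, 1, 2, 3
tiles = [[[i, i], [i, i]] for i in range(4)]
sequential_results = run_sequential(tiles)
parallel_results = run_parallel(tiles)
```

Because each tile is processed independently, both strategies give the same results; only the scheduling differs.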

The first step is to create a Client object. In this case, we will use a simple cluster running locally; however, Dask supports other setups including Kubernetes, SLURM, etc. See the PathML documentation for more information.
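
A sketch of these two steps, assuming `Client` and `LocalCluster` from `dask.distributed` and a slide object whose `run` method accepts `distributed` and `client` keyword arguments, as we understand the PathML API. The helper names and the worker count are our own placeholders.

```python
def make_local_client(n_workers=6):
    """Start a local Dask cluster and return a Client connected to it.

    The worker count is an illustrative assumption; choose it to match
    the available cores and memory.
    """
    from dask.distributed import Client, LocalCluster

    cluster = LocalCluster(n_workers=n_workers)
    return Client(cluster)


def run_pipeline(wsi, pipeline, client):
    """Run a preprocessing pipeline on a slide, one tile per task."""
    # Assumed PathML call; check the keyword names against the documentation.
    wsi.run(pipeline, distributed=True, client=client)
    return wsi


# Hypothetical usage, given a loaded slide `wsi` and a pipeline `pipeline`:
# client = make_local_client()
# run_pipeline(wsi, pipeline, client)
# client.close()
```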

e. Save results to disk

The resulting preprocessed data are written to disk using the HDF5 data specification, which is optimized for efficient manipulation of larger-than-memory data.
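
A sketch of the save step, assuming the slide object exposes a `write` method that takes an output path, as we understand the PathML API. The helper name `save_preprocessed` and the output path in the usage comment are placeholders of ours.

```python
from pathlib import Path


def save_preprocessed(wsi, out_path):
    """Write a preprocessed slide to disk and return the output path."""
    # Create the parent directory first so the write does not fail on a fresh checkout
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    # Assumed PathML call; the file is stored in PathML's HDF5-based format.
    wsi.write(str(out_path))
    return out_path


# Hypothetical usage, given a preprocessed slide `wsi`:
# save_preprocessed(wsi, "./data/CMU-1-preprocessed.h5path")
```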

f. Create PyTorch DataLoader

The DataLoader provides an interface to any machine learning model built on the PyTorch ecosystem.
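
A sketch of this step, assuming PathML's `TileDataset` class from `pathml.ml` and the standard `torch.utils.data.DataLoader`. The helper name `make_dataloader`, the batch size, and the number of loader workers are illustrative assumptions.

```python
def make_dataloader(h5path_file, batch_size=16, num_workers=4):
    """Wrap a preprocessed slide file in a PyTorch DataLoader.

    Imports are deferred to call time so that this definition can be
    loaded even where PathML or PyTorch is not installed.
    """
    # Assumed import path; check it against the PathML documentation.
    from pathml.ml import TileDataset
    from torch.utils.data import DataLoader

    # Each dataset item corresponds to one tile of the preprocessed slide
    dataset = TileDataset(h5path_file)
    return DataLoader(dataset, batch_size=batch_size, num_workers=num_workers)


# Hypothetical usage; the path is a placeholder for a previously saved file:
# dataloader = make_dataloader("./data/CMU-1-preprocessed.h5path")
```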

Summary

Here we demonstrated a complete PathML workflow for analyzing brightfield images:

  1. Load the raw image
  2. Define a simple preprocessing pipeline for tissue detection
  3. Create a PyTorch DataLoader for use with any downstream machine learning model

Full documentation of the PathML API is available at https://pathml.org.

Full code for this vignette is available at https://github.com/Dana-Farber-AIOS/pathml/tree/master/examples/vignettes/