Supplementary Vignette 3

Example workflow for immunofluorescence images

Here we demonstrate a typical workflow for preprocessing immunofluorescence images. The example image is a tissue microarray (TMA) acquired on the CODEX spatial proteomics imaging platform, from Schürch et al., Coordinated Cellular Neighborhoods Orchestrate Antitumoral Immunity at the Colorectal Cancer Invasive Front (Cell, 2020). It is publicly available for download from The Cancer Imaging Archive: https://doi.org/10.7937/tcia.2020.fqn0-0326

a. Load the image

The CODEX imaging protocol is cyclic, so markers are imaged in groups of four. These images use the standard (X, Y, Z, C, T) dimension order. In this case, the time dimension (T) denotes imaging cycles; the image has 17 z-slices and 23 cycles of 4 markers each.
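To make the dimension layout concrete, the following is a minimal NumPy sketch rather than PathML code. It builds a placeholder array with the Z, C, and T sizes quoted above, picks one z-slice, and merges channels and cycles into a single axis. The 32 x 32 field of view, the choice of z-slice, and all variable names are assumptions made for illustration.

```python
import numpy as np

# Hypothetical stand-in for the CODEX image, in (X, Y, Z, C, T) order.
# Z=17, C=4 and T=23 come from the text; the 32 x 32 field of view is made up.
image = np.zeros((32, 32, 17, 4, 23), dtype=np.uint8)
image[0, 0, 6, 1, 2] = 7  # mark channel 1 of cycle 2 in z-slice 6

n_x, n_y, n_z, n_c, n_t = image.shape
n_channels_total = n_c * n_t  # 4 channels per cycle x 23 cycles

z = 6  # arbitrary z-slice, chosen only for illustration
plane = image[:, :, z, :, :]  # shape (X, Y, C, T)

# Move cycles ahead of channels, then merge the two axes so that
# flat index = cycle * n_c + channel.
stack = plane.transpose(0, 1, 3, 2).reshape(n_x, n_y, n_channels_total)
```

After this step every marker slot has its own index along the last axis, which is the layout that per-channel segmentation and quantification typically expect.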

b. Define a preprocessing pipeline

Pipelines are created by composing a sequence of modular transformations; in this example we first choose a z-slice from our CODEX image, then segment the cells using the pre-trained Mesmer machine learning model, and finally quantify the expression of each protein in each cell.
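The idea of composing modular transformations can be sketched in a few lines of plain Python. This is not the PathML API: the `Pipeline` class, the dictionary-based tile, and the three toy steps are hypothetical, and the threshold step is only a stand-in for a learned segmentation model such as Mesmer.

```python
import numpy as np

class Pipeline:
    """Apply a sequence of transformations in order (illustrative only)."""

    def __init__(self, steps):
        self.steps = list(steps)

    def apply(self, tile):
        for step in self.steps:
            tile = step(tile)
        return tile

def choose_z(z):
    """Keep a single z-slice of an (X, Y, Z, C) image."""
    def step(tile):
        return {**tile, "image": tile["image"][:, :, z]}
    return step

def threshold_segment(channel, threshold):
    """Toy stand-in for a segmentation model: foreground = bright pixels."""
    def step(tile):
        mask = tile["image"][:, :, channel] > threshold
        return {**tile, "mask": mask}
    return step

def mean_intensity(tile):
    """Average every channel over the foreground pixels."""
    means = tile["image"][tile["mask"]].mean(axis=0)
    return {**tile, "means": means}

# Random placeholder image with 3 z-slices and 2 channels.
rng = np.random.default_rng(0)
tile = {"image": rng.random((8, 8, 3, 2))}

pipeline = Pipeline([choose_z(1), threshold_segment(0, 0.5), mean_intensity])
result = pipeline.apply(tile)
```

Because each step only reads and extends the tile, steps can be reordered, swapped, or reused across pipelines without touching the others.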

c. Run preprocessing

Because the image is relatively small, we choose not to distribute the computation in this example. Instead, we process the whole image as a single tile.
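The relationship between tile size and the number of tiles can be illustrated with a small helper. The function name and the image dimensions below are hypothetical and do not reflect how PathML generates tiles internally. Setting the tile size equal to the image size yields exactly one tile.

```python
def tile_origins(shape, tile_size):
    """Top-left corners of a grid of tiles covering a 2D image.

    Tiles on the last row or column may extend past the image edge and
    would need cropping or padding in a real implementation.
    """
    rows, cols = shape
    tile_rows, tile_cols = tile_size
    return [
        (r, c)
        for r in range(0, rows, tile_rows)
        for c in range(0, cols, tile_cols)
    ]

image_shape = (1000, 1200)  # made-up image size in pixels

single_tile = tile_origins(image_shape, tile_size=image_shape)
many_tiles = tile_origins(image_shape, tile_size=(256, 256))
```

With many tiles, each one can be sent to a separate worker; with a single tile, the scheduling overhead is avoided.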

d. Save results to disk

The preprocessed data is written to disk in HDF5, a format optimized for efficiently manipulating larger-than-memory data.

e. AnnData Integration and Spatial Single Cell Analysis

Now let's explore the single-cell quantification of our imaging data. Our pipeline produced a single-cell matrix of shape (n_cells x n_proteins), where each cell carries additional information, including its location on the slide and its size in the image. This information is stored in slidedata.counts as an AnnData object (https://anndata.readthedocs.io/en/latest/anndata.AnnData.html).
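As a rough illustration of what such a matrix contains, the sketch below derives per-cell mean intensities, centroids, and areas from a labeled segmentation mask using NumPy. It is a simplified stand-in and not the quantification code used by PathML. The function name, the tiny 4 x 4 image, and the label layout are assumptions.

```python
import numpy as np

def quantify_cells(image, labels):
    """Summarize each labeled cell.

    image:  (H, W, n_proteins) intensities
    labels: (H, W) integer mask, 0 = background, k > 0 = cell k
    Returns the (n_cells, n_proteins) mean-intensity matrix,
    the (n_cells, 2) centroids as (row, col), and the areas in pixels.
    """
    cell_ids = np.unique(labels)
    cell_ids = cell_ids[cell_ids != 0]
    X = np.zeros((len(cell_ids), image.shape[-1]))
    centroids = np.zeros((len(cell_ids), 2))
    areas = np.zeros(len(cell_ids), dtype=int)
    for i, cell_id in enumerate(cell_ids):
        rows, cols = np.nonzero(labels == cell_id)
        X[i] = image[rows, cols].mean(axis=0)
        centroids[i] = rows.mean(), cols.mean()
        areas[i] = rows.size
    return X, centroids, areas

# Toy example: two cells, three proteins.
labels = np.zeros((4, 4), dtype=int)
labels[0:2, 0:2] = 1
labels[3, 2:4] = 2

image = np.zeros((4, 4, 3))
image[0:2, 0:2, 0] = 10.0
image[3, 2:4, 1] = [2.0, 4.0]

X, centroids, areas = quantify_cells(image, labels)
```

In an AnnData object, a matrix like `X` would sit in the main data slot, while per-cell attributes such as centroids and areas would be stored as cell-level annotations.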

AnnData is a standard data format, so this object gives us access to the entire ecosystem of single-cell analysis tools in Python (e.g. Scanpy) or R (e.g. Seurat). We follow the single-cell analysis workflow described in https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html and https://www.embopress.org/doi/full/10.15252/msb.20188746.

First we look at a violin plot for three randomly selected markers:

Next, we use UMAP to look at the cells in a low-dimensional visualization, colored by expression levels of the three markers:

Here, we perform Leiden clustering in the expression space:

Next, we use a dotplot to visualize the markers for each group:
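A dotplot of this kind usually encodes two numbers per cluster and marker: the mean expression (color) and the fraction of cells expressing the marker (dot size). The sketch below computes both with NumPy on made-up data. The function name and the expression threshold are assumptions, and this is not Scanpy's implementation.

```python
import numpy as np

def dotplot_summary(X, clusters, threshold=0.0):
    """Per-cluster mean expression and fraction of expressing cells.

    X:        (n_cells, n_markers) expression matrix
    clusters: (n_cells,) cluster label of each cell
    A cell counts as expressing a marker when its value exceeds `threshold`.
    """
    X = np.asarray(X, dtype=float)
    clusters = np.asarray(clusters)
    groups = np.unique(clusters)
    mean_expr = np.vstack([X[clusters == g].mean(axis=0) for g in groups])
    frac_expr = np.vstack(
        [(X[clusters == g] > threshold).mean(axis=0) for g in groups]
    )
    return groups, mean_expr, frac_expr

# Four cells, two markers, two clusters (all values invented).
X = [[0, 4], [2, 0], [6, 6], [0, 2]]
clusters = [0, 0, 1, 1]

groups, mean_expr, frac_expr = dotplot_summary(X, clusters)
```

Each row of `mean_expr` and `frac_expr` corresponds to one cluster and each column to one marker, matching the grid layout of the plot.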

Here, we plot the clustering results in the spatial domain, highlighting the spatial organization of the tissue:

Finally, we compute the co-occurrence probability of the clusters, highlighting the interface with spatial analysis tools such as Squidpy: https://github.com/theislab/squidpy.
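The intuition behind a co-occurrence score can be sketched as the ratio between how often one cluster appears near another and how often it appears overall. The NumPy function below is a simplified, single-radius illustration under that assumption. It is not Squidpy's implementation, and the coordinates, labels, and radius are invented.

```python
import numpy as np

def co_occurrence_ratio(coords, clusters, cond, exp, radius):
    """p(exp | neighbor of a `cond` cell within `radius`) / p(exp).

    Values above 1 suggest that `exp` cells are enriched near `cond` cells.
    Returns NaN when no `cond` cell has a neighbor within `radius`.
    """
    coords = np.asarray(coords, dtype=float)
    clusters = np.asarray(clusters)
    # Pairwise Euclidean distances between all cells.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Neighbor relation, excluding each cell's distance to itself.
    near = (dists <= radius) & ~np.eye(len(coords), dtype=bool)
    near_cond = near[clusters == cond]  # rows: `cond` cells, columns: all cells
    n_pairs = near_cond.sum()
    if n_pairs == 0:
        return float("nan")
    p_exp_given_cond = near_cond[:, clusters == exp].sum() / n_pairs
    p_exp = np.mean(clusters == exp)
    return p_exp_given_cond / p_exp

# Each A cell sits next to a B cell; the C cells are far away.
coords = [(0, 0), (10, 0), (1, 0), (11, 0), (100, 100), (101, 100)]
clusters = ["A", "A", "B", "B", "C", "C"]

ratio_ab = co_occurrence_ratio(coords, clusters, cond="A", exp="B", radius=2.0)
ratio_ac = co_occurrence_ratio(coords, clusters, cond="A", exp="C", radius=2.0)
```

In this toy layout every neighbor of an A cell is a B cell, while B cells make up one third of all cells, so `ratio_ab` is about 3 and `ratio_ac` is 0.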

Summary

Here we demonstrate a complete PathML workflow for analyzing immunofluorescence images:

  1. Loading a raw image in TIFF format
  2. Defining a preprocessing pipeline for cell segmentation and per-cell marker quantification
  3. Integrating with other commonly used tools, such as Scanpy, to work with the quantified cell-level data:
    • dimensionality reduction
    • clustering
    • co-occurrence analysis
    • visualization

Full documentation of the PathML API is available at https://pathml.org.

Full code for this vignette is available at https://github.com/Dana-Farber-AIOS/pathml/tree/master/examples/vignettes/