PathML provides support for loading a wide array of imaging modalities and file formats under a standardized syntax. In this vignette, we highlight code snippets for loading image types ranging from brightfield H&E and IHC to highly multiplexed immunofluorescence, spatial gene expression, and spatial proteomics, and from small images to gigapixel scale:
Imaging modality | File format | Source | Image dimensions (X, Y, Z, C, T) |
---|---|---|---|
Brightfield H&E | Aperio SVS | OpenSlide example data | (32914, 46000, 1, 3, 1) |
Brightfield H&E | Generic tiled TIFF | OpenSlide example data | (32914, 46000, 1, 3, 1) |
Brightfield IHC | Hamamatsu NDPI | OpenSlide example data | (73728, 126976, 1, 3, 1) |
Brightfield H&E | Hamamatsu VMS | OpenSlide example data | (76288, 102400, 1, 3, 1) |
Brightfield H&E | Leica SCN | OpenSlide example data | (153470, 53130, 1, 3, 1) |
Fluorescence | MIRAX | OpenSlide example data | (170960, 76324, 1, 3, 1) |
Brightfield IHC | Olympus VSI | OpenSlide example data | (6753, 13196, 1, 3, 1) |
Brightfield H&E | Trestle TIFF | OpenSlide example data | (25408, 61504, 1, 3, 1) |
Brightfield H&E | Ventana BIF | OpenSlide example data | (93951, 105813, 1, 3, 1) |
Fluorescence | Zeiss ZVI | OpenSlide example data | (1388, 1040, 13, 3, 1) |
Brightfield H&E | DICOM | Orthanc example data | (30462, 78000, 1, 3, 1) |
Fluorescence (CODEX spatial proteomics) | TIFF | Schurch et al., Cell 2020 | (1920, 1440, 17, 4, 23) |
Fluorescence (time-series + volumetric) | OME-TIFF | OME-TIFF example data | (512, 512, 10, 2, 43) |
Fluorescence (MERFISH spatial gene expression) | TIF | Zhuang et al., 2020 | (2048, 2048, 7, 1, 40) |
Fluorescence (Visium 10x spatial gene expression) | TIFF | 10x Genomics | (25088, 26624, 1, 1, 4) |
All images used in these examples are publicly available for download at the links listed above.
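To make the "small images to gigapixel scale" range concrete, the dimensions in the table can be turned into pixel counts with plain arithmetic. The sketch below copies a few rows from the table; the helper names are illustrative, and the assumption of 1 byte per stored sample (8-bit data) is ours, not something stated by the image sources.

```python
# (X, Y, Z, C, T) dimensions copied from a few rows of the table above.
DIMENSIONS = {
    "Aperio SVS": (32914, 46000, 1, 3, 1),
    "Hamamatsu NDPI": (73728, 126976, 1, 3, 1),
    "Zeiss ZVI": (1388, 1040, 13, 3, 1),
    "CODEX TIFF": (1920, 1440, 17, 4, 23),
}


def pixels_per_plane(dims):
    """Number of pixels in a single X-by-Y plane."""
    x, y, _z, _c, _t = dims
    return x * y


def total_samples(dims):
    """Number of stored intensity values across all planes, channels and time points."""
    x, y, z, c, t = dims
    return x * y * z * c * t


def uncompressed_gib(dims, bytes_per_sample=1):
    """Uncompressed size in GiB. bytes_per_sample=1 is an assumption (8-bit data)."""
    return total_samples(dims) * bytes_per_sample / 2**30


for name, dims in DIMENSIONS.items():
    print(
        f"{name}: {pixels_per_plane(dims) / 1e6:,.1f} megapixels per plane, "
        f"about {uncompressed_gib(dims):.2f} GiB uncompressed at 1 byte per sample"
    )
```

Whole-slide formats such as SVS and NDPI reach billions of pixels in a single plane, while the multiplexed fluorescence images are small per plane but carry many planes.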
Note that across the wide diversity of modalities and file formats, the syntax for loading images is consistent (see examples below).
# import utilities for loading images
from pathml.core import HESlide, CODEXSlide, VectraSlide, SlideData, types
my_aperio_image = HESlide("./data/CMU-1.svs")
my_generic_tiff_image = HESlide("./data/CMU-1.tiff", backend = "bioformats")
The `labels` field can be used to store slide-level metadata. For example, in this case we store the target gene, which is Ki-67:
my_ndpi_image = SlideData("./data/OS-2.ndpi",
                          labels = {"target" : "Ki-67"},
                          slide_type = types.IHC)
my_vms_image = HESlide("./data/CMU-1/CMU-1-40x - 2010-01-12 13.24.05.vms", backend = "openslide")
my_leica_image = HESlide("./data/Leica-1.scn")
my_mirax_image = SlideData("./data/Mirax2-Fluorescence-1/Mirax2-Fluorescence-1.mrxs",
slide_type = types.IF)
Again, we use the `labels` field to store slide-level metadata such as the name of the target gene.
my_olympus_vsi = SlideData("./data/OS-3/OS-3.vsi",
                           labels = {"target" : "PTEN"},
                           slide_type = types.IHC)
my_trestle_tiff = SlideData("./data/CMU-2/CMU-2.tif")
my_ventana_bif = SlideData("./data/OS-1.bif")
Again, we use the `labels` field to store slide-level metadata such as the name of the target gene.
my_zeiss_zvi = SlideData("./data/Zeiss-1-Stacked.zvi",
labels = {"target" : "HER-2"},
slide_type = types.IF)
my_dicom = HESlide("./data/orthanc_example.dcm")
my_volumetric_timeseries_image = SlideData(
"./data/tubhiswt-4D/tubhiswt_C1_TP42.ome.tif",
labels = {"organism" : "C elegans"},
volumetric = True,
time_series = True,
backend = "bioformats"
)
The `labels` field can be used to store whatever slide-level metadata the user wants; here we specify the tissue type:
my_codex_image = CODEXSlide('../../data/reg031_X01_Y01.tif',
                            labels = {"tissue type" : "CRC"})
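Because `labels` is a free-form dictionary, a misspelled key is stored without complaint. One way to guard against this is to check keys against an agreed set before loading. The sketch below is plain Python and not part of PathML; the helper name and the set of expected keys are hypothetical.

```python
# Hypothetical set of label keys a project has agreed on; not defined by PathML.
EXPECTED_LABEL_KEYS = {"target", "tissue type", "organism"}


def unexpected_label_keys(labels, expected=EXPECTED_LABEL_KEYS):
    """Return the label keys that are not in the expected set, sorted for stable output."""
    return sorted(set(labels) - set(expected))


# The first key is deliberately misspelled to show what the check reports.
labels = {"tisue type": "CRC", "target": "PTEN"}
problems = unexpected_label_keys(labels)
if problems:
    print(f"Unexpected label keys: {problems}")
```

A dictionary that passes the check can then be handed to the `labels` argument as in the examples above.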
my_merfish_image = SlideData("./data/aligned_images0.tif", backend = "bioformats")
Here we load an image with accompanying expression data in `AnnData` format.
# load the counts matrix of spatial genomics information
import scanpy as sc
adata = sc.read_10x_h5("./data/Visium_FFPE_Mouse_Brain_IF_raw_feature_bc_matrix.h5")
# load the image, with accompanying counts matrix metadata
my_visium_image = SlideData("./data/Visium_FFPE_Mouse_Brain_IF_image.tif",
counts=adata,
backend = "bioformats")
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
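The warning above is raised because several variables (genes) in the counts matrix share the same name. With the real data, calling `adata.var_names_make_unique()` before passing `adata` as `counts` resolves it. To show the idea without requiring scanpy or the data files, the sketch below is a pure-Python stand-in that appends a numeric suffix to repeated names. It is an illustration only, not the AnnData implementation, and the gene names are made up.

```python
from collections import Counter


def make_names_unique(names, join="-"):
    """Return a copy of names in which repeats get a numeric suffix.

    The first occurrence of each name is left unchanged.
    """
    seen = Counter()
    taken = set(names)
    unique = []
    for name in names:
        if seen[name] == 0:
            unique.append(name)
        else:
            candidate = f"{name}{join}{seen[name]}"
            # Skip suffixes that would collide with a name already in use.
            while candidate in taken:
                seen[name] += 1
                candidate = f"{name}{join}{seen[name]}"
            taken.add(candidate)
            unique.append(candidate)
        seen[name] += 1
    return unique


# Made-up gene names with one repeated entry.
gene_names = ["GeneA", "GeneB", "GeneA", "GeneC", "GeneA"]
print(make_names_unique(gene_names))
```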
The PathML API provides a consistent, easy-to-use interface for loading a wide range of imaging data.
The output from all of the code snippets above is a `SlideData` object compatible with the PathML preprocessing module.
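Since every snippet above follows the same pattern (a file path plus optional keyword arguments), a collection of images can be described as data and loaded in one loop. The sketch below is one possible arrangement, not a PathML feature: `MANIFEST`, `load_all`, and `fake_loader` are hypothetical names, and a stand-in loader is used so the example runs without PathML or the image files.

```python
# Hypothetical manifest: (path, keyword arguments) pairs taken from the examples above.
MANIFEST = [
    ("./data/CMU-1.svs", {}),
    ("./data/CMU-1.tiff", {"backend": "bioformats"}),
    ("./data/OS-2.ndpi", {"labels": {"target": "Ki-67"}}),
]


def load_all(manifest, loader):
    """Call loader(path, **kwargs) for every manifest entry and return the results in order."""
    return [loader(path, **kwargs) for path, kwargs in manifest]


def fake_loader(path, **kwargs):
    """Stand-in for a real loader: records the call instead of opening the file."""
    return {"path": path, "kwargs": kwargs}


slides = load_all(MANIFEST, fake_loader)
for slide in slides:
    print(slide["path"], slide["kwargs"])
```

With PathML installed and the files in place, passing `SlideData` (imported from `pathml.core`) as `loader` instead of `fake_loader` would return a list of `SlideData` objects.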
Full documentation of the PathML API is available at https://pathml.org.
Full code for this vignette is available at https://github.com/Dana-Farber-AIOS/pathml/tree/master/examples/vignettes/