Summary
In categorical data visualization, appropriate color arrangements can avoid perceptual ambiguity and help perceive underlying data patterns. We introduce a protocol to assign contrastive colors to neighboring categories using both Python and R packages. We describe steps for calculating the interlacement between clusters and generating a proper color palette and calculating color contrast. We then detail procedures for aligning cluster interlacement and color contrast to get an optimized cluster-color assignment, achieving clear categorical visualization.
For complete details on the use and execution of this protocol, please refer to Jing et al.1
Subject areas: Bioinformatics, Single cell, Computer sciences
Highlights
• A spatially aware colorization protocol to optimize categorical visualization
• Steps for enhancing visual clarity using adaptive color palette generation
• Instructions for using the Spaco software package
• Steps to implement the protocol for both Python and R language
Publisher’s note: Undertaking any experimental protocol requires adherence to local institutional guidelines for laboratory safety and ethics.
Before you begin
The basic idea of visualizing categorical data is to select a unique color for each category and use these colors to plot labeled data points.2,3 When visualizing spatially resolved data, traditional palettes and lexicographical color-category mapping often result in neighboring categories displaying similar colors, thereby making visual differentiation more difficult.4,5,6 To effectively visualize spatial data, a colorization protocol must consider the positioning of the data points.
Spaco is a spatially aware colorization method for enhanced categorical data visualization, applicable across various research fields. Generally, this method separately models the spatial interlacement between different categories and the perceptual contrast between different colors and leverages them to optimize color-category mapping. It first constructs a spatial interlacement matrix to globally consider spatial neighboring between categories, by calculating the degree of interlacement (DOI) metric between different categories. Then the matrix is aligned with a color difference matrix, whose values are the perceptual contrast between different colors. In addition, Spaco provides multiple auxiliary modules, such as adaptive palette selection and color-blind support to extend its usage in various scenarios.
When adapted to the field of spatial transcriptomics, a comprehensive understanding of tissue architecture is based on the exploration of anatomical and histological regions, as well as the spatial organization of cell types.7 Spaco, through its colorization process, ensures unbiased visual perception of tissue domains and cell type distribution, thereby facilitating subsequent bioinformatics analysis.
In this protocol, we provide a step-by-step demonstration of how to obtain enhanced cell type visualization in spatial transcriptomic data, using the Spaco (version 0.2.0) package with Python or the SpacoR (version 0.1) package with R.
Note: Although we limit the demonstration to spatial transcriptomics datasets, Spaco is universally applicable for categorical data visualization across research fields. Thus, we describe some major steps of this protocol in parallel, in the style of general data science, using the NOTE sections.
Data requirements for Spaco
Timing: ∼5 min
Before using Spaco, you need a processed spatially resolved transcriptomics dataset with cell annotation or cluster labels.
1. Prepare your spatial dataset.
Note: In the field of spatial transcriptomics, such datasets can be sourced from various experimental technologies such as 10X Genomics Visium,8 STARmap,9 MERFISH,10 seqFISH11 or Slide-seq.12
2. Preprocess and categorize your dataset.
Note: Spatially resolved transcriptomics datasets can be clustered or annotated using bioinformatics packages such as Seurat13 or Giotto14 in R, or Squidpy15 or Spateo16 in Python.
Note: Spaco provides an interface to SeuratObject or AnnData, but can also accept a list of cell coordinates with corresponding annotations. For general application on datasets across research fields, the required input of Spaco includes: (1) coordinates or embeddings of data points (i.e. samples) in desired visualization space; (2) categorical labels of the data points.
Installation of Spaco package
Timing: ∼2–5 min
Our “spaco” (Python) or “SpacoR” (R) package will facilitate the implementation and execution of this protocol. We guide you through the installation of one of these packages, depending on which programming language you use.
3. Install the Spaco package.
a. Python version.
Note: The Python package of Spaco is available on PyPI, and the source code is also publicly available on GitHub under the GNU General Public License v3.0. You can use 'pip' to directly install the 'spaco-release' package from PyPI, or build from the latest source code on GitHub repository “BrainStOrmics/Spaco”.
Note: Python should be pre-installed on most Linux distributions. For Windows and macOS, please refer to the official website of Python to download and set up a Python environment if needed.
# install from PyPI
pip install spaco-release
# or install from latest source from github (Recommended)
pip install git+https://github.com/BrainStOrmics/Spaco.git
Note: In the code blocks, the '#' symbol is used to indicate comments, which are not executable as code.
Note: When you install this package, all required dependencies will also be installed automatically by 'pip'. The installation of Spaco (Python version), along with its dependencies, takes approximately 1 min depending on your download speed. The installation of Spaco itself takes about 10 sec.
b. R version.
Note: The 'SpacoR' package is not yet available on the standard R repositories (CRAN). However, it is publicly available on GitHub under the GNU General Public License v3.0, and can be installed from source. The 'devtools' package offers the function `install_github()` to install packages directly from GitHub. You can use this function to install the 'SpacoR' package directly from the GitHub repository "BrainStOrmics/SpacoR". You need to install the 'devtools' package first, if you have not.
Note: The R environment can be set up following the official instructions of R. We also recommend the RStudio IDE for easier reproduction of this protocol.
install.packages("devtools")
devtools::install_github("https://github.com/BrainStOrmics/SpacoR")
Note: When you install 'SpacoR' using 'devtools', it automatically resolves all required dependencies. The installation of the SpacoR package, along with its dependencies, takes a total of 2 min. However, if the dependencies are already installed, the installation time for the SpacoR package alone can be reduced to approximately 10 sec.
4. Load the packages.
a. Python version.
Note: If the Python package has been successfully installed, you should be able to import it into the Python session using `import`.
import spaco
b. R version.
Note: If the R package has been successfully installed, you should be able to load it into the R session using the `library` function.
library(SpacoR)
5. Check the package version.
a. Python version.
Note: This protocol is written for Spaco v0.2.0 for Python.
spaco.__version__
# return
# '0.2.0'
b. R version.
Note: This protocol is written for SpacoR v0.1 for R.
packageVersion("SpacoR")
# return
# [1] '0.1'
Key resources table
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Deposited data | ||
| seqFISH mouse embryo dataset | Lohoff et al.17 | https://marionilab.cruk.cam.ac.uk/SpatialMouseAtlas/ |
| 10X Visium mouse brain dataset | 10× Genomics | https://support.10xgenomics.com/spatial-gene-expression/datasets |
| Software and algorithms | ||
| spaco (v0.2.0) | Jing et al.1 | https://github.com/BrainStOrmics/Spaco |
| python (v3.8.0) | Python | https://www.python.org/ |
| numpy (v1.18.0) | Harris et al.18 | https://numpy.org/ |
| pandas (v0.25.1) | McKinney19 | https://pandas.pydata.org/ |
| scipy (v1.10.0) | Virtanen et al.20 | https://scipy.org/ |
| anndata (v0.8.0) | Virshup et al.21 | https://anndata.readthedocs.io/en/latest/ |
| matplotlib (v3.7.1) | Hunter JD.5 | https://matplotlib.org/ |
| seaborn (v0.12.2) | Waskom et al.22 | https://seaborn.pydata.org/ |
| scanpy (v1.9.3) | Wolf et al.23 | https://scanpy.readthedocs.io/ |
| squidpy (v1.2.3) | Palla et al.15 | https://squidpy.readthedocs.io/ |
| SpacoR (v0.1) | Jing et al.1 | https://github.com/BrainStOrmics/SpacoR |
| R (v4.3.1) | R CRAN | https://cran.r-project.org/ |
| Seurat (v4.3.0.1) | Butler et al.24 | https://satijalab.org/seurat/ |
| colorspace (v2.1–0) | Zeileis et al.25 | https://github.com/cran/colorspace/tree/master |
| FNN (v1.1.4) | Sabry et al.26 | https://github.com/cran/FNN |
| OpenImageR (v1.3.0) | Lampros et al., GitHub | https://github.com/mlampros/OpenImageR |
| SeuratData (v0.2.2.9001) | Satija et al., GitHub | https://github.com/satijalab/seurat-data |
Step-by-step method details
Here, we describe the detailed procedures for using either the Python or R package of Spaco to enhance the colorization of cell type maps on multiple spatial transcriptomics datasets. This protocol covers the workflow from data loading to the application of Spaco’s colorization algorithms, culminating in the visualization of cells with contrastive colors to distinctly identify the distribution of each cell type. The protocol is designed so that users working with either Python or R can apply Spaco to spatial data exploration. The Jupyter notebooks of this protocol can be found in our GitHub repository 'BrainStOrmics/Spaco_scripts'.
Note: For general application on datasets across research fields, or for those with raw cell coordinates and annotations (i.e., not in SeuratObject13 or AnnData23 format), you can jump to Part 2.
Part 1: Data import and initialization
Timing: ∼3 min
The core colorization algorithm of Spaco requires only two inputs: (1) the coordinates of cells and (2) their labels (i.e., annotations or cluster labels). However, to integrate seamlessly into most analytic pipelines, Spaco accepts these inputs via data formats widely used in the field of spatial transcriptomics, such as SeuratObject and AnnData. These data formats can contain spatial transcriptomic profiles along with rich metadata. We first guide you through some standard procedures to prepare such datasets, referring to the 'Squidpy' package15 in Python or the 'Seurat' package13 in R.
1. Load the Spaco package and dataset.
a. Python version.
Note: We use the Squidpy package to download the example dataset. If you do not have it, you can install it using 'pip'.
pip install squidpy
Note: For Python users, we demonstrate the procedures using a pre-annotated seqFISH mouse embryo dataset,27 obtained from one of the analytic tutorials of the 'Squidpy' package. We need to run `import squidpy as sq`, and load the dataset as an 'AnnData'23 using `sq.datasets.seqfish()` (Figure 1A). In this data object, the cells or spots in the dataset have already been annotated and saved in the metadata column 'celltype_mapped_refined', which can be accessed with `adata.obs['celltype_mapped_refined']`. For convenience, we copy and rename it as 'annotation'.
# get the seqFISH mouse embryo dataset
import squidpy as sq
adata = sq.datasets.seqfish()
# get the pre-annotated cell label
adata.obs['annotation'] = adata.obs['celltype_mapped_refined'].copy()
b. R version.
Note: For users of the R language with 'SpacoR', our demonstration uses the 10x Genomics Visium mouse brain dataset28 which is included in the vignette of the widely used 'Seurat'13 package. As demonstrated in the vignette, the raw dataset is accessible through the 'SeuratData' package, and can be loaded as a 'SeuratObject' using `LoadData("stxBrain", type = "anterior1")` (Figure 1D). To get the cell labels required to run Spaco, the data requires a series of preprocessing steps, such as normalization, dimensionality reduction, and clustering. We go through these steps using 'Seurat'13 following the vignette.
Note: Before loading the dataset, it is necessary to install the 'SeuratData' package following the SeuratData installation instructions, and to install the dataset using the following command:
SeuratData::InstallData("stxBrain")
# Or to install the dataset via
install.packages("http://seurat.nygenome.org/src/contrib/stxBrain.SeuratData_0.1.1.tar.gz", repos = NULL, type = "source")
# Load dataset
library(SeuratData)
brain <- LoadData("stxBrain", type = "anterior1")
# Clustering analysis following Seurat's vignette
library(Seurat)
brain <- SCTransform(brain, assay = "Spatial", verbose = FALSE)
brain <- RunPCA(brain, assay = "SCT", verbose = FALSE)
brain <- FindNeighbors(brain, reduction = "pca", dims = 1:30)
brain <- FindClusters(brain, verbose = FALSE)
brain <- RunUMAP(brain, reduction = "pca", dims = 1:30)
brain$seurat_clusters <- as.character(brain$seurat_clusters)
# Extract spatial coordinate of cells
coor = c(
as.integer(brain@images$anterior1@coordinates$col),
as.integer(brain@images$anterior1@coordinates$row)
)
dim(coor) = c(dim(brain)[2],2)
coor <- t(coor)
Note: The cluster labels calculated by `FindClusters()` should now be in the metadata of the 'SeuratObject', accessible with `brain$seurat_clusters` (Figure 1E). We extract the cell coordinates and store them as `coor` to facilitate passing them to Spaco functions (Figure 1F).
2. Filter cell types.
Optional: In certain scenarios, it may be helpful to filter out rare cell types (i.e., categories) from your dataset before visualizing with Spaco. This is because in global visualizations of large datasets, rare cell types are hard to spot and often do not provide significant insights due to their sparse presence.
a. Python version.
Note: For the seqFISH dataset (19416 cells), we set a threshold (`min_cells`) of 3 and filter out any cell types with fewer cells than this threshold.
# filter out cell types with fewer than 3 cells
import numpy as np
min_cells = 3
unique_types = np.unique(adata.obs['annotation'], return_counts=True)
filtered_types = unique_types[0][unique_types[1] >= min_cells]
adata = adata[adata.obs['annotation'].isin(filtered_types)].copy()
b. R version.
Note: As for the Visium dataset in the R workflow, we do not perform filtering in this particular demonstration as the total number of cells is relatively small (2696 cells). Still, we provide mock filtering code to facilitate other scenarios where filtering might be needed.
# This is for reference when you apply SpacoR to other datasets where filtering is needed
# Please do not run this if you are reproducing the results in our demonstration
# filter out cell types with fewer than 3 cells
min_cells <- 3
cell_counts <- table(brain@meta.data$seurat_clusters)
filtered_types <- names(cell_counts[cell_counts >= min_cells])
brain <- subset(brain, subset = seurat_clusters %in% filtered_types)
Note: The steps above are optional and can be tailored based on the specific requirements of your analysis. The decision to exclude rare cell types from visualization should be guided by the significance of smaller populations in your study.
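For general datasets that are not stored in an 'AnnData' or 'SeuratObject', the same filtering idea can be sketched on plain Python lists. This is only an illustration: the helper name `filter_rare_categories` and the toy data are hypothetical and are not part of Spaco.

```python
from collections import Counter


def filter_rare_categories(coordinates, labels, min_cells=3):
    """Keep only points whose category has at least `min_cells` members."""
    counts = Counter(labels)
    keep = [i for i, lab in enumerate(labels) if counts[lab] >= min_cells]
    return [coordinates[i] for i in keep], [labels[i] for i in keep]


# Toy data: category "C" has a single point and is dropped with min_cells=2
coords = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (9, 9)]
labels = ["A", "A", "A", "B", "B", "C"]
kept_coords, kept_labels = filter_rare_categories(coords, labels, min_cells=2)
print(kept_labels)
```

The filtered coordinates and labels stay aligned by position, which is the form expected when passing raw coordinates and annotations to later steps.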
3. Check the datasets and try their default visualization.
Note: You can check whether the dataset is correctly loaded and labeled by visualizing it using Scanpy `scanpy.pl.spatial()` in Python or Seurat `SpatialDimPlot()` in R (Figures 2A and 2B). This also shows the default colorization, whose clarity is limited, so you can compare it to the enhanced colorization in Part 5.
a. Python version.
# View the data with default visualization by Scanpy
import scanpy as sc
sc.set_figure_params(figsize=(3,6), facecolor="white", dpi_save=300)
sc.pl.spatial(adata, color="annotation", spot_size=0.035)
b. R version.
# View the data with default visualization by Seurat
p <- SpatialDimPlot(brain, label = TRUE, label.size = 7)
p
Figure 1.
Screenshots of the required inputs of Spaco
(A–F) (A) For the Python version. The 'AnnData' object loaded from the Squidpy package; (B) For the Python version. The format of the spatial coordinates in the 'AnnData', which are stored in `adata.obsm['spatial']`; (C) For the Python version. The format of the cell type annotation, available in `adata.obs['annotation']` of the 'AnnData' object; (D) For the R version. Screenshot of the input 'SeuratObject'. (E) For the R version. The cluster labels for cells in the 'SeuratObject' acquired by pre-analysis steps, which are stored and available through `brain$seurat_clusters`. (F) For the R version. The format of cell coordinates extracted from the 'SeuratObject'.
Figure 2.
Screenshots of the default visualization of the datasets
(A) For the Python version. The default visualization and colorization of the seqFISH dataset using Scanpy package.
(B) For the R version. The default visualization and colorization of the Visium dataset using Seurat package.
Part 2: Model spatial interlacement between cell clusters
Timing: <2 min
After successfully importing and preprocessing your datasets, we move to Spaco’s colorization workflow. As Spaco colorizes the cell clusters based on their spatial relationship, we first model the spatial interlacement between cell clusters. In this part, we utilize the Degree of Interlacement (DOI) metric which is implemented in the `spatial_distance()` function, offering a quantifiable approach to understand how cells are intertwined spatially.
4. Calculate the Degree of Interlacement (DOI) between cell clusters using the `spatial_distance()` function.
Note: The `spatial_distance()` function is available in both R and Python versions of Spaco. For Python, use `spaco.distance.spatial_distance()`, and for R, use `spatial_distance()`.
a. Python version.
spatial_dis_py = spaco.distance.spatial_distance(
    cell_coordinates=adata.obsm['spatial'],
    cell_labels=adata.obs['annotation'],
    radius=200,
    n_neighbors=30,
)
spatial_dis_py
b. R version.
spatial_dis_R <- spatial_distance(
cell_coordinates = coor,
cell_labels = brain$seurat_clusters,
radius = 20,
n_neighbors = 50
)
spatial_dis_R
Note: In the following, we describe the calculation and the function parameters in detail to facilitate correct usage of this function. As the function and parameters are aligned between the Python and R versions, the description below applies to both versions of Spaco.
Note: `spatial_distance()` is a wrapper function that calculates the pairwise DOI scores between all clusters, resulting in a symmetric matrix in which each element represents the interlacement score of the corresponding cluster pair. Generally, the calculation of the DOI score constructs a k-nearest-neighbor network (with k given by `n_neighbors`) and measures the expected number of shared neighbors within a given `radius` between each cluster pair. In this process, it handles sparsely distributed cells by excluding cells that have fewer than `n_cells` cells of their own type in the neighborhood.
CRITICAL: Parameters and configuration. `radius`, `n_neighbors`, and `n_cells` can significantly affect the performance of this protocol. You should carefully tune these parameters following the description below.
cell_coordinates: A 2-dimensional list-like object containing spatial coordinates for each cell.
cell_labels: A list-like object containing the categorical labels of each cell.
radius: A numeric value that determines the maximum distance within which the neighboring relationship is considered. Defaults to 90. However, this is highly dependent on the density and the scaling of coordinates in your dataset. We suggest setting `radius` to roughly 10 times the cell diameter in a typical scenario. A larger `radius` captures more distant relationships and favors the visualization of coarse structures, while a smaller `radius` favors the visualization of fine-grained structures or individual cells.
n_neighbors: An integer specifying the number of nearest neighbors to consider for each cell. This parameter defines the 'k' in constructing the modified spatial k-nearest neighbor network. Defaults to 16. However, we suggest that `n_neighbors` should be chosen together with `radius`. A larger `n_neighbors` means that all cells within `radius` are considered neighbors, which biases the score toward neighborhoods in denser areas, while a smaller `n_neighbors` makes each cell neighborhood contribute equally to the DOI calculation but can lead to insufficient consideration of dense neighborhoods (biasing toward sparser areas). Therefore, `n_neighbors` should be set close to the average number of cells within `radius` given the mean cell density of your dataset, especially when the spatial density of cells varies across the space.
n_cells: An integer specifying the minimum number of cells of its own category required in a neighborhood for it to be considered in the calculation. This helps tackle the sparsity of certain cell types, which are usually not informative in a global visual perception. `n_cells` should be adjusted in the range from 0 to `n_neighbors` according to your need to exclude sparse cell types from the calculation. Defaults to 3.
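As a rough aid for choosing `n_neighbors`, the average number of cells within `radius` can be estimated directly from the coordinates. The sketch below is a brute-force illustration on a toy grid. The helper name `mean_neighbors_within_radius` is hypothetical and not part of Spaco, and for large datasets a spatial index would be a better fit than the quadratic loop used here.

```python
import math


def mean_neighbors_within_radius(coordinates, radius):
    """Average number of other points within `radius` of each point (brute force)."""
    n = len(coordinates)
    total = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coordinates[i], coordinates[j]) <= radius:
                total += 1
    return total / n


# Toy data: a 5 x 5 grid with unit spacing
grid = [(x, y) for x in range(5) for y in range(5)]
mean_count = mean_neighbors_within_radius(grid, radius=1.5)
suggested_n_neighbors = round(mean_count)
print(mean_count, suggested_n_neighbors)
```

The rounded value is only a starting point; it can then be adjusted according to how strongly the cell density varies across the tissue.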
Note: The function returns an N x N shaped DataFrame (Figures 3A and 3B), where N is the number of categories extracted from input (i.e., `cell_labels`). Row names and column names are unique category labels (i.e., cell types), and the values are DOI scores for corresponding category pairs. Larger DOI values mean more interlacement between clusters.
Figure 3.
Screenshots of the output of the `spatial_distance()` function
(A and B) The symmetric cluster spatial interlacement matrix calculated by the `spatial_distance()` function in Python (A) and in R (B). Elements within the matrix represent the pairwise DOI (Degree of Interlacement) scores between cell clusters.
Note: For general application on datasets across research fields, you can just provide spatial coordinates or embeddings to `cell_coordinates`, and their categorical labels to `cell_labels`. The configuration of other parameters follows the same principle described above.
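To make the role of `radius`, `n_neighbors`, and `n_cells` more concrete, the following toy sketch counts cross-category neighbor pairs on plain coordinate and label lists. It is a simplified proxy written for illustration only: the function `toy_interlacement` is hypothetical, and it is not the DOI formula implemented in `spatial_distance()`.

```python
import math


def toy_interlacement(coordinates, labels, radius, n_neighbors, n_cells):
    """Count cross-category neighbor pairs as a crude interlacement proxy."""
    categories = sorted(set(labels))
    score = {a: {b: 0.0 for b in categories} for a in categories}
    for i, (point, label) in enumerate(zip(coordinates, labels)):
        # Neighbors: other points no farther than `radius`, nearest first,
        # truncated to at most `n_neighbors` entries.
        near = sorted(
            (math.dist(point, other), j)
            for j, other in enumerate(coordinates)
            if j != i and math.dist(point, other) <= radius
        )[:n_neighbors]
        neighbor_labels = [labels[j] for _, j in near]
        # Skip points with too few neighbors of their own category.
        if neighbor_labels.count(label) < n_cells:
            continue
        for other_label in neighbor_labels:
            if other_label != label:
                score[label][other_label] += 1
                score[other_label][label] += 1
    return score


coords = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (2, 0), (10, 10), (11, 10), (10, 11)]
labels = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
doi_like = toy_interlacement(coords, labels, radius=1.5, n_neighbors=4, n_cells=1)
print(doi_like["A"])
```

In this toy layout, categories "A" and "B" are adjacent and receive a positive score, while the distant category "C" scores zero against both, mirroring the interpretation that larger values mean more interlacement.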
Part 3: Calculate perceptual differences between colors
Timing: <2 min
Having modeled the spatial interlacement between cell clusters in the previous part, we now advance to the calculation of perceptual differences between colors. This part also optionally involves generating a color palette when not provided manually.
5. Prepare a sufficient color palette for cell categories.
Note: Before colorizing the dataset, you should have a color palette with enough colors for all the cell types or categories. A contrastive color palette is crucial to ensure the clarity of the final visualization, so we list three options for palette preparation below.
a. Manually define the color palette.
Optional: One option is to manually prepare a color palette as a list (or list-like object) of RGB colors (in hex-code form) (Figures 4A and 4B). Colors can be obtained from various sources: in some scenarios, users may want to keep the color palette they have previously plotted with, and in others they may take colors from third-party packages like matplotlib5 in Python or colorbrewer29 in R. You can preview the colors of a list of RGB hex-codes using the `palplot()` function from the 'seaborn' package (in Python), or the `show_col()` function from the 'scales' package (in R) (Figures 4C and 4D).
i. Python version.
# Get the plotted palette from AnnData
palette_default = adata.uns['celltype_mapped_refined_colors'].copy()
# Optional: preview the palette using seaborn
import seaborn as sns
sns.palplot(palette_default)
ii. R version.
palette_default = scales::hue_pal()(15)
scales::show_col(palette_default)
b. Automatic palette selection based on cluster interlacement using `embed_graph()`.
Optional: This option utilizes the spatial interlacement matrix (i.e., the DOI scores) from the previous step, and automatically fits a contrastive color palette. Generally, the `embed_graph()` function computes a 3-dimensional embedding of the spatial interlacement matrix, and then scales the embeddings to the CIELab30 color space.
i. Python version.
trim_fraction = 0.0125
l_range = (30, 80)
color_mapping = spaco.mapping.embed_graph(
    cluster_distance = spatial_dis_py,
    l_range = l_range,
    trim_fraction = trim_fraction,
    transformation = "umap",
)
ii. R version.
trim_fraction = 0.0125
l_range = c(30, 80)
color_mapping <- embed_graph(
  cluster_distance = spatial_dis_R,
  l_range = l_range,
  trim_fraction = trim_fraction,
  transformation = "umap"
)
scales::show_col(color_mapping)
Note: You can preview the palettes using the `palplot()` function from the 'seaborn' package (in Python), or the `show_col()` function from the 'scales' package (in R) (Figures 5C and 5D).
Note: In both the Python and R versions, the `embed_graph()` function accepts the following parameters.
Figure 4.
Format and visualization of manually defined color palettes
(A and B) Screenshots of the format of a color palette in Python (A) and R (B).
(C) For the Python version. Preview of the color palette using the `palplot()` function from the Python 'seaborn' package.
(D) For the R version. Preview of the color palette using the `show_col()` function from the R 'scales' package.
Figure 5.
Visualization of automatically generated palette by the `embed_graph()` function
(A and B) Screenshot the output of the `embed_graph()` function in Python (A) and in R (B).
(C and D) Preview of the output palettes from the `embed_graph()` function in Python (C) and in R (D).
cluster_distance: A DataFrame contains the DOI scores between clusters. The indices and columns are unique cluster names. This is the output from `spatial_distance()`.
transformation: The method used for embedding. Defaults to "umap". Currently, can only be chosen between UMAP (Uniform Manifold Approximation and Projection)31 and MDS (Multidimensional Scaling).32 We only suggest using UMAP unless you want an exact linear transformation and ensure that no outliers are in the `cluster_distance` matrix.
l_range: A numerical tuple indicating the clipping range of the illumination (L) channel of the CIELab colors. Defaults to (10, 90). This determines the brightness of the colors. The range should be adjusted within (0, 100) to avoid generating colors that are too dark or too bright.
log_colors: A logical value determining whether to perform log-transformation for color embeddings, which can enhance color distinctions. Defaults to False. Should only enable this if the initial color embeddings are too concentrated or not distinct enough.
trim_fraction: A numerical value to set the quantile for trimming (value clipping) in the embedding process, affecting the spread of colors. Defaults to 0.0125.
Note: The `embed_graph()` function returns a dictionary (key-value list) where keys are unique categorical labels and values are the embedded colors of corresponding categories (Figures 5A and 5B).
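The idea of scaling a 3-dimensional embedding into the CIELab color space can be illustrated with a small, self-contained sketch. It assumes the embedding has already been computed (the toy coordinates below are made up), rescales each axis by min-max scaling, and converts Lab to sRGB hex codes with the standard D65 formulas. The helper names and the `ab_range` argument are hypothetical, and the actual scaling used inside `embed_graph()` may differ.

```python
def lab_to_hex(L, a, b):
    """Convert a CIELab color (D65 white point) to an sRGB hex string."""
    def f_inv(t):
        delta = 6 / 29
        return t ** 3 if t > delta else 3 * delta ** 2 * (t - 4 / 29)

    fy = (L + 16) / 116
    x = 0.95047 * f_inv(fy + a / 500)
    y = 1.00000 * f_inv(fy)
    z = 1.08883 * f_inv(fy - b / 200)

    # XYZ to linear sRGB
    linear = (
        3.2406 * x - 1.5372 * y - 0.4986 * z,
        -0.9689 * x + 1.8758 * y + 0.0415 * z,
        0.0557 * x - 0.2040 * y + 1.0570 * z,
    )

    def gamma(c):
        c = min(max(c, 0.0), 1.0)  # clip out-of-gamut values
        return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

    return "#" + "".join(f"{round(255 * gamma(c)):02x}" for c in linear)


def embedding_to_palette(embedding, l_range=(30, 80), ab_range=(-60, 60)):
    """Min-max rescale 3-D coordinates into Lab ranges, then convert to hex."""
    def rescale(values, lo, hi):
        vmin, vmax = min(values), max(values)
        span = (vmax - vmin) or 1.0  # avoid division by zero on a flat axis
        return [lo + (v - vmin) / span * (hi - lo) for v in values]

    names = list(embedding)
    axes = list(zip(*(embedding[name] for name in names)))
    L = rescale(axes[0], *l_range)
    a = rescale(axes[1], *ab_range)
    b = rescale(axes[2], *ab_range)
    return {name: lab_to_hex(L[i], a[i], b[i]) for i, name in enumerate(names)}


# Hypothetical 3-D embedding of three clusters
toy_embedding = {"A": (0.0, 0.1, 0.9), "B": (0.5, 0.8, 0.2), "C": (1.0, 0.4, 0.5)}
palette = embedding_to_palette(toy_embedding)
print(palette)
```

Restricting the L channel to `l_range` is what keeps the generated colors away from near-black and near-white, which is the purpose of the `l_range` parameter described above.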
c. Extract thematic colors from an image using `extract_palette()`.
Optional: This option allows for the extraction of thematic colors from a user-selected image, while retaining maximal contrast between extracted colors.2
Note: Currently this option is only implemented in the Python version of Spaco.
Note: We load an image from https://unsplash.com/ and convert it to RGB format for consistent color representation (Figure 6A).
Figure 6.
Thematic color palette extraction from an image using the `extract_palette()` function
(A) For the Python version. A user-selected (example) image from which thematic colors are extracted.
(B) For the Python version. The color palette extracted from the image in panel A using the extract_palette().
# import matplotlib for image preview
import matplotlib.pyplot as plt
# use the pillow package to load the image
# use `pip install pillow` to install if needed
from PIL import Image
img = Image.open("./data/photo-1707327956851-30a531b70cda.jpg").convert("RGB")
plt.imshow(img)
Note: Then we can use the `extract_palette()` function to get a sufficient number (22 colors for the seqFISH dataset) of colors for the categories (Figure 6B).
palette_img = spaco.utils.extract_palette(
reference_image = img,
n_colors = len(palette_default),
l_range = (30,80),
colorblind_type = "none",
)
# Optional: preview the palette using seaborn
import seaborn as sns
sns.palplot(palette_img)
Note: For the Python version, the `extract_palette()` function accepts the following parameters.
reference_image: A np.ndarray of RGB matrix of the image from which the color palette will be extracted. Choose an image that reflects the desired color scheme or thematic context for your data visualization.
n_colors: An integer indicating the number of colors to extract from the image. Set this based on the number of distinct clusters or categories in your data.
colorblind_type: Enumeration value chosen from: "none", "protanopia", "deuteranopia", "tritanopia", "general". This will customize the palette extraction process to get the best contrast for different types of color vision deficiency. "none" is for full color vision (no colorblind support), and "general" considers all three common colorblind types. Select the appropriate type to ensure readability for the audience or for your own convenience.
l_range: A tuple of numerical values indicating the clipping range of the L channel in CIELab colorspace. Defaults to (20, 85). Defines the illumination in the LAB colorspace for the extracted colors. Adjust within (0, 100) to ensure the colors are not too light or too dark.
verbose: A logical value controlling the output of intermediate information during the palette extraction process. Defaults to False. Set to 'True' for detailed output.
6. Calculate perceptual difference between colors using `perceptual_distance()`.
Note: After selecting the palette, Spaco computes a color perceptual difference matrix which represents the perceived contrast between selected colors. The perceptual difference between colors is evaluated using the "red mean" metric.33
Note: For demonstration, we use the manually defined palette (`palette_default`) from step 5a.
a. Python version.
color_distance_py = spaco.distance.perceptual_distance(
    colors = palette_default,
    colorblind_type = "none",
)
b. R version.
color_distance_R <- perceptual_distance(
colors = palette_default,
colorblind_type = "none"
)
Note: In both the Python and R versions, the parameters of the `perceptual_distance()` function are as follows:
colors: A list-like object of RGB colors in hex-code strings. This can be the output from the previous palette preparation, i.e., those selected in the previous step (either manually, `embed_graph()` generated, or extracted from images).
colorblind_type: Enumeration value chosen from: "none", "protanopia", "deuteranopia", "tritanopia", "general". This customizes the perceptual difference calculation to account for different types of color vision deficiency. "none" is for full color vision (no colorblind support), and "general" considers all three common colorblind types. Select the appropriate type to ensure readability for the audience or for your own convenience.
Note: In both Python and R version, the `perceptual_distance()` function returns a numerical DataFrame containing the perceptual differences between colors in the input palette (Figures 7A and 7B).
Figure 7.
Screenshots of the calculated color difference matrix using the `perceptual_distance()` function
(A and B) The numerical DataFrame that quantifies the perceptual differences among the colors, calculated by the `perceptual_distance()` function in Python (A) and R (B).
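To make the "red mean" metric mentioned above more concrete, the following minimal sketch implements the commonly cited form of the formula from the referenced color-metric note.33 The names `hex_to_rgb` and `redmean_distance` are hypothetical helpers written for illustration; they are not part of Spaco, and the package's own implementation may differ in details such as scaling or colorblind simulation.

```python
import math


def hex_to_rgb(hex_color):
    """Convert a hex string such as '#1f77b4' to an (R, G, B) tuple of ints."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def redmean_distance(color_a, color_b):
    """Weighted RGB distance whose red/blue weights depend on the mean red level.

    Illustrative helper only (an assumption), not Spaco's own function.
    """
    r1, g1, b1 = hex_to_rgb(color_a)
    r2, g2, b2 = hex_to_rgb(color_b)
    r_mean = (r1 + r2) / 2
    dr, dg, db = r1 - r2, g1 - g2, b1 - b2
    return math.sqrt(
        (2 + r_mean / 256) * dr ** 2
        + 4 * dg ** 2
        + (2 + (255 - r_mean) / 256) * db ** 2
    )


# Toy palette (made up for illustration): pairwise difference matrix as nested lists
palette = ["#ff0000", "#00ff00", "#0000ff"]
matrix = [[redmean_distance(a, b) for b in palette] for a in palette]
```

The resulting matrix is symmetric with zeros on the diagonal, which is the shape of input that the mapping step in Part 4 expects from the color side.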
Part 4: Match the cluster spatial interlacement with color perceptual difference for cluster-color assignment
Timing: ∼1 min
In this part, we utilize the two matrices calculated in the previous parts to obtain the optimal cluster-color assignment for best visualization clarity. Generally, the optimization seeks a correspondence between clusters and colors, where larger cluster spatial interlacement corresponds to larger color contrasts.
-
7.Optimize the mapping between clusters and colors using the `map_graph()` function.
Note: This function seeks to establish a one-to-one correspondence (a permutation matrix) between clusters and colors, to align the cluster spatial interlacement matrix with the color perceptual difference matrix, minimizing the Frobenius norm between the two matrices.
-
a.Python Version.
color_mapping = spaco.mapping.map_graph(
cluster_distance = spatial_dis_py,
color_distance = color_distance_py,
)
-
b.R Version.
color_mapping <- map_graph(
cluster_distance = spatial_dis_R,
color_distance = color_distance_R
)
Note: In both the Python and R versions, the parameters and configuration are as follows:
cluster_distance: A DataFrame with unique cluster names as index and columns, and values as the spatial interlacement scores between clusters. This can be the output of Part 2. Ensure this DataFrame accurately reflects the spatial relationships between clusters in your data.
color_distance: A DataFrame with unique colors (in hex-code) as index and columns, and values representing the perceptual differences (contrasts) between these colors. This can be the output of Part 3.
random_seed: An integer as random seed for optimization. Defaults to 123.
distance_metric: Enumeration value chosen from: "euclidean", "manhattan", "log", or "mul_1". Defaults to "mul_1", which maximizes the Hadamard Product between the two matrices.
random_max_iter: An integer setting the maximum number of iterations for the optimization. Defaults to 5000.
verbose: A logical value controlling the output of intermediate information during the mapping process. Set to 'True' for detailed output. Defaults to 'False'.
Note: This step returns an optimized mapping dictionary in which the keys are cluster names and the values are the corresponding hex colors (Figures 8A and 8B).
Figure 8.
Screenshots of the optimized cluster color mapping using the `map_graph()` function
(A and B) The cluster color mapping of the selected color palette to the given set of unique cell clusters, using the `map_graph()` function in Python (A) and R (B). Each category is assigned a specific color from the palette.
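The idea behind this matching step can be sketched on a toy example. The code below is not Spaco's optimizer: it simply tries every permutation of a three-color palette and keeps the one that maximizes the sum of element-wise products of the two matrices, which is one reading of the "mul_1" description above. The function name `best_assignment`, the cluster names, and all numeric values are hypothetical, and exhaustive search is only feasible for a handful of clusters.

```python
from itertools import permutations


def best_assignment(clusters, interlacement, colors, color_distance):
    """Exhaustively search cluster-to-color permutations (toy sizes only).

    Illustrative helper (an assumption), not part of Spaco.
    """
    n = len(clusters)
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        # Score: sum of interlacement[i][j] * contrast of the colors given to i and j
        score = sum(
            interlacement[i][j] * color_distance[perm[i]][perm[j]]
            for i in range(n)
            for j in range(n)
        )
        if score > best_score:
            best_perm, best_score = perm, score
    return {clusters[i]: colors[best_perm[i]] for i in range(n)}


# Made-up inputs: clusters A and B are strongly interlaced, C is mostly separate
clusters = ["A", "B", "C"]
interlacement = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.1],
    [0.1, 0.1, 0.0],
]
# Made-up contrasts: the first two colors are near-identical reds, the third is blue
colors = ["#ff0000", "#ff1100", "#0000ff"]
color_distance = [
    [0, 10, 600],
    [10, 0, 590],
    [600, 590, 0],
]

mapping = best_assignment(clusters, interlacement, colors, color_distance)
```

In this toy case the strongly interlaced clusters A and B receive the red and blue pair with the highest contrast, while C takes the remaining near-red color.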
Optional: For convenient integration into routine analysis, we provide a wrapper function, `colorize()`, that performs all steps from Part 2 to Part 4. Its parameters have the same usage as specified in the step-by-step instructions above.
# Python scripts
# Get optimized cluster-color assignment with Spaco (Figure 9A)
color_mapping = spaco.colorize(
cell_coordinates = adata.obsm['spatial'],
cell_labels = adata.obs['annotation'],
colorblind_type = "none",
radius = 200,
n_neighbors = 30,
palette = palette_default,
)
# Get optimized palette and assignment with Spaco (Figure 9B)
color_mapping = spaco.colorize(
cell_coordinates = adata.obsm['spatial'],
cell_labels = adata.obs['annotation'],
colorblind_type = "none",
radius = 200,
n_neighbors = 30,
palette = None,
)
# R scripts
# Get optimized cluster-color assignment with SpacoR (Figure 9D)
color_mapping <- colorize(
cell_coordinates = coor,
cell_labels = brain$seurat_clusters,
colorblind_type = "none",
radius = 20,
n_neighbors = 50,
palette = palette_default
)
# Get optimized palette and assignment with SpacoR (Figure 9E)
color_mapping <- colorize(
cell_coordinates = coor,
cell_labels = brain$seurat_clusters,
colorblind_type = "none",
radius = 20,
n_neighbors = 50
)
Part 5: Visualizing the cellular cartograph of spatial transcriptomics dataset with optimized color mapping
Timing: <1 min
After obtaining the optimized color mapping, you can use third-party packages, such as Scanpy23 and Seurat,34 to visualize the cellular cartograph (Figures 9A–9E).
-
8.Plot the cellular cartograph using your analytic framework.
-
a.Python Version.
# Convert the color mapping 'Dict' to adapt Scanpy plotting function
color_mapping = {k: color_mapping[k] for k in adata.obs['annotation'].cat.categories}
palette_spaco = list(color_mapping.values())
# Visualize the cell type cartograph
sc.pl.spatial(adata, color="annotation", spot_size=0.035, palette=palette_spaco)
-
b.R Version.
SpatialDimPlot(
brain,
label = TRUE,
cols = color_mapping,
label.size = 7,
stroke = NA
)
Figure 9.
Screenshots of the cellular cartograph visualization using optimized colorization
(A‒C) For the Python version. Optimized colorization of the seqFISH dataset using palettes from three options of Part 3: (A) manually defined, Figure reprinted with permission from Jing et al. 2024.1 (B) automatically generated by `embed_graph()`, Figure reprinted with permission from Jing et al. 2024.1 (C) extracted from image by `extract_palette()`.
(D and E) For the R version. Optimized colorization of the Visium dataset using palettes from two options of Part 3: (D) manually defined, (E) automatically generated by `embed_graph()`.
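The Python plotting step above reorders the mapping to follow the category order before passing a plain list of colors as the palette. The sketch below shows the same reordering on made-up data, with an added check for categories that received no color. The function name `ordered_palette`, the cluster names, and the colors are hypothetical and are not part of Spaco or Scanpy.

```python
def ordered_palette(categories, color_mapping):
    """Return colors in the same order as the categories.

    Illustrative helper (an assumption), not part of Spaco or Scanpy.
    """
    missing = [c for c in categories if c not in color_mapping]
    if missing:
        raise KeyError(f"No color assigned to: {missing}")
    return [color_mapping[c] for c in categories]


# Made-up example: the mapping's key order differs from the category order
categories = ["Astro", "Neuron", "Oligo"]
color_mapping = {"Neuron": "#e41a1c", "Oligo": "#377eb8", "Astro": "#4daf4a"}

palette = ordered_palette(categories, color_mapping)
```

Taking `list(color_mapping.values())` without this reordering would pair colors with categories in the wrong order whenever the dictionary order and the category order differ.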
Expected outcomes
The Spaco protocol is designed to enhance the categorical visualization clarity of spatial transcriptomics data.
Upon completion of the protocol, we expect two key outcomes: (1) optimized cluster-color assignments, i.e., the cluster-color mapping from Part 4, which specifies the colorization for the spatial transcriptomics dataset provided by the user (Figure 8), and (2) spatial cellular cartography plots with enhanced visual contrast for clearly exploring the distribution of different cell types (Figure 9).
We also expect several intermediate outputs which could be useful: (1) The calculated spatial interlacement scores between clusters (Figure 3), (2) automatically extracted palette available for other plots (Figure 5).
Limitations
A potential drawback of Spaco lies in its strict adherence to a correspondence between color contrast and the spatial neighboring relationship of cell types. However, some users may prefer to employ non-contrastive colors for cell subtypes irrespective of their spatial proximity, or may wish to assign colors to specific cell types in line with existing literature.
Additionally, the effectiveness of color-based visualization is limited when dealing with hundreds of clusters or more. In such cases, finely crafted palettes and more sophisticated techniques such as interactive plots and virtual reality can be beneficial.
Furthermore, while Spaco’s protocol is theoretically able to process three-dimensional spatial data points or even high-dimensional embeddings, precise parameter tuning is essential to differentiate between "sparse" and "dense" clusters for efficient colorization, which necessitates user experimentation depending on the provided data.
Troubleshooting
Problem 1
When performing “3. Install the Spaco package” in “Before you begin”, 'pip' or 'devtools' cannot download the Spaco package directly from GitHub because of network restrictions, for example on a cloud service or inside a Docker container.
Potential solution
Download and install manually. Download the package source code from the GitHub repository as a zip file, upload the file to your environment, and install it using 'pip' in Python or the `install.packages()` function in R.
Problem 2
Some dependencies are not installed automatically during “3. Install the Spaco package” in “Before you begin”.
Potential solution
Use a clean environment. Because Spaco is commonly integrated into other analytic workflows, you can include the packages essential for your analysis in the environment, but avoid including too many packages, which could cause dependency conflicts.
Ensure the installation is performed as outlined in “Before you begin”.
Problem 3
The palette automatically generated by `embed_graph()` has insufficient visual contrast when performing “Part 3: 5-b. Automatic palette selection based on cluster interlacement using `embed_graph()`”.
Potential solution
This is probably caused by outliers in the cluster spatial interlacement matrix from Part 2. Adjust the parameters of `spatial_distance()` following the parameter description. Alternatively, try setting the `log_colors` parameter of `embed_graph()` to 'True'.
Problem 4
An unexpected error occurs when using the `extract_palette()` function in “Part 3: 5-c. Extract thematic colors from an image using `extract_palette()`”.
Potential solution
Check that the provided images are in three-channel RGB format. Downloaded images sometimes include a transparency channel, which makes them four-channel RGBA images. You can slice the image matrix to keep only the three RGB channels.
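The slicing mentioned above can be sketched as follows. For simplicity, the example represents an image as nested Python lists of pixel tuples; the function name `drop_alpha` and the pixel values are hypothetical. When the image is loaded as a NumPy array of shape (height, width, 4), the equivalent operation is the slice `image[..., :3]`.

```python
def drop_alpha(image):
    """Keep only the first three (R, G, B) values of every pixel.

    Illustrative helper (an assumption), not part of Spaco.
    With a NumPy array, the same result is obtained with image[..., :3].
    """
    return [[tuple(pixel[:3]) for pixel in row] for row in image]


# Made-up 2 x 2 RGBA image; the fourth value of each pixel is the transparency channel
rgba_image = [
    [(255, 0, 0, 255), (0, 255, 0, 128)],
    [(0, 0, 255, 0), (10, 20, 30, 255)],
]

rgb_image = drop_alpha(rgba_image)
```

Images that are already three-channel pass through unchanged, so the conversion can be applied unconditionally before palette extraction.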
Problem 5
Colors are badly assigned to clusters, resulting in poor visual clarity in the visualization plots, when performing “Part 5: 8. Plot the cellular cartograph using your analytic framework”.
Potential solution
First, check whether the cluster spatial interlacement matrix was properly calculated in Part 2. Concretely, check whether the parameters of the `spatial_distance()` function, especially `radius` and `n_neighbors`, are set according to the parameter description and match the spatial scale of your dataset. If the problem persists, pick several cluster pairs that look most interlaced and check whether their DOI scores exceed those of the other pairs. Adjust the parameters following the description until the DOI scores correspond to the observed spatial interlacement.
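One way to carry out the check described above is to rank all cluster pairs by their score and compare the top of the list with the pairs that look most interlaced in the plot. The sketch below does this for a small made-up score table stored as nested dictionaries; the function name `top_interlaced_pairs`, the cluster names, and the values are hypothetical. A DataFrame returned by `spatial_distance()` would first need to be converted to this form.

```python
def top_interlaced_pairs(scores, k=3):
    """Return the k cluster pairs with the highest scores as (a, b, score) tuples.

    Illustrative helper (an assumption), not part of Spaco.
    """
    names = sorted(scores)
    pairs = [
        (scores[a][b], a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]  # visit each unordered pair once
    ]
    pairs.sort(reverse=True)
    return [(a, b, score) for score, a, b in pairs[:k]]


# Made-up symmetric score table for three clusters
scores = {
    "A": {"A": 0.0, "B": 0.8, "C": 0.1},
    "B": {"A": 0.8, "B": 0.0, "C": 0.3},
    "C": {"A": 0.1, "B": 0.3, "C": 0.0},
}

top_pairs = top_interlaced_pairs(scores, k=2)
```

If the pairs at the top of this ranking are not the ones that appear most intermixed in the tissue, the `radius` and `n_neighbors` settings probably do not match the spatial scale of the data.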
Resource availability
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Yinqi Bai (baiyinqi@genomics.cn).
Technical contact
Further information and requests for technical details and issues should be directed to and will be fulfilled by the technical contact, Zehua Jing (jingzehua20@mails.ucas.ac.cn).
Materials availability
This study did not generate new unique reagents.
Data and code availability
Spaco and SpacoR are available at https://github.com/BrainStOrmics/Spaco and https://github.com/BrainStOrmics/SpacoR. The code generated during this study is available at GitHub: https://github.com/BrainStOrmics/Spaco_scripts. Version v0.2.0 of the spaco (Python) and SpacoR packages, as used in this protocol, is also archived in Zenodo: https://zenodo.org/doi/10.5281/zenodo.10113359.35
Acknowledgments
The authors would like to acknowledge the technical support provided by China National Gene Bank. The graphical abstract was created with BioRender.com.
Author contributions
Software, Z.J. and B.Y.; formal analysis, B.Y.; writing – original draft, Z.J. and B.Y.; writing – review and editing, Z.J. and Y.B.; visualization, B.Y.; supervision, Z.J. and Y.B.
Declaration of interests
The authors declare no competing interests.
Declaration of generative AI and AI-assisted technologies in the writing process
During the preparation of this work, the authors used ChatGPT 3.5 (GPT-3.5) in order to facilitate the process of proofreading the contents of our draft. After using this tool/service, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.
Contributor Information
Zehua Jing, Email: jingzehua20@mails.ucas.ac.cn.
Yinqi Bai, Email: baiyinqi@genomics.cn.
References
- 1.Jing Z., Zhu Q., Li L., Xie Y., Wu X., Fang Q., Yang B., Dai B., Xu X., Pan H., Bai Y. Spaco: A comprehensive tool for coloring spatial data at single-cell resolution. Patterns (N. Y.) 2024;5 doi: 10.1016/j.patter.2023.100915.
- 2.Zheng Q., Lu M., Wu S., Hu R., Lanir J., Huang H. Image-guided color mapping for categorical data visualization. Comput. Vis. Media (Beijing) 2022;8:613–629. doi: 10.1007/s41095-021-0258-0.
- 3.Crameri F., Shephard G.E., Heron P.J. The misuse of colour in science communication. Nat. Commun. 2020;11:5444. doi: 10.1038/s41467-020-19160-7.
- 4.Hou W., Ji Z. Palo: spatially aware color palette optimization for single-cell and spatial data. Bioinformatics. 2022;38:3654–3656. doi: 10.1093/bioinformatics/btac368.
- 5.Hunter J.D. Matplotlib: A 2D Graphics Environment. Comput. Sci. Eng. 2007;9:90–95. doi: 10.1109/mcse.2007.55.
- 6.Wickham H. Ggplot2. WIREs Computational Stats. 2011;3:180–185. doi: 10.1002/wics.147.
- 7.Liu B., Li Y., Zhang L. Analysis and Visualization of Spatial Transcriptomic Data. Front. Genet. 2021;12 doi: 10.3389/fgene.2021.785290.
- 8.Rao N., Clark S., Habern O. Bridging Genomics and Tissue Pathology. Genet. Eng. Biotechnol. News. 2020;40:50–51. doi: 10.1089/gen.40.02.16.
- 9.Wang X., Allen W.E., Wright M.A., Sylwestrak E.L., Samusik N., Vesuna S., Evans K., Liu C., Ramakrishnan C., Liu J., et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science. 2018;361 doi: 10.1126/science.aat5691.
- 10.Chen K.H., Boettiger A.N., Moffitt J.R., Wang S., Zhuang X. RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science. 2015;348 doi: 10.1126/science.aaa6090.
- 11.Lubeck E., Cai L. Single-cell systems biology by super-resolution imaging and combinatorial labeling. Nat. Methods. 2012;9:743–748. doi: 10.1038/nmeth.2069.
- 12.Rodriques S.G., Stickels R.R., Goeva A., Martin C.A., Murray E., Vanderburg C.R., Welch J., Chen L.M., Chen F., Macosko E.Z. Slide-seq: A scalable technology for measuring genome-wide expression at high spatial resolution. Science. 2019;363:1463–1467. doi: 10.1126/science.aaw1219.
- 13.Stuart T., Butler A., Hoffman P., Hafemeister C., Papalexi E., Mauck W.M., Hao Y., Stoeckius M., Smibert P., Satija R. Comprehensive Integration of Single-Cell Data. Cell. 2019;177:1888–1902.e21. doi: 10.1016/j.cell.2019.05.031.
- 14.Dries R., Zhu Q., Dong R., Eng C.-H.L., Li H., Liu K., Fu Y., Zhao T., Sarkar A., Bao F., et al. Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome Biol. 2021;22:78. doi: 10.1186/s13059-021-02286-2.
- 15.Palla G., Spitzer H., Klein M., Fischer D., Schaar A.C., Kuemmerle L.B., Rybakov S., Ibarra I.L., Holmberg O., Virshup I., et al. Squidpy: a scalable framework for spatial omics analysis. Nat. Methods. 2022;19:171–178. doi: 10.1038/s41592-021-01358-2.
- 16.Qiu X., Zhu D.Y., Yao J., Jing Z., Zuo L., Wang M., Min K.H.J., Pan H., Wang S., Liao S., et al. Spateo: multidimensional spatiotemporal modeling of single-cell spatial transcriptomics. bioRxiv. 2022 doi: 10.1101/2022.12.07.519417.
- 17.Lohoff T., Ghazanfar S., Missarova A., Koulena N., Pierson N., Griffiths J.A., Bardot E.S., Eng C.-H.L., Tyser R.C.V., Argelaguet R., et al. Integration of spatial and single-cell transcriptomic data elucidates mouse organogenesis. Nat. Biotechnol. 2022;40:74–85. doi: 10.1038/s41587-021-01006-2.
- 18.Harris C.R., Millman K.J., van der Walt S.J., Gommers R., Virtanen P., Cournapeau D., Wieser E., Taylor J., Berg S., Smith N.J., et al. Array programming with NumPy. Nature. 2020;585:357–362. doi: 10.1038/s41586-020-2649-2.
- 19.McKinney W. In: Proceedings of the 9th Python in Science Conference. van der Walt S., Millman J., editors. 2010. Data Structures for Statistical Computing in Python; pp. 56–61.
- 20.Virtanen P., Gommers R., Oliphant T.E., Haberland M., Reddy T., Cournapeau D., Burovski E., Peterson P., Weckesser W., Bright J., et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods. 2020;17:261–272. doi: 10.1038/s41592-019-0686-2.
- 21.Virshup I., Rybakov S., Theis F.J., Angerer P., Wolf F.A. anndata: Annotated data. bioRxiv. 2021 doi: 10.1101/2021.12.16.473007.
- 22.Waskom M. seaborn: statistical data visualization. J. Open Source Softw. 2021;6:3021. doi: 10.21105/joss.03021.
- 23.Wolf F.A., Angerer P., Theis F.J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 2018;19:15. doi: 10.1186/s13059-017-1382-0.
- 24.Butler A., Hoffman P., Smibert P., Papalexi E., Satija R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 2018;36:411–420. doi: 10.1038/nbt.4096.
- 25.Zeileis A., Fisher J.C., Hornik K., Ihaka R., McWhite C.D., Murrell P., Stauffer R., Wilke C.O. Colorspace: A toolbox for manipulating and assessing colors and palettes. arXiv. 2019 doi: 10.48550/ARXIV.1903.06490.
- 26.Sabry F. K Nearest Neighbor Algorithm: Fundamentals and Applications. One Billion Knowledgeable; 2023.
- 27.Lohoff T., Ghazanfar S., Missarova A., Koulena N., Pierson N., Griffiths J.A., Bardot E.S., Eng C.-H.L., Tyser R.C.V., Argelaguet R., et al. Highly multiplexed spatially resolved gene expression profiling of mouse organogenesis. bioRxiv. 2020 doi: 10.1101/2020.11.20.391896.
- 28.Borm L.E., Mossi Albiach A., Mannens C.C.A., Janusauskas J., Özgün C., Fernández-García D., Hodge R., Castillo F., Hedin C.R.H., Villablanca E.J., et al. Scalable in situ single-cell profiling by electrophoretic capture of mRNA using EEL FISH. Nat. Biotechnol. 2023;41:222–231. doi: 10.1038/s41587-022-01455-3.
- 29.Brewer C.A. In: Visualization in Modern Cartography Modern Cartography. Maceachren A.M., Fraser Taylor D.R., editors. Elsevier; 1994. Color use guidelines for mapping and visualization; pp. 123–147.
- 30.Jayprkash A., Vijay R. Compression of MR images using DWT by comparing RGB and YCbCr color spaces. J. Sign. Inf. Process. 2013;04:364–369. doi: 10.4236/jsip.2013.44046.
- 31.McInnes L., Healy J., Melville J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv. 2018 doi: 10.48550/ARXIV.1802.03426.
- 32.Kruskal J.B., Wish M. SAGE; 1978. Multidimensional Scaling.
- 33.Riemersma, T. Colour metric. https://www.compuphase.com/cmetric.htm.
- 34.Satija R., Farrell J.A., Gennert D., Schier A.F., Regev A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 2015;33:495–502. doi: 10.1038/nbt.3192.
- 35.Jinerhal, and zhuqianhua (2023). BrainStOrmics/Spaco: Spaco 0.2.0 (Zenodo). 10.5281/ZENODO.10113347.

Timing: ∼5 min

CRITICAL: Parameters and configuration. `radius`, `n_neighbors`, `n_cells` can significantly affect the performance of this protocol. You should carefully tune these parameters following the description below.





