Abstract
Spatial proteomics can visualize and quantify protein expression profiles within tissues at single-cell resolution. Although spatial proteomics detects fewer proteins than spatial transcriptomics, it provides comprehensive spatial information at single-cell resolution. By studying the spatial distribution of cells, we can obtain the spatial context within tissues at multiple scales. Spatial context includes the spatial composition of cell types, the distribution of functional structures and the spatial communication between functional regions, all of which are crucial for understanding patterns of cellular distribution. Here, we constructed a comprehensive spatial proteomics functional annotation knowledgebase, scProAtlas (https://relab.xidian.edu.cn/scProAtlas/#/), which is designed to help users understand the spatial context within different tissue types at single-cell resolution and across multiple scales. scProAtlas contains multiple modules: the neighborhood analysis, proximity analysis and neighborhood network modules construct spatial cell maps of tissues, while the multi-modal integration, spatial gene identification, cell–cell interaction and spatial pathway analysis modules characterize spatially variable genes. scProAtlas includes data from eight spatial protein imaging techniques across 15 tissues and provides detailed functional annotation information for 17 468 394 cells from 945 regions of interest (ROIs). scProAtlas aims to offer new insights into the spatial structure of various tissues and to provide detailed spatial functional annotation.
Graphical Abstract
Introduction
Tissues are typically complex assemblies of various cell types with specific interactions (1). Locating these cells within the tissue and identifying their cell types are crucial for understanding intercellular interactions and the biological functions of these cellular assemblies. Spatial proteomics allows precise localization of cells at single-cell resolution and distinguishes cell types by measuring specific proteins. This approach is therefore crucial for determining functional regions within tissues and understanding spatial communication between cells.
Currently, various spatial proteomics imaging techniques have been developed. Mass spectrometry-based imaging techniques include imaging mass cytometry (IMC) (2), which stains tissue with metal isotope-labeled antibodies, and multiplexed ion beam imaging by time-of-flight (MIBI-TOF) (3), which is based on ion beam and TOF mass spectrometry imaging. Fluorescence-based cyclic imaging technologies, including cyclic immunofluorescence (CyCIF) (4), co-detection by indexing (CODEX) (5) and multi-epitope ligand cartography (MELC) (6), detect multiple target proteins through multiple rounds of fluorescent staining, enabling the measurement of a large number of cells and their marker proteins. Several techniques, such as CosMx (7) and spatial CITE-seq (8), combine spatial proteomics imaging with transcriptomics, offering both transcriptomic and spatial proteomic perspectives, which play a crucial role in understanding cellular heterogeneity within tissues. Additionally, chip-based high-throughput imaging technologies, such as chip cytometry (9), use microarray chips to perform multiple rounds of fluorescent labeling and staining.
With the development of spatial proteomics techniques, many publicly available spatial proteomics datasets have been generated, allowing researchers to conduct further studies. Currently, two databases include spatial proteomics data: Aquila (10) and SODB (11). Aquila is a spatial omics database that offers interactive online analysis and visualization, covering a total of 110 spatial transcriptomics and proteomics datasets. Similarly, SODB includes 132 spatial transcriptomics and proteomics datasets and provides spatial cell type interaction analysis through an interactive visualization tool called SOView. These two databases offer online analysis of important spatial functions such as spatial colocalization, cell–cell spatial communication and spatial pattern gene identification. However, they do not systematically summarize the functional roles of spatially aggregated cells or the interactions between different spatial regions.
To this end, we constructed scProAtlas, a database designed to provide detailed functional annotations for spatial proteomics data from multiple perspectives. scProAtlas hosts datasets covering 17 468 394 cells in 15 human tissues generated with 8 different spatial proteomics imaging techniques (Figure 1A). scProAtlas includes several key modules (Figure 1B, C). Multi-modal integration combines matched scRNA-seq with spatial proteomics data, mitigating the limited protein throughput of spatial proteomics by extending the analysis to the gene level. The neighborhood analysis module provides spatial visualization of cell type and neighborhood annotations, summarizing cell aggregation and the corresponding functions in local regions. The neighborhood network module reveals spatial communication relationships between different local regions. Spatial enrichment analysis assesses the colocalization of cell types or neighborhoods based on spatial proximity. The spatial pattern gene identification module reports genes that exhibit spatial expression patterns among the gene expression values predicted from spatial proteomics data. Finally, cell–cell interaction analysis identifies ligand–receptor interactions among genes with spatial expression patterns to study spatially patterned cell communication. Annotations for all datasets are available on the website (https://relab.xidian.edu.cn/scProAtlas/#/). scProAtlas provides multi-scale functional annotation, from local regions to individual cells, extending to the molecular level within tissues.
Figure 1.
Overview of scProAtlas. (A). Public resources and tissues used in scProAtlas. (B). Basic function of scProAtlas. scProAtlas supports browsing, downloading and searching. (C). Analysis module in scProAtlas.
Methods
Data collection and preprocessing
We searched PubMed and Google Scholar for literature using the keywords ‘spatial proteomics’, ‘spatial protein analysis’, ‘spatial protein imaging’ and ‘scRNA-seq’. Through manual filtering, we identified and cataloged eight distinct spatial protein imaging techniques: CODEX (5), IMC (2), MIBI-TOF (3), CyCIF (4), GeoMx/CosMx (7), Spatial CITE-seq (8), Chip Cytometry (9) and MELC (6). The data sources encompass several large-scale data consortia and multiple publications. Because the collected datasets come in various formats, such as raw image data and expression matrices, we standardized each dataset according to its specific characteristics.
For datasets containing raw image data, or data with corresponding image masks, we used the DeepCell Mesmer model for preprocessing (12). This step follows the complete Mesmer workflow for whole-cell segmentation of imaging data and yields a segmentation mask for each image. We then used the ‘regionprops’ function from the Python scikit-image library (13) to extract the expression matrix for each ROI based on the mask and the original image. The expression matrix and spatial coordinates obtained from these intensity measurements were stored in the ‘anndata’ format (14).
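The mask-to-matrix step can be sketched as follows. This is a minimal numpy-only sketch, not the scProAtlas code: it assumes the Mesmer output is an integer label mask (0 = background, each positive value = one cell), that the raw image is a (channels, height, width) array, and that per-cell mean channel intensity is the expression value. It computes the same per-cell mean intensity and centroid that scikit-image's regionprops reports, using np.bincount instead of a per-region loop.

```python
import numpy as np


def extract_cell_features(mask: np.ndarray, image: np.ndarray):
    """Per-cell mean channel intensity and centroid from a label mask.

    mask:  (H, W) integer array, 0 = background, positive values = cell IDs
           (assumed Mesmer-style whole-cell segmentation output).
    image: (C, H, W) array of raw channel intensities.
    Returns (cell_ids, expression[n_cells, C], centroids[n_cells, 2] as (row, col)).
    """
    labels = mask.ravel()
    cell_ids = np.unique(labels)
    cell_ids = cell_ids[cell_ids != 0]            # drop background
    fg = labels != 0                              # foreground pixels only
    # Map arbitrary cell IDs to contiguous indices 0..n-1 (cell_ids is sorted).
    idx = np.searchsorted(cell_ids, labels[fg])
    n = len(cell_ids)

    area = np.bincount(idx, minlength=n).astype(float)
    # Sum each channel over the pixels of each cell, then divide by cell area.
    flat = image.reshape(image.shape[0], -1)[:, fg]
    expression = np.stack(
        [np.bincount(idx, weights=ch, minlength=n) for ch in flat], axis=1
    ) / area[:, None]

    # Centroid = mean row and mean column of each cell's pixels.
    rows, cols = np.indices(mask.shape)
    centroids = np.stack(
        [np.bincount(idx, weights=rows.ravel()[fg], minlength=n),
         np.bincount(idx, weights=cols.ravel()[fg], minlength=n)],
        axis=1,
    ) / area[:, None]
    return cell_ids, expression, centroids
```

The resulting arrays could then be wrapped as, for example, `anndata.AnnData(X=expression, obsm={"spatial": centroids})`; the `"spatial"` key is the convention Squidpy reads by default, and is an assumption about how the coordinates were stored here.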
For datasets that provide expression profiles, the expression matrices, spatial coordinates and cell type information were standardized into the ‘anndata’ format.
After standardizing the formats, we processed the expression matrices using the Python library ‘scanpy’ (15). We filtered out proteins expressed in fewer than 200 cells and cells expressing fewer than three proteins. The expression matrices were then normalized by log transformation.
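In scanpy these steps correspond to `sc.pp.filter_genes(adata, min_cells=200)`, `sc.pp.filter_cells(adata, min_genes=3)` and a log transform such as `sc.pp.log1p(adata)`. The numpy sketch below spells out what those filters compute. It assumes proteins are filtered before cells (the order of the original calls is not stated) and that "log transformation" means log(1 + x), which is scanpy's standard choice.

```python
import numpy as np


def filter_and_log(X: np.ndarray, min_cells: int = 200, min_proteins: int = 3):
    """Numpy equivalent of the protein/cell filtering and log step.

    X: (n_cells, n_proteins) non-negative expression matrix.
    A protein counts as expressed in a cell when its value is > 0.
    """
    # Keep proteins detected in at least `min_cells` cells.
    keep_proteins = (X > 0).sum(axis=0) >= min_cells
    X = X[:, keep_proteins]
    # Then keep cells that express at least `min_proteins` of the remaining proteins.
    keep_cells = (X > 0).sum(axis=1) >= min_proteins
    X = X[keep_cells]
    # Assumed log transform: log(1 + x).
    return np.log1p(X), keep_proteins, keep_cells
```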
Spatial proteomics and scRNA-seq integration
scProAtlas used MaxFuse (16) for integration of spatial proteomics and scRNA-seq data. MaxFuse can match and integrate unpaired spatial proteomics and scRNA-seq datasets by identifying the most similar pairs between the modalities based on shared and unique features between cells. This approach allows us to find the optimal matches between the different data types. For each spatial proteomics dataset, we searched for well-annotated scRNA-seq data from the same tissue type. Before using MaxFuse, we harmonized the scRNA-seq and spatial proteomics data by obtaining the mappings between protein names and gene names from the STRING database (17). Then we extracted a subset of genes/proteins with shared names from both modalities and used this subset as the input for MaxFuse. MaxFuse can match the scRNA-seq data for each ROI and generate similarity indices between the two modalities. Based on these indices, we projected the cell type labels and expression levels from the scRNA-seq data to the spatial cells. For each ROI, MaxFuse generates the predicted cell type labels and expression levels of a certain number of genes based on the collected scRNA-seq data.
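The name-harmonization step that precedes MaxFuse can be sketched as a lookup followed by an intersection. This is a hedged illustration: the `protein_to_gene` dictionary is a hypothetical input standing in for the STRING-derived mapping, and falling back to the protein name itself (for panel names that are already gene symbols) is an assumption, not a documented rule of the pipeline.

```python
def shared_features(protein_names, gene_names, protein_to_gene):
    """Return aligned (protein, gene) pairs measured in both modalities.

    protein_names: channel names from the spatial proteomics panel.
    gene_names: gene symbols present in the scRNA-seq data.
    protein_to_gene: dict mapping protein names to gene symbols
        (hypothetical stand-in for a STRING alias table).
    """
    genes = set(gene_names)
    pairs = []
    for protein in protein_names:
        # Fall back to the name itself when no mapping is given (assumption).
        gene = protein_to_gene.get(protein, protein)
        if gene in genes:
            pairs.append((protein, gene))
    return pairs
```

The two halves of the returned pairs give the column orders used to slice the protein matrix and the RNA matrix to the shared subset before they are passed to MaxFuse.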
Neighborhood analysis
scProAtlas performed neighborhood analysis for each dataset, following the pipeline established in previous studies (18–20). The neighborhood identification process can be briefly summarized as follows: for each cell in the tissue, we defined a window centered on the cell within a specified range and calculated the proportion of cell types within each window as input vectors for neighborhood identification. We used k-means clustering on the cell type proportion vectors to identify clusters, reflecting the average cell type composition of each spatial cluster. Each cluster was manually annotated based on prior knowledge. This annotation allowed us to identify regions within the tissue that comprise specific cell type assemblages with distinct physiological structures and functions. The neighborhood labels from manual annotation reflected the composition and function of these regions.
In the original studies (18,19), window sizes of 5, 10 or 15 cells were typically used. To maintain consistency with these studies, we set the window for neighborhood identification to the 10 nearest cells for each dataset. The number of k-means clusters was uniformly set to 10 across all datasets.
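The window-composition and clustering procedure can be sketched with SciPy. This is a minimal sketch under stated assumptions: each window is a cell plus its nearest neighbours (10 cells in total, read from the description above), cell-type proportions are computed from one-hot labels, and `scipy.cluster.vq.kmeans2` with k-means++ initialization stands in for whatever k-means implementation the original pipeline used. The cluster IDs it returns are unannotated; the manual annotation step comes afterwards.

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.spatial import cKDTree


def identify_neighborhoods(coords, cell_types, window=10, n_clusters=10):
    """Cluster cells by the cell-type composition of their local window.

    coords: (n_cells, 2) spatial coordinates of one ROI (n_cells >= window).
    cell_types: (n_cells,) cell-type labels.
    Returns (cluster_labels, proportions[n_cells, n_types], type_names).
    """
    types, codes = np.unique(cell_types, return_inverse=True)
    # Indices of the `window` nearest cells; the first is the cell itself.
    _, nbrs = cKDTree(coords).query(coords, k=window)
    onehot = np.eye(len(types))[codes.ravel()]     # (n_cells, n_types)
    # Fraction of each cell type among the window members of every cell.
    props = onehot[nbrs].mean(axis=1)              # (n_cells, n_types)
    _, labels = kmeans2(props, n_clusters, minit="++")
    return labels, props, types
```

Because k-means depends on its initialization, cluster IDs can differ between runs; only the composition profiles of the clusters, not the ID numbers, carry meaning for annotation.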
Neighborhood network
To map the spatial communication relationships between neighborhood components, we performed neighborhood network analysis in three steps. First, we constructed a spatial graph for each neighborhood type in each ROI and removed cells with fewer than five connections within their neighborhood label. Second, we identified the connected components within each neighborhood label using the Python library ‘scipy’ (scipy.sparse.csgraph.connected_components function) (21). Connected components are subgraphs in which any two nodes are connected by a path and no node is connected to nodes outside the subgraph. These components identify separate cell groups of the same neighborhood type within the spatial context. Finally, we combined all neighborhood subgraphs to form a comprehensive spatial communication network that summarizes spatial interactions at the level of neighborhood labels.
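The first two steps can be sketched as follows. The graph construction is an assumption: the text does not say how the per-neighborhood spatial graph is built, so this sketch links cells closer than a hypothetical `radius`. It then drops cells with fewer than five links, runs `scipy.sparse.csgraph.connected_components`, and returns each component with its coordinate center, which can serve as a node in the neighborhood network.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial import cKDTree


def neighborhood_components(coords, nhood_labels, radius, min_degree=5):
    """Split each neighborhood label into spatially connected cell groups.

    Assumes a radius graph: cells of the same label closer than `radius` are linked.
    Returns a list of (label, member_indices, centroid) tuples, one per component.
    """
    components = []
    for label in np.unique(nhood_labels):
        idx = np.flatnonzero(nhood_labels == label)
        n = len(idx)
        pairs = np.array(sorted(cKDTree(coords[idx]).query_pairs(radius)), dtype=int).reshape(-1, 2)
        adj = coo_matrix(
            (np.ones(len(pairs)), (pairs[:, 0], pairs[:, 1])), shape=(n, n)
        ).tocsr()
        adj = adj + adj.T                                    # undirected graph
        degree = np.asarray((adj > 0).sum(axis=1)).ravel()
        keep = np.flatnonzero(degree >= min_degree)          # drop weakly connected cells
        if keep.size == 0:
            continue
        n_comp, comp = connected_components(adj[keep][:, keep], directed=False)
        for c in range(n_comp):
            members = idx[keep[comp == c]]
            components.append((label, members, coords[members].mean(axis=0)))
    return components
```

How edges between the component centers are then drawn to form the combined network is not specified in the text, so the sketch stops at the components and their centers.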
Proximity analysis
scProAtlas conducted spatial proximity analysis to examine the spatial enrichment of cell types and neighborhoods within each ROI. For a given label type, such as cell type or neighborhood, spatial enrichment was determined by counting, for each pair of categories $i$ and $j$, the number of neighboring cell pairs $x_{ij}$, with neighbors defined by the distances between cells. A randomization test then compared the observed counts with those obtained by randomly shuffling the labels: each permutation preserved the graph connectivity while shuffling the labels, and the label-pair counts were recalculated in each iteration. The enrichment score is:

$$z_{ij} = \frac{x_{ij} - \mu_{ij}}{\sigma_{ij}}$$

where $\mu_{ij}$ and $\sigma_{ij}$ represent the mean and standard deviation of the permuted counts for each label pair and $z_{ij}$ represents the z score. The z score indicates whether a label pair is significantly overrepresented or underrepresented among interactions between nodes in the connectivity graph. This function was implemented using the Python library ‘Squidpy’ (22) with the function squidpy.gr.nhood_enrichment.
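The permutation z score can be sketched in a few lines of numpy. This illustrates the technique described in this subsection, not Squidpy's implementation: the graph is given as a hypothetical edge list, labels are integer codes, both directions of every edge are counted so the count matrix is symmetric, and z = (observed count − mean of permuted counts) / standard deviation of permuted counts.

```python
import numpy as np


def nhood_enrichment_z(edges, labels, n_perms=1000, seed=0):
    """Permutation z scores for label-pair co-occurrence on a fixed graph.

    edges: (m, 2) integer array of undirected cell-cell edges.
    labels: (n_cells,) integer category codes 0..k-1.
    The graph stays fixed; only the labels are shuffled.
    """
    rng = np.random.default_rng(seed)
    k = labels.max() + 1

    def pair_counts(lab):
        counts = np.zeros((k, k))
        a, b = lab[edges[:, 0]], lab[edges[:, 1]]
        np.add.at(counts, (a, b), 1)
        np.add.at(counts, (b, a), 1)   # count both directions (symmetric matrix)
        return counts

    observed = pair_counts(labels)
    perms = np.stack([pair_counts(rng.permutation(labels)) for _ in range(n_perms)])
    mu, sigma = perms.mean(axis=0), perms.std(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        return (observed - mu) / sigma   # z = (x - mu) / sigma per label pair
```

Positive entries mark label pairs that touch more often than expected under random labeling (co-localization); negative entries mark pairs that avoid each other.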
Spatial variable gene identification
scProAtlas calculated the spatial autocorrelation of each gene within each integrated ROI. Spatial autocorrelation measures the correlation of a variable with itself across space, reflecting the similarity of values at neighboring locations. Given the spatial coordinates and a feature vector in an ROI, it determines whether the feature is clustered, dispersed or randomly distributed. Here, we applied it to genes to assess whether each gene exhibits a spatial expression pattern within the ROI. We used Moran's I (23) as the statistic to identify spatial patterns and calculated Moran's I scores for all genes in all ROIs. Moran's I is defined as:
$$I = \frac{N}{W}\,\frac{\sum_{i}\sum_{j} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i} (x_i - \bar{x})^2}$$

where $N$ is the number of cells, $x_i$ is the expression value of the gene at location $i$, $\bar{x}$ is the mean expression of the gene across all cells, $w_{ij}$ is the spatial weight between locations $i$ and $j$, and $W = \sum_{i}\sum_{j} w_{ij}$ is the sum of all spatial weights. We used the Python library ‘Squidpy’ to calculate Moran's I scores and permutation-test P-values with the function ‘squidpy.gr.spatial_autocorr’. Genes with P < 0.05 were selected as significant spatial pattern genes.
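Moran's I as defined in this section can be computed directly from an expression vector and a weight matrix. The sketch below assumes a dense weight matrix and adds a one-sided permutation P value that shuffles expression over fixed locations. The (count + 1) / (n_perms + 1) convention is a common choice assumed here; it is not taken from Squidpy's source.

```python
import numpy as np


def morans_i(x, w):
    """Moran's I for one gene.

    x: (n,) expression values; w: (n, n) spatial weight matrix.
    I = (N / W) * sum_ij w_ij d_i d_j / sum_i d_i^2, with d = x - mean(x).
    Undefined (nan) for a constant gene, since sum_i d_i^2 = 0.
    """
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    n, total_weight = len(x), w.sum()
    return (n / total_weight) * (d @ w @ d) / (d @ d)


def morans_i_perm_pvalue(x, w, n_perms=999, seed=0):
    """Observed Moran's I and a one-sided permutation P value (clustering)."""
    rng = np.random.default_rng(seed)
    observed = morans_i(x, w)
    null = np.array([morans_i(rng.permutation(x), w) for _ in range(n_perms)])
    return observed, (np.sum(null >= observed) + 1) / (n_perms + 1)
```

For four cells on a line with neighbor weights of 1, the clustered pattern [1, 1, 0, 0] gives I = 1/3, while the alternating pattern [1, 0, 1, 0] gives I = -1, the dispersed extreme.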
Cell–cell interaction
Understanding the biological processes mediated by ligand–receptor interactions between cells is crucial for deciphering the crosstalk between different cell types or neighborhoods within tissues. We therefore identified cell–cell interactions in each ROI using CellPhoneDB (24). The input was the set of highly spatially variable genes, defined as the top 10% of genes ranked by Moran's I score. All reported interactions were significant at P < 0.05.
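Selecting the top 10% of genes by Moran's I for one ROI is a simple ranking. In this sketch, the dictionary input and rounding the cutoff up to a whole gene are assumptions, since the text only says "top 10%". Integer arithmetic is used for the cutoff so that floating-point error in len × 0.1 cannot change the gene count.

```python
def top_percent_genes(scores, percent=10):
    """Genes in the top `percent` of Moran's I scores for one ROI.

    scores: dict mapping gene symbol -> Moran's I.
    The cutoff is ceil(len(scores) * percent / 100), computed with integers.
    """
    n_keep = (len(scores) * percent + 99) // 100
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n_keep]
```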
Spatial pathway
Gene set enrichment analysis for each ROI was performed using ‘Enrichr’ (25–27). The input gene sets were the top 10% of genes ranked by Moran's I score. The threshold for significant spatial pathways was set at P < 0.05.
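Enrichr's P value is based on Fisher's exact test; its one-sided form equals the upper tail of the hypergeometric distribution, sketched below with Python's standard library. The background size is an explicit input here, and how Enrichr defines its own gene universe is not modeled, so this illustrates the test rather than reproducing Enrichr's numbers.

```python
from math import comb


def overlap_pvalue(n_overlap, n_input, n_set, n_background):
    """P(X >= n_overlap) for X ~ Hypergeometric(n_background, n_set, n_input).

    n_input: number of submitted genes (e.g. the top 10% by Moran's I).
    n_set: number of genes annotated to the pathway.
    n_background: size of the gene universe (an assumed input).
    """
    total = comb(n_background, n_input)
    upper = min(n_input, n_set)
    # Probability of seeing k pathway genes among the input, summed for k >= n_overlap.
    return sum(
        comb(n_set, k) * comb(n_background - n_set, n_input - k)
        for k in range(n_overlap, upper + 1)
    ) / total
```

A pathway would then be kept as a spatial pathway when this P value is below 0.05.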
Differential expression gene identification
We used Scanpy's (15) standard differential gene expression analysis workflow to identify differentially expressed genes (DEGs) in both cell type and neighborhood groupings. For each sample group, we filtered the DEGs based on P-values for each cell type/neighborhood. After filtering out genes with P-values greater than 0.001, we concatenated the results for all cell types/neighborhoods with the metadata of the data to form the content of this module.
Results
Overview of scProAtlas
In total, scProAtlas collected data from eight spatial protein imaging techniques: CODEX (5), IMC (2), MIBI-TOF (3), CyCIF (4), GeoMx/CosMx (7), Spatial CITE-seq (8), Chip Cytometry (9) and MELC (6). scProAtlas encompasses 15 human tissue types, incorporating data from 17 468 394 cells across 945 regions (Figure 2A, B). The original protein data include a total of 345 protein channels (Supplementary Table S5). All data were categorized into datasets by tissue and study and assigned corresponding scProAtlas IDs. The database comprises 40 datasets in total: 5 CODEX, 11 MIBI-TOF, 10 IMC, 8 CyCIF, 3 Spatial CITE-seq, 1 MELC, 1 CosMx and 1 Chip Cytometry dataset. Detailed statistics for all datasets are provided in Supplementary Tables S1 and S6. scProAtlas offers the following features for spatial proteomics data: integration of scRNA-seq and spatial proteomics data, cell type visualization, marker protein visualization, neighborhood identification, neighborhood network analysis, spatial proximity analysis, spatial variable gene identification, cell–cell interaction and spatial pathway analysis (Figure 1). scProAtlas aims to provide a comprehensive multiscale structural map of various human tissues, from cells to neighborhood structures and down to genes. Although spatial proteomics can obtain spatial information at single-cell resolution, it is limited to the expression levels of a few dozen proteins within tissues. Therefore, several methods have been developed to predict gene expression in spatial proteomics data based on features shared between scRNA-seq and spatial proteomics (16,28,29). These approaches extend spatial proteomics to a wide range of genes. For the integration of scRNA-seq and spatial proteomics, we used a total of 15 scRNA-seq datasets from the corresponding tissues. This process involved 23 459 genes and encompassed 565 cell types.
Most tissues shared major cell types such as endothelial cells, T and B cells, epithelial cells, fibroblasts and stromal cells (Figure 2C); the cell type list for all datasets is provided in Supplementary Table S2. Within a tissue, different cell types play various roles, and their spatial aggregation can form functional spatial cell clusters. We used neighborhood analysis to capture and annotate these groups of cells that gather within a specific range. In the neighborhood identification module, 565 cell types were used for neighborhood analysis, and 679 neighborhoods were manually annotated to display the functional and anatomical structures within each region (Figure 2D). The neighborhood list for all datasets is provided in Supplementary Table S3. In the spatially variable gene identification module, a total of 2104 genes across all datasets were identified as significant spatial pattern genes with Moran's I score > 0.1 (Figure 2E). Statistics on the score ranges for each gene with a Moran's I score greater than 0 are provided in Supplementary Table S4. scProAtlas visualizes marker proteins and spatial pattern genes in each cell type and each neighborhood. Through these two modules, users can examine the spatial distribution of genes with spatial expression patterns, as well as their spatial enrichment within specific cell types and neighborhoods. This approach clarifies the spatial relationships between genes, cells and anatomical structures within tissues. In the cell–cell interaction analysis module, we identified a total of 1324 ligands and receptors from spatial pattern genes. We also identified 263 spatial pathways. Together, these modules provide comprehensive functional annotation of genes and proteins with spatial patterns at single-cell resolution. The design and functionalities of scProAtlas are illustrated in Figure 1.
Figure 2.
Data statistics in scProAtlas. (A). Region of interest (ROI) statistics display the number of ROIs within each tissue for every technique. Additionally, for each technique, examples of cell type distribution within scProAtlas are provided. (B). Distribution of the proportion of cells used in the analysis is shown across different techniques. (C). Distribution of the proportion of cell type used in the analysis is shown across different techniques. Subtypes of the same cell type are summarized and grouped into a single category for statistical purposes. (D). The proportion of neighborhood types across different techniques. (E). The proportion of spatial pattern genes in each imaging technique is categorized by Moran's I score: less than 0.02, 0.02 to 0.05, 0.05 to 0.1, and greater than 0.1, showing the distribution across different techniques.
For example, previous studies have shown that MZB1 is highly expressed in lymph nodes, thymus, colon and kidney, and that MZB1 serves as a marker of plasma cells (30–32). In scProAtlas, MZB1 exhibited high spatial pattern scores across multiple tissues (Figure 3A), indicating a consistent spatial enrichment pattern in these tissues. Tissues with a high MZB1 spatial expression pattern included the spleen, thymus, lymph node, colon and kidney; the findings in all but the spleen are consistent with previous research. In the spleen, thymus and lymph node, MZB1 was enriched in plasma cells (Figure 3B). In the thymus, its distribution was concentrated in plasma and naive B cells (Supplementary Figures S1, S2). In other tissues, however, MZB1 did not show a significant spatial pattern. A previous study (31) has shown that MZB1 plays a crucial role in plasma cell biology, particularly in plasma cell differentiation. A lack of MZB1 leads to abnormal plasma cell function, including reduced antibody production, which has been confirmed in multiple tissues, including the spleen and lymph nodes.
Figure 3.
Example from analysis modules of scProAtlas. (A). Frequency with which MZB1's Moran's I score appears in the top 10 across various tissues. (B). Comparison of MZB1's spatial distribution and cell types in spleen, thymus and lymph node samples. The top row shows the spatial distribution of plasma cells in spleen, thymus and lymph node, respectively. The bottom row displays the distribution of MZB1 expression in spleen, thymus and lymph node; lighter colors indicate higher MZB1 expression. (C). Example of cell type and neighborhood distribution in the large intestine using CODEX data. (D). The first column shows the spatial distribution of epithelial enterocytes in CODEX large intestine data and TSPAN8 expression in the same sample. The second column shows the spatial distribution of naïve B cells in CyCIF tonsil data and IGHD expression in the same sample. Lighter colors indicate higher gene expression. (E). Spatial co-localization in CODEX large intestine data. The top heatmap shows the co-localization of cell types and the bottom heatmap the co-localization of neighborhoods; darker colors indicate higher co-localization scores. (F). Neighborhood network of the same sample as in Figure 3C; nodes in the network diagram correspond to the neighborhood labels shown in Figure 3C. (G). Cell–cell interactions in CODEX large intestine Reg001 data.
Browse by spatial proteomics annotation module
Multi-modal integration
Spatial protein imaging is an advanced technique that can visualize and quantify protein expression within tissue architecture while preserving spatial context. Spatial proteomics enables detailed mapping of protein localization and interactions, providing insights into cellular function and organization in complex tissues. While spatial proteomics can capture the spatial context of tissues at single-cell resolution, it can only display the distribution of a limited number of proteins, which makes it difficult to analyze detailed biological processes within the tissue. Multi-modal integration methods can combine scRNA-seq and spatial proteomics, either integrating transcriptomic expression profiles and the corresponding cell type information into spatial proteomics data or integrating spatial information into scRNA-seq data. We therefore used MaxFuse (16) to annotate cell phenotypes in spatial protein data and extend spatial protein information to the gene level. For each spatial proteomics dataset, we collected a corresponding well-annotated scRNA-seq dataset (33) from the same tissue type. Integration between scRNA-seq and spatial proteomics followed the standard MaxFuse pipeline. In this module, scProAtlas provides integrated information for each dataset, and the integration results of cell types and neighborhoods for the entire dataset are displayed on the webpage.
Neighborhood analysis
Tissues are composed of multiple cell communities, which can form functional clusters with complex spatial arrangements to support organ homeostasis and function (19). Analyzing the spatial context within tissues is therefore crucial for understanding tissue structure. Neighborhood analysis identifies patterned clusters of cell types within local regions, thereby providing functional structural annotations within the tissue. Understanding these spatial relationships provides insights into how cells cooperate within their microenvironments to maintain overall tissue health and respond to physiological changes. In this module, scProAtlas therefore provides multiscale annotation information, from cell types to neighborhoods, offering insights into the phenotypes of individual cells and their functional states. To this end, we annotated cell types and neighborhood labels for a total of 945 ROIs. Cell type annotations were derived from previously collected scRNA-seq results from the original studies, with MaxFuse used to transfer cell type labels from scRNA-seq to obtain accurate cell type annotations. For each dataset, we summarized the cell types to identify clusters that represent functional states within the tissue or collections of several cell types, resulting in neighborhood labels that reflect these characteristics. In total, scProAtlas includes 845 cell subtypes and 679 neighborhood types. The neighborhood annotations for each dataset are displayed as heatmaps at the bottom of the module page. Taking the Reg009 sample of the large intestine from the SCP_CODEX1 dataset as an example, a total of 16 cell types and 7 neighborhoods were identified after cell type annotation and integration. The neighborhoods in this dataset are annotated based on the original publication, as shown in Figure 3C.
Users can browse all cell type and neighborhood annotations in the Neighborhood analysis module of the scProAtlas webpage. These results provide a comprehensive view of the spatial distribution of cell types and of the enrichment of cell types and their corresponding functions in specific spatial regions.
Neighborhood network
Different cell types can combine to form functional neighborhoods, and these neighborhoods can in turn assemble in a regular manner, recruiting each other to establish spatial communication (18,19). Here, we used the cellular neighborhood network to examine the commonalities and differences in communication relationships between cellular neighborhoods in different regions. We identified the connected components of each neighborhood throughout the entire tissue, grouping spatially contiguous regions of the same neighborhood into clusters, and filtered out clusters with too few cells. We then used the coordinate center of each remaining cluster as a node to map the network between clusters. This approach illustrates long-distance spatial communication between different neighborhoods. Taking Reg009 from the large intestine as an example (Figure 3; Supplementary Figures S3, S4), we observed that epithelial enterocytes and enteroendocrine cells are enriched in the mucosa area, surrounded by plasma cell and immune regions. This is consistent with previous studies (31) showing that, in the intestine, the gut microbiota influences the function of immune cells, including plasma cells, by modulating mucosal and immune responses, thereby maintaining intestinal barrier function.
Spatial variable gene identification
In spatial omics research, in addition to the spatial distribution of cells, it is equally important to examine the spatial expression patterns of genes. Here, based on the integration of scRNA-seq and spatial proteomics data, we calculated the spatial autocorrelation of genes to reflect the spatial pattern of every gene within each ROI. We used Moran's I score (23) as the spatial autocorrelation metric for each gene across 945 ROIs from 15 tissues. The scProAtlas webpage shows the spatial enrichment of all genes by dataset and the expression levels of the top 10 genes within each ROI. Combined with the neighborhood annotations, users can query the spatial distribution of genes in each ROI, compare the cell type attributes of cells at locations with high gene expression, and determine which cell-enriched or functional regions these genes are associated with. For example, Tetraspanin 8 (TSPAN8) (34) was among the top 10 Moran's I scores in numerous large intestine samples, indicating that TSPAN8 commonly exhibits strong spatial clustering in the human large intestine. Previous studies have shown that TSPAN8 is highly expressed in the human gut (31,35). Tetraspanin 8, the protein encoded by TSPAN8, plays a protective role in intestinal epithelial cells by maintaining tight junctions between cells, thereby enhancing intestinal barrier function (36). This regulation helps prevent excessive immune responses and intestinal damage. We then examined the distribution of TSPAN8 across cell types and neighborhoods to investigate its role in the human colon. In all large intestine samples, cells with high TSPAN8 expression were concentrated in epithelial enterocytes located in the mucosa region (first column of Figure 3D). Similarly, Immunoglobulin Heavy Constant Delta (IGHD) (34) was among the top 10 Moran's I scores in the tonsil sample (scProAtlas ID SCP_CyCIF7).
IGHD encodes the constant region of the immunoglobulin D heavy chain, a glycoprotein produced by B lymphocytes, and it has commonly been used as a cell type-specific marker for naive B cells (31,37). Here, we also used neighborhood analysis to examine the regions with high IGHD expression. Consistent with previous results (38,39), these regions were annotated as NBC (naive B cells) (second column of Figure 3D). This result demonstrates that scProAtlas can provide accurate annotation information and plausible spatial distributions from a gene-level perspective.
Proximity analysis
A challenge in studying spatial data is how to quantify and describe the relationships between cells within a tissue, as well as the relationships between neighborhoods. Here, we explore the spatial enrichment relationships between cell types and neighborhoods to quantify spatial co-localization between two labels. In this module, all 945 ROIs were analyzed to quantify the co-localization of cell types and neighborhoods. Here, we take Reg001 from the large intestine with scProAtlas ID SCP_CODEX1 as an example to visualize the enrichment scores between all cell type and neighborhood labels. The heatmap shows that multiple cell type pairs and neighborhood pairs have high scores (Figure 3E). We selected the combinations with the highest enrichment scores for visualization: the cell types ‘CD4 + alpha-beta T cells’ and ‘CD8 + alpha-beta T cells’, and the neighborhood pair ‘Immune_Epithelium’ and ‘Lamina_Propria_Crypt’. This result is consistent with previous research findings (40), where CD4+ αβ T cells and CD8+ αβ T cells are significantly enriched in the intestine, exhibiting unique TCR chain characteristics and indicating their role in immune responses.
Spatial-related cell–cell interaction
Cells can communicate with each other by releasing ligands, which bind to receptors on other cells and trigger specific biological processes that elicit different cellular responses. In this module, we identified the spatial ligand–receptor interactions present in each ROI and displayed their communication frequencies, using the top 10% of genes in each ROI ranked by Moran's I score. Users can query genes of interest by gene symbol, as well as technologies and tissue types of interest. In the interactive network diagram, each node represents a cell type, and the edges between nodes represent the interaction frequency between two cell types; edge thickness indicates the number of ligand–receptor interactions between the cell types. Users can click on an edge to view detailed information about each interaction. For example, the interaction between CD80 and CTLA-4 in the intestine showed a high interaction frequency in multiple ROIs. The CD80–CTLA-4 interaction has been reported to play a role in intestinal immunity (41). CTLA-4 has a high affinity for CD80, and CTLA-4's negative regulation plays a pivotal role in preventing autoimmune responses and maintaining immune system balance (42–44). Consistent with these studies, our analysis showed that CD80–CTLA-4 interactions within the intestine consistently exhibit relatively high interaction strength in T cell-mediated interactions (Figure 3G). We also identified CD80/CD86–CD28 interactions concentrated between B cells and T cells. This is consistent with the fact that CD80/CD86 on B cells typically deliver co-stimulatory signals that further promote B cell proliferation and differentiation (45). All results are presented in tables and network diagrams on the scProAtlas webpage.
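The edge weights of such a network can be derived by counting significant interactions per pair of cell types. This is a sketch of one plausible aggregation, not the scProAtlas code: the tuple input format is hypothetical, duplicate records are removed, and edges are kept directed (sender to receiver); summing both directions would give an undirected weight instead.

```python
from collections import Counter


def interaction_edges(significant):
    """Edge weights for a cell-type interaction network.

    significant: iterable of (sender_type, receiver_type, ligand, receptor)
        tuples for interactions passing P < 0.05 (hypothetical format).
    Returns {(sender_type, receiver_type): number of distinct ligand-receptor pairs}.
    """
    edges = Counter()
    for sender, receiver, ligand, receptor in set(significant):   # drop duplicate records
        edges[(sender, receiver)] += 1
    return dict(edges)
```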
Identification of the spatial pathways
To identify the biological functions of the spatial pattern genes, we performed enrichment analysis using Enrichr. The spatial pathways module reports the enriched biological pathways and their P values, annotated with Gene Ontology (GO) biological process, cellular component and molecular function terms (46,47) and KEGG pathways (48). Because a large number of genes was identified across all ROIs, we display the top 10% of spatial pattern genes in each ROI ranked by Moran's I score. For example, the high expression of MZB1 in plasma cells across multiple tissues may influence the regulation of B cell activation, the regulation of B cell proliferation and the regulation of lymphocyte proliferation pathways (31).
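Over-representation P values of the kind Enrichr reports come from asking how surprising the overlap between a query gene list and a pathway gene set is, given the background size. The sketch below implements that classic one-sided test (the upper tail of the hypergeometric distribution, equivalent to a one-sided Fisher's exact test) with the standard library. It is an illustration, not Enrichr itself: the background size, gene identifiers and gene sets in the usage example are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_sf(overlap, set_size, query_size, background_size):
    """P(X >= overlap) for X ~ Hypergeometric(background_size, set_size, query_size).

    Equivalent to a one-sided Fisher's exact test for over-representation.
    """
    hi = min(set_size, query_size)
    tail = sum(
        comb(set_size, x) * comb(background_size - set_size, query_size - x)
        for x in range(overlap, hi + 1)
    )
    return tail / comb(background_size, query_size)


def enrich(query_genes, gene_sets, background_size):
    """Rank gene sets by over-representation of the query genes (smallest P first)."""
    query = set(query_genes)
    rows = []
    for name, members in gene_sets.items():
        hits = sorted(query & set(members))
        p = hypergeom_sf(len(hits), len(members), len(query), background_size)
        rows.append((name, hits, p))
    return sorted(rows, key=lambda r: r[2])


if __name__ == "__main__":
    # Hypothetical identifiers and gene sets, not real pathway annotations.
    query = ["G1", "G2", "G3", "G4", "G5"]
    sets = {"set_A": ["G1", "G2", "G3", "G9"], "set_B": ["G7", "G8", "G9", "G10"]}
    for name, hits, p in enrich(query, sets, background_size=20000):
        print(name, hits, f"{p:.3g}")
```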
Marker visualization
To understand the relationship between the marker proteins measured in spatial proteomics data and tissue structure, we visualized the expression levels of all proteins. In spatial proteomics, the selected feature channels are often key markers used for cell type identification. Extending marker expression analysis to cell types and neighborhoods therefore gives a more complete picture of how markers are expressed across different cell types and regions. Taking SCP_CODEX1 – Large intestine as an example, we used a heatmap to show the expression levels of markers across cell types (Supplementary Figure S5). CD117 is highly expressed in mast cells, consistent with previous studies showing that CD117 (c-Kit) is a commonly used mast cell marker and an important antibody target for recognizing mast cells in immunofluorescence staining (49–51). Similarly, in the neighborhood-level results for SCP_CODEX1 – Large intestine (Supplementary Figure S6), CD36 is highly expressed in the crypt region; CD36 has been reported to show an expression pattern along the crypt-to-villus axis of the intestine (52–54).
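A marker-by-cell-type heatmap like Supplementary Figure S5 is usually built from per-group mean intensities that are then scaled per marker, so that a bright and a dim antibody can share one color scale. The sketch below assumes per-cell intensities stored as dictionaries, arithmetic means and z-scoring of each marker across groups; the scaling actually used for the scProAtlas figures is not stated here. The same code works for neighborhoods by passing neighborhood labels instead of cell types. The marker names come from the text, but all intensity values are invented.

```python
import statistics
from collections import defaultdict


def mean_expression_by_group(cells, groups):
    """Mean intensity of every marker within each group (cell type or neighborhood).

    cells  -- one dict per cell mapping marker name -> intensity
    groups -- the group label of each cell, in the same order
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for cell, group in zip(cells, groups):
        counts[group] += 1
        for marker, value in cell.items():
            sums[group][marker] += value
    return {
        group: {marker: total / counts[group] for marker, total in markers.items()}
        for group, markers in sums.items()
    }


def zscore_columns(matrix):
    """Z-score each marker across groups; a marker missing from a group counts as 0."""
    markers = sorted({m for row in matrix.values() for m in row})
    scaled = {group: {} for group in matrix}
    for marker in markers:
        column = [matrix[g].get(marker, 0.0) for g in matrix]
        mu = statistics.fmean(column)
        sd = statistics.pstdev(column)
        for g in matrix:
            value = matrix[g].get(marker, 0.0)
            scaled[g][marker] = (value - mu) / sd if sd > 0 else 0.0
    return scaled


if __name__ == "__main__":
    cells = [  # invented intensities for illustration only
        {"CD117": 9.0, "CD36": 1.0}, {"CD117": 8.0, "CD36": 1.5},
        {"CD117": 1.0, "CD36": 6.0}, {"CD117": 0.5, "CD36": 7.0},
    ]
    groups = ["Mast cell", "Mast cell", "Epithelial", "Epithelial"]
    print(zscore_columns(mean_expression_by_group(cells, groups)))
```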
Differentially expressed genes
Identifying differentially expressed genes helps researchers reveal changes in gene expression across cell types, tissue regions or tissue samples. This process can identify genes that are significantly differentially expressed in specific biological states or pathological conditions, providing deeper insight into biological mechanisms. Here, we performed differential expression analysis for each sample. By identifying genes whose expression differs significantly between cell types or neighborhoods and combining them with the identified spatially expressed genes, we can gain clearer insight into the roles these genes play in cell types and neighborhood structures in tissue.
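This section does not name the statistical test. A common choice for one-vs-rest differential expression in single-cell data is the Wilcoxon rank-sum (Mann–Whitney U) test, available for example as `method="wilcoxon"` in Scanpy's `rank_genes_groups` (15). The standard-library sketch below assumes that test, using a normal approximation without tie correction in the variance, so P values are approximate for small groups or heavily tied data; no multiple-testing correction is applied. The helper names and toy values are hypothetical.

```python
import math


def rank_with_ties(values):
    """1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_rank_sum(group, rest):
    """Two-sided rank-sum test; returns (z, p), z > 0 means `group` tends higher."""
    n1, n2 = len(group), len(rest)
    ranks = rank_with_ties(list(group) + list(rest))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return z, math.erfc(abs(z) / math.sqrt(2))


def one_vs_rest(expr_by_gene, labels, target):
    """Test every gene for differential expression in `target` cells vs. all others."""
    results = {}
    for gene, values in expr_by_gene.items():
        grp = [v for v, lab in zip(values, labels) if lab == target]
        rest = [v for v, lab in zip(values, labels) if lab != target]
        results[gene] = wilcoxon_rank_sum(grp, rest)
    return results


if __name__ == "__main__":
    labels = ["T"] * 5 + ["B"] * 5  # hypothetical cell-type labels
    expr = {"up_in_T": [5, 6, 7, 8, 9, 0, 1, 2, 3, 4],
            "flat": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]}
    print(one_vs_rest(expr, labels, "T"))
```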
Discussion
scProAtlas provides annotation information for multi-technology datasets from 15 different human tissues at various resolutions. We first performed neighborhood-based functional annotation to illustrate the distribution of cell types within specific regions, and used neighborhood networks and spatial enrichment to describe spatial communication and spatial colocalization between these regions at the region level. We then performed cell type-based annotation, showing the spatial distribution of each cell type and identifying potential spatial colocalization at single-cell resolution. Moreover, we integrated multi-modal data to obtain predicted gene expression profiles for the spatial proteomics data. We used the integrated genes to identify genes with strong spatial expression patterns across large numbers of cells, as well as genes with potential ligand–receptor interactions and their corresponding biological pathways. By combining neighborhoods, cells and genes, scProAtlas offers detailed functional annotation within each tissue. These annotations can be used to examine genes with spatial expression patterns in these regions, compare the distribution of these genes across cell types and neighborhoods, and understand the associated biological pathways, ultimately providing a comprehensive view of the tissue. In addition, scProAtlas is a user-friendly database for searching and browsing data (Figure 4A). Users can browse all dataset annotations by analysis module or by tissue type (Figure 4B), or browse all annotation results through the dataset framework, which is organized hierarchically by technique, dataset, tissue and ROI (Figure 4C). scProAtlas provides both quick search and advanced search: a quick search by gene symbol returns all corresponding results for that gene, and an advanced search narrows the results by technique and tissue to focus on the gene of interest (Figure 4D). Users can also access basic features of scProAtlas, such as downloading data, viewing statistics and accessing help (Figure 4E–F). Overall, scProAtlas is a suitable resource for researchers studying the relationship between proteins/genes and spatial organization in specific tissues, whether for building specific models or for algorithm design (55,56).
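The technique-dataset-tissue-ROI hierarchy and the two search modes can be pictured as filters over flat result records. The sketch below is a hypothetical in-memory model for illustration only: the live site is served by a Node.js/Express back end over MySQL, and the `Annotation` fields, helper names and the second example row (technique "IMC", dataset "SCP_EXAMPLE", tissue "Lymph node") are our assumptions, not the scProAtlas schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One hypothetical result row keyed by the technique-dataset-tissue-ROI hierarchy."""
    technique: str
    dataset: str
    tissue: str
    roi: str
    module: str  # e.g. "spatial pattern genes", "cell-cell interaction", "spatial pathways"
    gene: str


def quick_search(rows, gene):
    """Quick search: every result for one gene symbol (case-insensitive)."""
    g = gene.upper()
    return [r for r in rows if r.gene.upper() == g]


def advanced_search(rows, gene, technique=None, tissue=None):
    """Advanced search: quick-search hits narrowed by technique and/or tissue."""
    return [
        r for r in quick_search(rows, gene)
        if (technique is None or r.technique == technique)
        and (tissue is None or r.tissue == tissue)
    ]


if __name__ == "__main__":
    rows = [
        Annotation("CODEX", "SCP_CODEX1", "Large intestine", "Reg001", "cell-cell interaction", "CTLA4"),
        Annotation("IMC", "SCP_EXAMPLE", "Lymph node", "ROI_1", "spatial pathways", "CTLA4"),  # hypothetical
    ]
    print(advanced_search(rows, "ctla4", technique="CODEX"))
```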
Figure 4.
Interface of the scProAtlas webpage. (A) Navigation bar of scProAtlas functions. (B) Browser of analysis modules and tissues on the scProAtlas home page. (C) All datasets in scProAtlas are listed on the Data Archive page. Users can access the complete analysis results with the ‘detail’ button and use the filter bars to select the data they wish to browse by technique, dataset and tissue. (D) Advanced search of scProAtlas: users can enter a gene of interest for a selected technique and tissue to query the gene's results in spatial pattern genes, cell–cell interaction and spatial pathways within the dataset. (E) ‘Help’ page of scProAtlas. (F) Statistics in scProAtlas.
scProAtlas can be updated in several directions in the future. First, we will continue to collect and organize spatial proteomics datasets, including those from large databases, expanding the scale of the resource with the aim of developing scProAtlas into an atlas covering the entire human organ system and enabling a more comprehensive comparison of tissue heterogeneity. Second, we plan to further enhance the interactivity of the website so that users can obtain more comprehensive and clearer comparisons and visualizations during their queries, including interactive visualization results at the cellular scale. Furthermore, scProAtlas offers a large collection of samples with various labels, such as cell type, neighborhood and genes with high spatial expression patterns. These data are valuable for training large language models or for fine-tuning existing large predictive models such as Geneformer and Nicheformer (57,58). In the future, we plan to use these diverse spatial proteomics datasets to build and train large predictive models.
Acknowledgements
We would like to express our gratitude to our colleagues and friends who provided invaluable advice and support throughout the duration of this study. Additionally, we extend our sincere appreciation to the generous researchers who willingly shared their data, as well as the dedicated database staff for their exceptional efforts in collecting and managing the data.
Contributor Information
Tiangang Wang, School of Life Science and Technology, Xidian University, Xi’an, Shaanxi 710071, P.R. China; Center for Computational Systems Medicine, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
Xuanmin Chen, School of Life Science and Technology, Xidian University, Xi’an, Shaanxi 710071, P.R. China.
Yujuan Han, West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan 843000, P.R. China.
Jiahao Yi, Department of Medical Informatics, School of Biology and Engineering, Guizhou Medical University, Guiyang, Guizhou 561100, P.R. China.
Xi Liu, School of Life Science and Technology, Xidian University, Xi’an, Shaanxi 710071, P.R. China.
Pora Kim, Center for Computational Systems Medicine, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
Liyu Huang, School of Life Science and Technology, Xidian University, Xi’an, Shaanxi 710071, P.R. China.
Kexin Huang, School of Life Science and Technology, Xidian University, Xi’an, Shaanxi 710071, P.R. China; Center for Computational Systems Medicine, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
Xiaobo Zhou, Center for Computational Systems Medicine, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; School of Dentistry, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
Data availability
scProAtlas is a freely accessible database resource open to all users. The database can be accessed at https://relab.xidian.edu.cn/scProAtlas/#/. All analysis results have been uploaded to https://zenodo.org/records/13315586. A complete manuscript detailing each part of the database is provided at the same link. The front end of scProAtlas was developed using Vue 3.0 and Element Plus (https://element-plus.org/), and the back end using Node.js and Express (https://expressjs.com/). Data are managed using MySQL (https://www.mysql.com/). Interactive graphs were created with ECharts (https://echarts.apache.org/zh/index.html). All upstream and downstream analyses were performed in Python. scProAtlas can be accessed with popular web browsers, such as Google Chrome, Firefox, Microsoft Edge and Safari. The code for the standard analysis pipeline is available at https://github.com/ploughhh/scProAtlas_analysis and https://doi.org/10.5281/zenodo.13921833.
Supplementary data
Supplementary Data are available at NAR Online.
Funding
National Natural Science Foundation of China [62373292 and 82227802 to L.H.]; China Postdoctoral Science Foundation [2023M742498 to K.H.]; National Institutes of Health [R35GM138184 to P.K.; R01LM014156, R01GM153822, R01CA241930 to X.Z.]; National Science Foundation [NSF2217515 and NSF2326879 to X.Z.].
Conflict of interest statement. None declared.
References
- 1. Giesen C., Wang H.A., Schapiro D., Zivanovic N., Jacobs A., Hattendorf B., Schüffler P.J., Grolimund D., Buhmann J.M., Brandt S. et al. Highly multiplexed imaging of tumor tissues with subcellular resolution by mass cytometry. Nat. Methods. 2014; 11:417–422.
- 2. Chang Q., Ornatsky O.I., Siddiqui I., Loboda A., Baranov V.I., Hedley D.W. Imaging mass cytometry. Cytometry A. 2017; 91:160–169.
- 3. Liu C.C., Bosse M., Kong A., Kagel A., Kinders R., Hewitt S.M., Varma S., van de Rijn M., Nowak S.H., Bendall S.C. et al. Reproducible, high-dimensional imaging in archival human tissue by multiplexed ion beam imaging by time-of-flight (MIBI-TOF). Lab. Invest. 2022; 102:762–770.
- 4. Lin J.R., Fallahi-Sichani M., Sorger P.K. Highly multiplexed imaging of single cells using a high-throughput cyclic immunofluorescence method. Nat. Commun. 2015; 6:8390.
- 5. Black S., Phillips D., Hickey J.W., Kennedy-Darling J., Venkataraaman V.G., Samusik N., Goltsev Y., Schürch C.M., Nolan G.P. CODEX multiplexed tissue imaging with DNA-conjugated antibodies. Nat. Protoc. 2021; 16:3802–3835.
- 6. Schubert W., Bonnekoh B., Pommer A.J., Philipsen L., Böckelmann R., Malykh Y., Gollnick H., Friedenberger M., Bode M., Dress A.W. Analyzing proteome topology and function by automated multidimensional fluorescence microscopy. Nat. Biotechnol. 2006; 24:1270–1278.
- 7. He S., Bhatt R., Brown C., Brown E.A., Buhr D.L., Chantranuvatana K., Danaher P., Dunaway D., Garrison R.G., Geiss G. et al. High-plex imaging of RNA and proteins at subcellular resolution in fixed tissue by spatial molecular imaging. Nat. Biotechnol. 2022; 40:1794–1806.
- 8. Liu Y., DiStasio M., Su G., Asashima H., Enninful A., Qin X., Deng Y., Nam J., Gao F., Bordignon P. et al. High-plex protein and whole transcriptome co-mapping at cellular resolution with spatial CITE-seq. Nat. Biotechnol. 2023; 41:1405–1409.
- 9. Hennig C., Adams N., Hansen G. A versatile platform for comprehensive chip-based explorative cytometry. Cytometry A. 2009; 75:362–370.
- 10. Zheng Y., Chen Y., Ding X., Wong K.H., Cheung E. Aquila: a spatial omics database and analysis platform. Nucleic Acids Res. 2023; 51:D827–D834.
- 11. Yuan Z., Pan W., Zhao X., Zhao F., Xu Z., Li X., Zhao Y., Zhang M.Q., Yao J. SODB facilitates comprehensive exploration of spatial omics data. Nat. Methods. 2023; 20:387–399.
- 12. Greenwald N.F., Miller G., Moen E., Kong A., Kagel A., Dougherty T., Fullaway C.C., McIntosh B.J., Leow K.X., Schwartz M.S. et al. Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning. Nat. Biotechnol. 2022; 40:555–565.
- 13. van der Walt S., Schönberger J.L., Nunez-Iglesias J., Boulogne F., Warner J.D., Yager N., Gouillart E., Yu T. scikit-image: image processing in Python. PeerJ. 2014; 2:e453.
- 14. Virshup I., Rybakov S., Theis F.J., Angerer P., Wolf F.A. anndata: access and store annotated data matrices. J. Open Source Software. 2024; 9:4371.
- 15. Wolf F.A., Angerer P., Theis F.J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 2018; 19:15.
- 16. Chen S., Zhu B., Huang S., Hickey J.W., Lin K.Z., Snyder M., Greenleaf W.J., Nolan G.P., Zhang N.R., Ma Z. Integration of spatial and single-cell data across modalities with weakly linked features. Nat. Biotechnol. 2024; 42:1096–1106.
- 17. Szklarczyk D., Kirsch R., Koutrouli M., Nastou K., Mehryary F., Hachilif R., Gable A.L., Fang T., Doncheva N.T., Pyysalo S. et al. The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 2023; 51:D638–D646.
- 18. Bhate S.S., Barlow G.L., Schürch C.M., Nolan G.P. Tissue schematics map the specialization of immune tissue motifs and their appropriation by tumors. Cell Syst. 2022; 13:109–130.
- 19. Schürch C.M., Bhate S.S., Barlow G.L., Phillips D.J., Noti L., Zlobec I., Chu P., Black S., Demeter J., McIlwain D.R. et al. Coordinated cellular neighborhoods orchestrate antitumoral immunity at the colorectal cancer invasive front. Cell. 2020; 182:1341–1359.
- 20. Hickey J.W., Becker W.R., Nevins S.A., Horning A., Perez A.E., Zhu C., Zhu B., Wei B., Chiu R., Chen D.C. et al. Organization of the human intestine at single-cell resolution. Nature. 2023; 619:572–584.
- 21. Virtanen P., Gommers R., Oliphant T.E., Haberland M., Reddy T., Cournapeau D., Burovski E., Peterson P., Weckesser W., Bright J. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods. 2020; 17:261–272.
- 22. Palla G., Spitzer H., Klein M., Fischer D., Schaar A.C., Kuemmerle L.B., Rybakov S., Ibarra I.L., Holmberg O., Virshup I. et al. Squidpy: a scalable framework for spatial omics analysis. Nat. Methods. 2022; 19:171–178.
- 23. Moran P.A. Notes on continuous stochastic phenomena. Biometrika. 1950; 37:17–23.
- 24. Efremova M., Vento-Tormo M., Teichmann S.A., Vento-Tormo R. CellPhoneDB: inferring cell-cell communication from combined expression of multi-subunit ligand-receptor complexes. Nat. Protoc. 2020; 15:1484–1506.
- 25. Chen E.Y., Tan C.M., Kou Y., Duan Q., Wang Z., Meirelles G.V., Clark N.R., Ma’ayan A. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinf. 2013; 14:128.
- 26. Kuleshov M.V., Jones M.R., Rouillard A.D., Fernandez N.F., Duan Q., Wang Z., Koplev S., Jenkins S.L., Jagodnik K.M., Lachmann A. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 2016; 44:W90–W97.
- 27. Xie Z., Bailey A., Kuleshov M.V., Clarke D.J.B., Evangelista J.E., Jenkins S.L., Lachmann A., Wojciechowicz M.L., Kropiwnicki E., Jagodnik K.M. et al. Gene set knowledge discovery with Enrichr. Curr. Protoc. 2021; 1:e90.
- 28. Govek K.W., Troisi E.C., Miao Z., Aubin R.G., Woodhouse S., Camara P.G. Single-cell transcriptomic analysis of mIHC images via antigen mapping. Sci. Adv. 2021; 7:eabc5464.
- 29. Zhu B., Chen S., Bai Y., Chen H., Liao G., Mukherjee N., Vazquez G., McIlwain D.R., Tzankov A., Lee I.T. et al. Robust single-cell matching and multimodal analysis using shared and distinct features. Nat. Methods. 2023; 20:304–315.
- 30. Andreani V., Ramamoorthy S., Pandey A., Lupar E., Nutt S.L., Lämmermann T., Grosschedl R. Cochaperone Mzb1 is a key effector of Blimp1 in plasma cell differentiation and β1-integrin function. Proc. Natl. Acad. Sci. U.S.A. 2018; 115:E9630–E9639.
- 31. Thul P.J., Lindskog C. The human protein atlas: a spatial map of the human proteome. Protein Sci. 2018; 27:233–244.
- 32. Miyagawa-Hayashino A., Yoshifuji H., Kitagori K., Ito S., Oku T., Hirayama Y., Salah A., Nakajima T., Kiso K., Yamada N. et al. Increase of MZB1 in B cells in systemic lupus erythematosus: proteomic analysis of biopsied lymph nodes. Arthritis Res. Ther. 2018; 20:13.
- 33. Abdulla S., Aevermann B., Assis P., Badajoz S., Bell S.M., Bezzi E., Cakir B., Chaffer J., Chambers S., Michael Cherry J. et al. CZ CELL×GENE Discover: a single-cell data platform for scalable exploration, analysis and modeling of aggregated data. 2023; bioRxiv doi: 10.1101/2023.10.30.563174, 02 November 2023, preprint: not peer reviewed.
- 34. Safran M., Dalah I., Alexander J., Rosen N., Iny Stein T., Shmoish M., Nativ N., Bahir I., Doniger T., Krug H. et al. GeneCards Version 3: the human gene integrator. Database. 2010; 2010:baq020.
- 35. Min J., Yang S., Cai Y., Vanderwall D.R., Wu Z., Li S., Liu S., Liu B., Wang J., Ding Y. et al. Tetraspanin Tspan8 restrains interferon signaling to stabilize intestinal epithelium by directing endocytosis of interferon receptor. Cell Mol. Life Sci. 2023; 80:154.
- 36. Zhu Y., Saint-Pol J., Nguyen V., Rubinstein E., Boucheix C., Greco C. The Tetraspanin Tspan8 associates with Endothelin Converting Enzyme ECE1 and regulates its activity. Cancers. 2023; 15:4751.
- 37. Espinoza D.A., Le Coz C., Cruz Cabrera E., Romberg N., Bar-Or A., Li R. Distinct stage-specific transcriptional states of B cells derived from human tonsillar tissue. JCI Insight. 2023; 8:e155199.
- 38. Dirks J., Andres O., Paul L., Manukjan G., Schulze H., Morbach H. IgD shapes the pre-immune naïve B cell compartment in humans. Front. Immunol. 2023; 14:1096019.
- 39. Gao M., Liu S., Chatham W.W., Mountz J.D., Hsu H.C. IL-4-induced quiescence of resting naive B cells is disrupted in systemic Lupus erythematosus. J. Immunol. 2022; 209:1513–1522.
- 40. Rosati E., Rios Martini G., Pogorelyy M.V., Minervina A.A., Degenhardt F., Wendorff M., Sari S., Mayr G., Fazio A., Dowds C.M. et al. A novel unconventional T cell population enriched in Crohn's disease. Gut. 2022; 71:2194–2204.
- 41. Wosen J.E., Mukhopadhyay D., Macaubas C., Mellins E.D. Epithelial MHC class II expression and its role in antigen presentation in the gastrointestinal and Respiratory tracts. Front. Immunol. 2018; 9:2144.
- 42. Hossen M.M., Ma Y., Yin Z., Xia Y., Du J., Huang J.Y., Huang J.J., Zou L., Ye Z., Huang Z. Current understanding of CTLA-4: from mechanism to autoimmune diseases. Front. Immunol. 2023; 14:1198365.
- 43. Ville S., Poirier N., Blancho G., Vanhove B. Co-stimulatory blockade of the CD28/CD80-86/CTLA-4 balance in transplantation: impact on memory T cells? Front. Immunol. 2015; 6:411.
- 44. Yakoub A.M., Schülke S. A model for apoptotic-cell-mediated adaptive immune evasion via CD80-CTLA-4 signaling. Front. Pharmacol. 2019; 10:562.
- 45. Chevrier S., Genton C., Malissen B., Malissen M., Acha-Orbea H. Dominant role of CD80-CD86 over CD40 and ICOSL in the massive polyclonal B cell activation mediated by LAT(Y136F) CD4(+) T cells. Front. Immunol. 2012; 3:27.
- 46. Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000; 25:25–29.
- 47. Aleksander S.A., Balhoff J., Carbon S., Cherry J.M., Drabkin H.J., Ebert D., Feuermann M., Gaudet P., Harris N.L., Hill D.P. et al. The gene ontology knowledgebase in 2023. Genetics. 2023; 224:iyad031.
- 48. Kanehisa M., Furumichi M., Sato Y., Kawashima M., Ishiguro-Watanabe M. KEGG for taxonomy-based analysis of pathways and genomes. Nucleic Acids Res. 2023; 51:D587–D592.
- 49. Mazreah S.A., Shahsavari M., Kalati P.A., Mazreah H.A. Immunohistochemical evaluation of CD117 in mast cell of aggressive periodontitis. J. Indian Soc. Periodontol. 2020; 24:216–220.
- 50. Ustun C., DeFor T.E., Karadag F.K., Don Yun H., Nathan S., Brunstein C.G., Blazar B.R., Weisdorf D.J., Holtan S.G., Amin K. Tissue mast cell counts may be associated with decreased severity of gastrointestinal acute GVHD and nonrelapse mortality. Blood Adv. 2020; 4:2317–2324.
- 51. Cherian S., McCullouch V., Miller V., Dougherty K., Fromm J.R., Wood B.L. Expression of CD2 and CD25 on mast cell populations can be seen outside the setting of systemic mastocytosis. Cytometry B Clin. Cytom. 2016; 90:387–392.
- 52. Drover V.A., Ajmal M., Nassir F., Davidson N.O., Nauli A.M., Sahoo D., Tso P., Abumrad N.A. CD36 deficiency impairs intestinal lipid secretion and clearance of chylomicrons from the blood. J. Clin. Invest. 2005; 115:1290–1297.
- 53. Lobo M.V., Huerta L., Ruiz-Velasco N., Teixeiro E., de la Cueva P., Celdrán A., Martín-Hidalgo A., Vega M.A., Bragado R. Localization of the lipid receptors CD36 and CLA-1/SR-BI in the human gastrointestinal tract: towards the identification of receptors mediating the intestinal absorption of dietary lipids. J. Histochem. Cytochem. 2001; 49:1253–1260.
- 54. Drover V.A., Nguyen D.V., Bastie C.C., Darlington Y.F., Abumrad N.A., Pessin J.E., London E., Sahoo D., Phillips M.C. CD36 mediates both cellular uptake of very long chain fatty acids and their intestinal absorption in mice. J. Biol. Chem. 2008; 283:13108–13115.
- 55. Sun X., Zhang L., Tan H., Bao J., Strouthos C., Zhou X. Multi-scale agent-based brain cancer modeling and prediction of TKI treatment response: incorporating EGFR signaling pathway and angiogenesis. BMC Bioinf. 2012; 13:218.
- 56. Jiang S., Zhou X., Kirchhausen T., Wong S.T. Detection of molecular particles in live cells via machine learning. Cytometry A. 2007; 71:563–575.
- 57. Theodoris C.V., Xiao L., Chopra A., Chaffin M.D., Al Sayed Z.R., Hill M.C., Mantineo H., Brydon E.M., Zeng Z., Liu X.S. et al. Transfer learning enables predictions in network biology. Nature. 2023; 618:616–624.
- 58. Schaar A.C., Tejada-Lapuerta A., Palla G., Gutgesell R., Halle L., Minaeva M., Vornholz L., Dony L., Drummer F., Bahrami M. et al. Nicheformer: a foundation model for single-cell and spatial omics. 2024; bioRxiv doi: 10.1101/2024.04.15.589472, 17 April 2024, preprint: not peer reviewed.