Abstract
Spatial transcriptomics leverages gene expression profiling while preserving spatial location and histological images. However, processing the vast and noisy image data in spatial transcriptomics (ST) for precise recognition of spatial domains remains a challenge. In this study, we propose EfNST, a method for recognizing spatial domains that employs the efficient compound-scaling network EfficientNet to learn multi-scale image features. Compared with other relevant algorithms on six data sets from three sequencing platforms, EfNST exhibits higher accuracy in discerning fine tissue structures, highlighting its strong scalability and operational efficiency. Under limited computing resources, testing on multiple data sets shows that EfNST runs faster while maintaining accuracy. Ablation studies of the EfNST model demonstrate the effectiveness of the EfficientNet component. Within annotated data sets, EfNST finely identifies subregions within tissue structures and discovers corresponding marker genes. In unannotated data sets, EfNST successfully identifies minute regions within complex tissues and elucidates their spatial expression patterns in biological processes. In summary, EfNST presents a novel approach to inferring cellular spatial organization from discrete data spots, with significant implications for the exploration of tissue structure and function.
Subject terms: Computational models, Machine learning
EfNST improves the identification of spatial domains using EfficientNet for multi-scale image features. It effectively reveals tissue structures and marker genes, aiding in the understanding of spatial gene expression patterns.
Introduction
Single-cell transcriptomics1 has refined the resolution of research from the tissue level to the single cell, allowing researchers to analyze processes such as embryonic development, cell reprogramming, and disease occurrence with unprecedented resolution2–5. However, cell subpopulations act in an intertwined manner based on their specific spatial positions within a given tissue6. Spatial heterogeneity is a crucial characteristic for understanding organ function, cell fate regulation mechanisms, and cell lineage generation7–9. Single-cell transcriptomics technologies inevitably lose spatial information when dissociating solid tissues into individual cells. Therefore, to better understand the spatial heterogeneity of different cells, it is necessary to capture their transcriptional heterogeneity and spatial location information simultaneously.
In recent years, ST technologies such as 10x Visium10, Slide-seq11, and Stereo-seq12 have become powerful tools for studying molecular biological mechanisms: they obtain spatial information along with the transcriptomic expression profiles of cells, making it possible to characterize the expression profiles of specific cell types on a spatial scale13. However, the low sensitivity, high dimensionality, sparsity, high noise, and multimodality of spatial transcriptomics data limit spatial domain recognition. These challenges have spurred the development of algorithms designed for accurate and robust spatial domain recognition. Currently, spatial transcriptomic algorithms fall primarily into probability-based models and deep learning models. Probability-based algorithms (e.g., BayesSpace14, Giotto15, SC-MEB16, Scanpy17, Seurat18) often have longer runtimes, offer less flexibility in handling large-scale ST data, struggle with complex nonlinear relationships, and lack autonomous feature learning capabilities. Deep learning models (e.g., STAGATE19, SEDR20, CCST21), on the other hand, are suitable for large-scale, high-dimensional data, allowing nested model usage for automatic and fast learning of complex features, thus alleviating analysis complexity and significantly advancing spatial domain recognition. Despite their outstanding feature learning, however, these models have complex training designs, slower training speeds, and learned features that lack a clustering orientation.
Spatial transcriptomics (ST) data encompass image information obtained through tissue sectioning, staining, and other imaging procedures. To derive meaningful insights from these data, it is essential to employ image segmentation and feature extraction techniques; the resulting image features are then integrated with transcriptomic data to elucidate the spatial distribution and functions of cells within tissues. Because image information provides unique morphological features that intuitively depict the spatial distribution of cells, incorporating ST histological images to aid spatial domain recognition is of crucial importance. However, owing to the significant noise and large size of ST data, most current algorithms (e.g., stLearn22, SpaGCN23, TIST24) still struggle to integrate image information efficiently, exhibiting high model complexity, slow processing speeds, and inadequate image feature extraction. Thus, efficiently handling the vast image data in ST with an effective network to obtain image features and increase computational speed remains a challenge.
To address the above issues, we present EfNST, a model rooted in the EfficientNet25 convolutional neural network architecture. EfNST is specifically designed to efficiently integrate image information and enhance spatial domain recognition performance. For image feature extraction, it utilizes the robust architecture of EfficientNet, a convolutional neural network framework developed by Google. This framework is adept at learning multiscale image features, thereby boosting computational efficiency. In addition, EfNST incorporates a Variational Graph Autoencoder (VGAE)26 to capture latent representations of nodes in graph-structured data. Furthermore, a Denoising Autoencoder (DAE)27 is implemented to mitigate complex noise and interference within the data, facilitating the extraction of more resilient latent features. To validate the efficacy of the algorithm, we conducted tests on six data sets from three sequencing platforms. The results indicated that EfNST outperforms seven classic benchmark methods in terms of spatial domain recognition. Specifically, EfNST demonstrated high accuracy in identifying subregions of tumors within the human breast cancer data set, providing a more detailed delineation of tumor areas. Moreover, in the case of four other data sets, EfNST exhibited superior proficiency in recognizing intricate spatial patterns within tissues. In conclusion, EfNST provides a novel approach to spatial domain identification, facilitating more accurate analysis of tissue structures and advancing the exploration of spatial gene expression patterns.
Results
The EfNST framework
EfNST accurately identifies spatial domains by integrating gene expression profiling, spatial location, and latent characterization of histological image information to elucidate heterogeneity in tissue structure (Fig. 1a–d). First, for 10x Visium ST data with histological images, EfNST preprocesses the gene expression profiles, spatial locations, and hematoxylin and eosin (H&E) image information. Then, to characterize neighborhood information, EfNST uses the K-Nearest Neighbors similarity measure to construct a normalized adjacency matrix from the spatial locations and capture the neighboring spots of each spot. For the histological image data, the original image (H&E staining tiles) is first segmented based on the coordinates of each spot to obtain 50 × 50 pixel sub-images. The torchvision.transforms module from the PyTorch library is then used to transform and augment these sub-images into high-quality 224 × 224 pixel blocks. Next, the EfficientNet-B0 model from the EfficientNet family is employed for feature extraction on the image blocks. The extracted features are reduced to 50 dimensions using Principal Component Analysis (PCA), yielding an underlying feature representation of the image blocks. These representations form the image feature matrix, in which each row corresponds to the feature vector of one image block, offering a comprehensive description of the entire image data set. Compared with conventional CNN models (e.g., ResNet5028, ResNet15228, VGG1929, DenseNet12130, and Inception_v331), EfficientNet employs network architecture search and compound multiscale scaling, reducing parameters and computational complexity while maintaining accuracy. It also shows stronger generalization, higher accuracy, and faster convergence when handling input data of varying sizes.
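The preprocessing pipeline described above can be sketched in a few lines of Python. This is a minimal illustration rather than the released EfNST code: the 50 × 50 patch size, the K-Nearest Neighbors graph, and the 50-dimensional PCA follow the text, while the EfficientNet-B0 embedding step is stood in by random 1280-dimensional vectors (the output width of EfficientNet-B0's final pooling layer) so the sketch runs without pretrained weights, and the resize uses simple nearest-neighbour indexing in place of torchvision.transforms.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)

def build_adjacency(coords, k=6):
    # symmetric, row-normalized K-Nearest Neighbors graph over spot coordinates
    A = kneighbors_graph(coords, n_neighbors=k, mode="connectivity").toarray()
    A = np.maximum(A, A.T)                       # symmetrize
    return A / A.sum(axis=1, keepdims=True)      # row-normalize

def crop_patches(image, coords, size=50):
    # cut a size x size sub-image centred on each spot (image is H x W x 3)
    half = size // 2
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="edge")
    return np.stack([padded[y:y + size, x:x + size] for x, y in coords])

def upscale(patches, out=224):
    # nearest-neighbour resize of the 50 x 50 patches to 224 x 224
    idx = np.arange(out) * patches.shape[1] // out
    return patches[:, idx][:, :, idx]

img = rng.integers(0, 255, (500, 500, 3), dtype=np.uint8)  # toy H&E image
coords = rng.integers(50, 450, (60, 2))                    # 60 toy spot centres
A = build_adjacency(coords)                                # normalized adjacency
patches = upscale(crop_patches(img, coords))               # 60 x 224 x 224 x 3
feats = rng.normal(size=(60, 1280))   # stand-in for EfficientNet-B0 embeddings
img_feat = PCA(n_components=50, random_state=0).fit_transform(feats)
```

In the real pipeline, `feats` would come from running the 224 × 224 blocks through a pretrained EfficientNet-B0 with its classification head removed; everything downstream of that call is unchanged.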
Subsequently, data augmentation is applied to the gene expression matrix, adjacency matrix and image feature matrix to obtain an augmented gene expression matrix.
Fig. 1. The workflow of the EfNST.
a The input ST data are Gene Expression, Histological Images and Spatial Location. b EfNST processes the H&E Images and Spatial Locations to obtain Image Patches, which are passed through a pre-trained EfficientNet network to obtain the Image Feature Matrix. Data Augmentation is performed for each spot based on the spatial similarity of spots combined with the gene expression weights and the spatial location weights. c The final latent embedding is obtained using VGAE and DAE; H1 denotes the hidden layer, Z the graph embedding, and H2 the low-dimensional representation. d The latent representations can be used to perform Downstream Analysis.
For the osmFISH and STARmap data without histological images, the gene expression matrix and adjacency matrix are used directly to obtain the augmented gene expression matrix, with the adjacency matrix weighted at 0.2 in the calculation. Principal component analysis (PCA) is then applied to the augmented gene expression data to extract the first 50 principal components (optional) as latent characteristics. Following this, the augmented gene expression data are trained through a linear layer and a convolutional layer. Specifically, the linear encoder comprises hidden layers of sizes 50 and 20, while the linear decoder consists of hidden layers of sizes 50 and 60; the convolutional layer includes 32 and 8 hidden units. Afterward, a global attention mechanism is introduced into the encoder of the Variational Graph Autoencoder (VGAE) to capture patterns among intermediate nodes, enabling effective extraction of latent embeddings by the VGAE and the Denoising Autoencoder (DAE) (Fig. 1c). This combination helps manage the noise and sparsity in the data, enhancing the model’s ability to learn and represent data features. Finally, the low-dimensional latent representations can be used for downstream analyses, including the identification of spatial domains, marker gene detection, and visualization (Fig. 1d).
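The encoder stack can be illustrated with a single numpy forward pass. This is a schematic with random, untrained weights, not the trained model: it shows only how the DAE branch maps the 50 augmented principal components to the 20-dimensional representation H2, and how the VGAE branch samples the graph embedding Z via the reparameterization trick. The normalized adjacency is replaced by an identity placeholder, and the convolutional layer and global attention mechanism are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

n_spots = 100
X = rng.normal(size=(n_spots, 50))   # augmented expression, first 50 PCs
A = np.eye(n_spots)                  # placeholder for the normalized adjacency

# DAE branch: corrupt the input, encode 50 -> 20, decode 20 -> 50
W_enc = rng.normal(scale=0.1, size=(50, 20))
W_dec = rng.normal(scale=0.1, size=(20, 50))
X_noisy = X + rng.normal(scale=0.1, size=X.shape)   # denoising corruption
H2 = relu(X_noisy @ W_enc)                          # low-dimensional representation H2
recon_loss = np.mean((H2 @ W_dec - X) ** 2)         # reconstruct the clean input

# VGAE branch: one graph-convolution step yields mean / log-variance of Z
W_mu = rng.normal(scale=0.1, size=(50, 20))
W_logvar = rng.normal(scale=0.1, size=(50, 20))
mu = A @ X @ W_mu
logvar = A @ X @ W_logvar
Z = mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)  # reparameterization trick
kl = -0.5 * np.mean(1 + logvar - mu**2 - np.exp(logvar))   # KL regularizer
```

Training would minimize `recon_loss` plus the KL term jointly, so that the denoised representation and the graph embedding are learned together.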
EfNST clearly depicts laminar structures in the human dorsolateral prefrontal cortex
To evaluate the fundamental performance of the EfNST algorithm, we used the Dorsolateral Prefrontal Cortex (DLPFC)32 data set, consisting of 12 slices, from spatialLIBD33. This data set serves as a benchmark owing to its distinct and well-established morphological boundaries. Its clear hierarchical structure and labeling information allow the similarity between the algorithm’s clustering results and the manually labeled regions to be visualized, enabling a quantitative evaluation of the algorithm’s efficacy in identifying spatial domains. Within the manually annotated DLPFC region, eight slices exhibit a seven-layer organization spanning six cortical layers and white matter (WM), ordered from layer 1 through layer 6 to the WM layer. The remaining four DLPFC slices feature a five-layer tissue structure spanning four cortical layers and the WM layer, in the sequence layer 3, layer 4, layer 5, layer 6, and the WM layer.
In this study, we mainly conducted performance testing on two selected brain slices: one with a five-layer structure and the other with a seven-layer structure. To assess the effectiveness of EfNST, we compared its results with those of seven classical methods: Seurat, CCST, Scanpy, SEDR, stLearn, conST, and SpaGCN. In the DLPFC data set, slice 151670 was one of the few slices with five layers of organization (Fig. 2a). Our analysis showed that EfNST successfully identified these five layers with obvious stratification and clearer boundaries, aligning closely with the manually labeled structures. There were almost no scattered spots, especially in the consistent positions of the WM, Layer 3, and Layer 5. Conversely, the results of Scanpy displayed a lack of hierarchy, with most spots in a mixed state. Seurat revealed two hierarchical structures but failed to clearly define the remaining regions. stLearn accurately identified the WM layer but fell short in delineating the other spatial domains. Although the results of CCST had no scattered spots, the locations of the identified regions were not accurate. conST identified five layered structures but without clear hierarchy or distinct boundaries. SpaGCN identified four layered structures, but the identified regions lacked accuracy. Although BASS, ADEPT, and STAGATE identified a five-layer tissue structure, they did not accurately locate specific layers such as the WM layer. Compared with the ground truth, GraphST and SpatialPCA achieved accurate position recognition and better spatial domain recognition performance (Supplementary Fig. 1a).
Fig. 2. EfNST improves the ability to recognize the DLPFC layer structures.
a The comparison between the results of EfNST and those of the other seven algorithms on slice 151670. b The comparison between the results of EfNST and those of the other seven algorithms on slice 151507. c The comparison of four evaluation indicators between the results of EfNST and those of the other seven algorithms.
Slice 151507, with a seven-layer hierarchical structure, was the largest slice (Fig. 2b). The manually annotated layers served as the ground truth for comparison, and EfNST was compared with the seven other classic algorithms. The results revealed that the layer structures identified by EfNST exhibited clear boundaries, accurate positions, and no scattered spots. In contrast, the regions identified by Scanpy were chaotic, with most structural areas not accurately divided; although Layer 1 was identified, it also contained spots from other spatial domains. Seurat roughly identified Layer 1, Layer 3, and the WM layer, with spots of other layers randomly distributed; compared with the ground truth, the three identified layers covered a wide positional range and were not very accurate. The clustering results of SEDR exhibited the same phenomenon. CCST and stLearn produced clear boundaries and fewer scattered spots, overall performing better than the previous three methods; however, compared with the ground truth, their identified positions were not as accurate. For instance, CCST identified both Layer 4 and Layer 3 as Layer 5 across a wide range. The spots in Layer 5 and Layer 6 identified by SpaGCN and conST were mixed, and the boundaries were not well distinguished. BASS and GraphST identified less clear domain boundaries, though both recognized seven-layer organizational structures. The layer structures identified by STAGATE, ADEPT, and SpatialPCA were clear and accurate, demonstrating good clustering performance on the seven-layer slice (Supplementary Fig. 1b).
To comprehensively evaluate EfNST, four clustering evaluation indicators were used to assess the similarity between the recognized domains of the above two slices and the ground truth. EfNST outperformed the seven other algorithms in ARI, NMI, FMI, and AMI scores on slice 151670, while on slice 151507 the algorithms stLearn, CCST, SpaGCN, and EfNST all demonstrated strong performance (Fig. 2c and Supplementary Fig. 1a, b). The F1 scores likewise demonstrated the superior performance of EfNST over these algorithms (Supplementary Table 1). We further expanded the comparison to include more algorithms, and the results again demonstrated the superior performance of EfNST in detecting spatial domains (Supplementary Table 2). Moreover, random seeds were introduced in the image processing step, and the results verified the stability of EfNST (Supplementary Table 3).
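The four indicators (ARI, NMI, FMI, AMI) are standard external clustering metrics, all available in scikit-learn. A small sketch with hypothetical labels shows how each score would be computed against a ground-truth annotation; the label vectors here are illustrative, not taken from the DLPFC slices.

```python
from sklearn.metrics import (adjusted_mutual_info_score, adjusted_rand_score,
                             fowlkes_mallows_score, normalized_mutual_info_score)

truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # hypothetical manual layer labels
pred  = [0, 0, 1, 1, 1, 1, 2, 2, 2]   # hypothetical clustering result

scores = {
    "ARI": adjusted_rand_score(truth, pred),
    "NMI": normalized_mutual_info_score(truth, pred),
    "FMI": fowlkes_mallows_score(truth, pred),
    "AMI": adjusted_mutual_info_score(truth, pred),
}
```

All four metrics are invariant to label permutation, so the cluster IDs assigned by an algorithm need not match the annotation's numbering; a perfect partition scores 1.0 on each.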
EfNST enables finer characterization of human breast cancer tissue spatial domains
The spatial structure of breast cancer plays a pivotal role in understanding the development and progression of the disease. Spatial domain recognition in breast cancer data sets contributes to validating the algorithm’s ability to accurately capture spatial information and handle disease heterogeneity34. Based on extensive previous research, tissue slices are meticulously categorized into 21 regions through manual annotation, elucidating the differences in different regions at a finer scale within breast cancer tissue35. These 21 regions are further classified into four distinct types: Ductal Carcinoma in Situ/Lobular Carcinoma In Situ (DCIS/LCIS), healthy tissue, Invasive Ductal Carcinoma (IDC), and Tumor Edge (Fig. 3a).
Fig. 3. EfNST can recognize the spatial domain of human breast cancer with a finer level of granularity.
a The ground truth and the comparison of four evaluation indicators between the results of EfNST and those of the other seven algorithms. b The comparison between the results of EfNST and those of the other seven algorithms on breast cancer tissue. c The spatial expression distribution of marker genes in breast cancer.
The seven algorithms were compared and analyzed on the breast cancer data set. As shown in Fig. 3b, Seurat identified 20 spatial domains with an ARI of 0.46, but the regions appeared more chaotic and had unclear boundaries compared with other algorithms. Scanpy performed worse than Seurat, with chaotic spot distributions, many clustered and discontinuous spots, and poorly defined region boundaries; it struggled to distinguish between DCIS/LCIS and Tumor_edge regions, resulting in confused spots within these areas. stLearn clearly identified the tumor edge regions, including the smaller Tumor_edge_4, and identified the Healthy regions particularly well; however, its recognition of the IDC regions was poor, with confused spots and unclear boundaries. SEDR identified the same number of clusters as the ground truth, but its accuracy in locating independent regions was low, especially in subdividing multiple tumor edge regions into fragmented areas. CCST accurately identified multiple regions, including IDC_3, IDC_4, Tumor_edge_6, DCIS/LCIS_1, and DCIS/LCIS_5, but fragmented other tissue areas. conST performed poorly in recognizing small regions. SpaGCN produced good clustering results but unclear boundaries between different structural domains. In comparison, EfNST generated the most independent clusters with the clearest boundaries, and the positions of its identified spatial domains closely matched the ground truth, demonstrating its advantage in consistently detecting the spatial distribution of tumor tissues (Fig. 3a). Its four clustering metrics were also higher than those of the other seven methods (Supplementary Fig. 2). EfNST had the highest ARI value of 0.61, while SEDR and Scanpy had lower scores, conST and SpaGCN scored between 0.4 and 0.5, and CCST, stLearn, and SEDR scored between 0.52 and 0.59 (Fig. 3b).
To evaluate the effectiveness of EfNST in detecting the disease status or prognosis of breast cancer, we analyzed the spatial expression distribution of well-known breast cancer marker genes. For example, the breast cancer marker FOXA1, which is responsive to hormone therapy and associated with a good prognosis36, displays concentrated expression in the IDC region. CCND137, a crucial factor in assessing patient prognosis that is linked to the invasive characteristics of cancer cells, also shows concentrated expression in the IDC region. Furthermore, the spatial expression distributions of TP53, BIRC5, TOP2A, and MYB were examined (Fig. 3c). Among them, TP53 mutation serves as a prognostic marker and a predictor of therapy response38. BIRC5 indicates a poor prognosis in stage II/III breast cancer and a lack of response to neoadjuvant chemotherapy39. TOP2A and MYB are molecular targets for breast cancer treatment, and several therapeutic strategies targeting TOP2A have been developed, extending the survival of cancer patients40,41. Overall, our analysis highlights the potential of EfNST to provide valuable insights into the spatial expression patterns of breast cancer markers. It also underscores the need for future studies with larger patient cohorts to validate these findings and further investigate their clinical implications.
EfNST can accurately describe the complex organization of the mouse brain
The mouse brain data set enables researchers to analyze specific brain regions, subregions, and cell types, permitting the observation and analysis of the spatial distribution patterns of gene expression across different anatomical structures. This facilitates the validation of the algorithm in identifying fine tissue structures42. To examine the recognition performance of the EfNST algorithm on mouse brain tissues, we used the mouse brain sagittal anterior and CytAssist fresh frozen mouse brain data sets from the 10x Visium platform for testing.
We used eight algorithms to evaluate the clustering effect on the mouse brain sagittal anterior data set with baseline truth (Supplementary Fig. 3a, b). The results showed that the tissue regions identified by conST were the most chaotic, with no clearly observable boundaries between regions. The clustering of Seurat was slightly better than that of conST but also suffered from an unclear boundary hierarchy. Scanpy and SEDR performed comparably, with both methods enabling visualization of tissue areas, albeit with some chaotic spot patterns in certain regions. SpaGCN and stLearn exhibited equivalent clustering capability on the mouse brain sagittal anterior, each achieving an ARI score of 0.31. Among the algorithms evaluated, EfNST stood out with the highest ARI score of 0.39, demonstrating a clear spatial domain structure and distinct stratification; the ARI scores of the remaining algorithms did not exceed 0.36.
To further reveal the ability of EfNST to infer the biological relevance of domains, we examined the expression of relevant marker genes in different domains of the mouse brain sagittal anterior. As shown in Supplementary Fig. 3c, the genes Enpp243, Nwd244, Cpne445, C1ql246, Gpr15147, and Wsf148 were highly expressed in the tissue slices. This suggests that the mouse brain sagittal anterior is closely associated with cognitive functions; Wsf1 may influence cognitive functions such as learning and memory in mice, and the abnormal expression of C1ql2 may be related to the pathogenesis of neurodegenerative diseases.
To evaluate the ability of the EfNST algorithm to precisely identify microstructural tissue regions in a data set lacking ground truth, cluster analysis of the CytAssist fresh frozen mouse brain data set was performed. Because no ground truth exists, there is no explicit definition of the number of clusters, so we implemented a comparative analysis of these algorithms at a consistent clustering resolution. The aim of this analysis was to reveal the differences between the algorithms in capturing the fine structure of brain regions and distinguishing ambiguous regions at the boundaries. Owing to the computational limitations of other algorithms, this paper compares only the Seurat, SEDR, stLearn, Scanpy, and conST methods.
Considering the complexity of brain tissue, we subdivided the data set into 47 regions to achieve a more precise identification of regions in the mouse brain (Fig. 4a). The results showed that conST performed poorly in recognizing mouse brain regions outside the hippocampus, where the spots were disorganized; within the hippocampus, only the Cornu Ammonis 1 (CA1) and Dentate Gyrus (DG) regions were identified. In contrast, Scanpy and stLearn identified three regions in the hippocampus, CA1, CA3, and DG, with clearer boundaries. SEDR, Seurat, and EfNST achieved the best recognition, displaying clear, symmetrical brain regions and more accurate, less cluttered spot locations in the hippocampus. Seurat and SEDR recognized clear layering between different structures, but some stray spots in certain areas still mixed with spots from other regions. In contrast, the tissue regions identified by EfNST exhibited high symmetry without scattered spots between the structures. This is especially notable in mouse hippocampal tissue, where EfNST identified fine regions that are difficult for other algorithms to recognize, such as Cornu Ammonis 2 (CA2), the periventricular zone (PZ), and visual area 3 (V3) (Fig. 4b).
Fig. 4. EfNST can recognize complex spatial domains in the CytAssist fresh frozen mouse brain.
a The identification results of spatial domains of six algorithms. b The visualization of the identification results of selected regions. c The annotation map of mouse hippocampus tissue and the visualization results of marker gene expression associated with selected regions.
The CA2 region was highly correlated with the expression of the marker genes Amigo249 and Map3k15 (Fig. 4c). Given that Amigo2 can influence cognitive functions such as learning and memory, and that Map3k15 expression affects social and spatial memory, we speculate that the CA2 region may play a crucial role in social memory and be involved in processing information related to social interactions. In addition, according to the literature, Gpr151 and Nwd2 co-localize in the PZ region, and both genes can regulate neurotransmitter systems, such as dopamine, that affect reward-related behaviors. We therefore infer that the PZ region may be associated with the regulation of emotion, reward systems, and motivation-related behaviors. Finally, the identified Visual Area 3 (V3) region of the mouse brain, a visual cortex area, may be influenced by Enpp2, as supported by the accuracy of its position under this gene’s marker; the V3 region might participate in visual learning and memory functions in mice. In the annotated regions of the mouse hippocampus (https://atlas.brain-map.org/) (Supplementary Fig. 4), the identified regions align well with the original annotations, with high consistency between the CA1, CA3, and DG regions and the expression of related marker genes50.
Overall, the clustering results of EfNST on mouse brain data demonstrated the advantages of EfNST in identifying spatial domains. EfNST not only accurately identified complex regions of tissues in data with ground truth but also accurately located fine tissues not identified by other methods on data without ground truth. This validated the crucial role of EfNST in deciphering spatial expression patterns and inferring biological functions in biological processes.
EfNST can identify all spatial domains of the mouse somatosensory cortex
To test the scalability of the EfNST algorithm across different sequencing platforms, we further used the mouse somatosensory cortex data set from the osmFISH technology platform. With 5328 cells and 33 genes, the data set covers the tissue structure of 11 regions: the Hippocampus, Internal Capsule Caudoputamen, Layer 2-3 lateral, Layer 2-3 medial, Layer 3-4, Layer 4, Layer 5, Layer 6, Pia Layer 1, Ventricle, and White Matter.
Because data incompatibility prevented the use of the Seurat method, we selected the remaining six methods for comparative analysis (Supplementary Fig. 5a). The results revealed that Scanpy and conST identified all tissue regions but clustered poorly: compared with the ground truth, only the White Matter layer was distinct, while the other layers appeared chaotic and indistinguishable. SEDR performed slightly better, revealing the White Matter and ventricle layers more clearly. CCST provided satisfactory results, especially at the Pia Layer 1 and Hippocampus interface, though the boundaries between layers remained unclear. stLearn identified all regions accurately and showed clear stratification, especially for Layer 1, the Internal Capsule Caudoputamen, and White Matter. SpaGCN exhibited poor identification, with Layer 5 spots dispersed among other structural domains. Notably, EfNST outperformed the other algorithms by accurately identifying all regions annotated in the ground truth, including crucial layers such as Pia Layer 1, Layer 4, and White Matter.
UMAP visualization provides a way to assess the performance of algorithms in capturing data structure: high-density clusters with clear boundaries indicate better performance, and comparing UMAP results across algorithms highlights their relative effectiveness. As shown in Supplementary Fig. 5b, only the EfNST method clearly separates the 11 spatial domains, while the other methods display disorganized groupings. Specifically, EfNST effectively distinguished clusters for White Matter, Layer 6, Layer 5, Layer 4, Pia Layer 1, and the Internal Capsule Caudoputamen. All of these structures play fundamental roles in sensory perception, motor control, and cognitive functions, and understanding their spatial distribution is essential for investigating neural system functions and abnormalities.
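The cluster separation that UMAP shows qualitatively can also be quantified, for instance with a silhouette score; this is an illustrative complement rather than a metric used in the study. The sketch below contrasts well-separated and overlapping synthetic embeddings in place of real latent representations.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# synthetic stand-ins for latent embeddings: tight clusters vs. overlapping ones
X_good, y_good = make_blobs(n_samples=300, centers=5, cluster_std=0.5,
                            random_state=0)
X_bad, y_bad = make_blobs(n_samples=300, centers=5, cluster_std=5.0,
                          random_state=0)

s_good = silhouette_score(X_good, y_good)  # close to 1 for compact, distant clusters
s_bad = silhouette_score(X_bad, y_bad)     # much lower for heavily overlapping clusters
```

A higher silhouette score for one method's embedding would corroborate the visual impression of tighter, better-separated clusters in its UMAP plot.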
EfNST can accurately identify tissue layers in the mouse visual cortex
The mouse visual cortex data set from the STARmap platform encompasses rich and detailed tissue information, allowing us to evaluate the accuracy and reliability of EfNST. The data set covers a tissue region with 7 hierarchical structures containing the hippocampus (HPC), corpus callosum (CC), Layer 6 (L6), Layer 5 (L5), Layer 4 (L4), Layer 2/3 (L2/3) and Layer 1 (L1).
We compared the performance of EfNST with six other algorithms on the mouse visual cortex data set (Fig. 5). The results indicated that stLearn failed to recognize a clear hierarchical structure in the mouse visual cortex, with scattered spots from different layers mixed together and unclear boundaries (Fig. 5a). While Scanpy and conST also displayed many misclassifications and unclear layer boundaries, they correctly assigned spots to certain structures, particularly in L2/3. SEDR produced similarly scattered results, with the identified L1 layer being too large and inaccurate, and no clear boundaries between L1 and HPC. SpaGCN showed better performance but still had scattered spots across three areas. CCST identified more tissue structures and outperformed Scanpy, SEDR, stLearn, conST, and SpaGCN, but its results lacked clarity, especially in the boundaries between CC and L4. In contrast, EfNST clearly identified the seven-layer structure with well-defined layer boundaries, and the identified regions were more accurate compared to the ground truth.
Fig. 5. The EfNST algorithm can clearly identify the entire layered organizational structure of the mouse visual cortex.
a The comparison between the results of EfNST and those of the other six classic algorithms on the mouse visual cortex. b The UMAP visualization of EfNST and the other six algorithms.
Next, UMAP was used to visualize and analyze the embeddings of the seven methods. As shown in Fig. 5b, CCST and SEDR presented many scattered spots, making it difficult to separate the mapped clusters effectively. While Scanpy and stLearn could differentiate the mapped clusters of the seven layers, the boundaries between these structures remained unclear and the clusters appeared somewhat cluttered. However, EfNST demonstrated outstanding performance in identifying the seven-layer structure of the mouse visual cortex data set, successfully differentiating the various layers, including the challenging hippocampal layer, the corpus callosum, and the mapped clusters of L1 and L2/3.
Running speed and ablation studies
To evaluate the efficiency of EfNST, we compared its running speed with several other algorithms across various data sets, including Mouse Brain, Breast Cancer, and DLPFC (Table 1). This comparison highlights EfNST’s ability to significantly improve processing speed under constrained resources without compromising accuracy. These results demonstrated EfNST’s robustness in handling diverse data sets while maintaining its computational efficiency, further underscoring its potential for broader applications in spatial transcriptomics analysis.
Table 1.
The comparison of running time (s) between EfNST and five classic algorithms on four data sets

| Data set | EfNST | stLearn22 | SEDR20 | Scanpy17 | Seurat18 | CCST21 |
|---|---|---|---|---|---|---|
| Mouse Brain | 145 | 225 | 264 | 40 | 51 | 1800 |
| Breast Cancer | 67 | 341 | 491 | 20 | 72 | 5047 |
| DLPFC151507 | 71 | 840 | 735 | 17 | 88 | 1480 |
| DLPFC151670 | 60 | 502 | 736 | 30 | 74 | 1219 |
Table 1 presents the execution times (in seconds) of six algorithms on an i5 laptop configuration. The other two algorithms, SpaGCN and conST, require more advanced hardware settings and are executable only on a server cluster.
Moreover, to evaluate the key roles of several critical components of EfNST (EfficientNet, VGAE, and DAE), a series of ablation studies was performed. For the EfficientNet component, five variants of the EfNST model were developed, each replacing EfficientNet with a classic neural network architecture (ResNet50, ResNet152, VGG19, DenseNet121, or Inception_v3) for image feature extraction. The results indicated noticeable decreases in ARI values for the five variants compared to the original EfNST model (Supplementary Fig. 6), further underlining the considerable effectiveness of EfficientNet in identifying spatial domains. For the VGAE and DAE components, we conducted a similar ablation study to assess their impact on clustering performance. The results showed that clustering performance was significantly enhanced only when both VGAE and DAE were present (Supplementary Table 4 and Supplementary Fig. 6b).
Discussion
Currently, identifying spatial domains of tissue structure is crucial for mapping cellular heterogeneity, understanding tissue organization, and gaining insights into disease progression and tissue function. Here, we propose a spatial domain recognition algorithm called EfNST, which is based on a composite scaling network. EfNST facilitates multimodal integration of spatial transcriptomics data by extracting image features to accurately identify tissue structural domains. A key component of the algorithm is the use of the EfficientNet network to extract features from spatial transcriptomics images, improving accuracy while reducing parameter complexity and enhancing model speed under resource constraints. Additionally, the integration of VGAE and DAE further enhances the algorithm’s ability to handle potential features.
EfNST demonstrated strong performance across six data sets from three sequencing platforms, including large-scale (e.g., human breast cancer) and small-scale (e.g., mouse visual cortex) data sets, showcasing its excellent data generalization and robustness. Compared with seven classic algorithms on four evaluation metrics, the EfNST algorithm achieved better results. On the human Dorsolateral Prefrontal Cortex data set, EfNST exhibited better results on all 12 slices and identified laminar structures more clearly, with sharper boundaries. The results on the human breast cancer and mouse brain data sets reveal its ability to dissect spatial domains of complex tissue structures at a finer scale; it can also identify areas that the competing algorithms fail to recognize. The highly consistent expression of relevant marker genes reported in the literature validates the precise localization of unlabeled regions.
The superior performance of EfNST is largely attributed to its foundation on EfficientNet. EfficientNet’s strength lies in its innovative scaling method, rigorous experimental validation, optimized baseline design, and broad applicability. These factors collectively underscore the rationale and effectiveness of using EfficientNet for spatial domain recognition. Random seeds were introduced in the image processing process, and the results verified the stability of EfNST. Moreover, incorporating the Variational Graph Autoencoder (VGAE) for learning latent representations and the Denoising Autoencoder (DAE) for noise reduction has provided significant benefits. Related ablation experiments have further highlighted the critical role of these components.
Our work on the EfNST algorithm framework has achieved significant results in image feature extraction by incorporating the EfficientNet module to handle the large amount of image data in ST, which has not only improved model accuracy but also significantly enhanced efficiency. However, EfficientNet still has some limitations. For instance, its performance depends on the quality and quantity of the training data, and the required computational resources increase as the model size grows (e.g., from EfficientNet-B0 to EfficientNet-B7). Furthermore, due to its complex network structure and nonlinear feature extraction capabilities, understanding and interpreting the model's decision-making process is challenging. Additionally, EfNST currently cannot be applied to multiple sections simultaneously to identify aligned spatial domains across tissues. According to recent studies, on the basis of optimizing algorithms for individual slices, appropriately adding multi-slice reference data from different platforms can enhance spatial domain recognition performance on a single slice. In future work, we can use the EfNST method to learn shared latent spot embeddings after joint training on multiple slices. Alternatively, we can align H&E images from different slices to a common spatial reference, creating a unified neighborhood graph, and then utilize the EfNST method for comprehensive training and precise spatial domain recognition.
Methods
Data sets and preprocessing
To evaluate the clustering performance of EfNST, we used six data sets from three sequencing platforms: 10x Visium, osmFISH and STARmap. The four 10x Visium data sets include the human Dorsolateral Prefrontal Cortex (DLPFC) (http://research.libd.org/spatialLIBD/), the human breast cancer, the mouse brain sagittal anterior and the CytAssist fresh frozen mouse brain data sets (https://www.10xgenomics.com/). Specifically, the DLPFC data set consists of 12 slices sampled from three experimental individuals, with the number of spots per slice varying from 3498 to 4789. The human breast cancer and mouse brain sagittal anterior data sets contain 2696 and 3798 spots, respectively. The mouse somatosensory cortex data set, from the subcellular-resolution osmFISH platform, contains 5328 cells and 33 genes (http://linnarssonlab.org/osmFISH/). The mouse primary visual cortex data set, from the subcellular-resolution STARmap platform, contains 1207 cells and 1020 genes (https://kangaroo-goby.squarespace.com/data). These data sets have benchmark truths with which to quantitatively evaluate the performance of the various algorithms. The mouse whole brain data set has 4298 spots and no baseline truth, and was used to verify the performance of different algorithms in recognizing unknown complex regions. In the data preprocessing stage for all data sets, we performed Quality Control, Log transformation and Normalization on the gene expression data, and then reduced the dimensionality of the gene expression profiles using Principal Component Analysis (PCA).
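A rough sketch of this preprocessing pipeline is given below, using synthetic counts in place of a real ST matrix; the QC threshold and the number of components are illustrative assumptions (in practice the Scanpy preprocessing routines were used):

```python
import numpy as np

rng = np.random.default_rng(0)
counts = rng.poisson(3.0, (200, 500)).astype(float)   # toy spots x genes count matrix

# Quality control: keep genes detected in at least 3 spots (threshold is illustrative)
keep = (counts > 0).sum(axis=0) >= 3
counts = counts[:, keep]

# Normalization: scale each spot to the median total count, then log-transform
totals = counts.sum(axis=1, keepdims=True)
logX = np.log1p(counts / totals * np.median(totals))

# PCA via SVD of the gene-centered matrix, keeping the top 50 components
centered = logX - logX.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
pcs = U[:, :50] * S[:50]                              # spots x 50 principal components
```

The singular values returned by `np.linalg.svd` are sorted in descending order, so the leading columns of `pcs` capture the most variance.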
EfficientNet
EfficientNet is a principled approach to scaling convolutional networks. Unlike traditional methods that scale only one dimension, EfficientNet introduced an innovative compound scaling method that systematically balances the network's depth, width, and resolution. Empirical evidence demonstrated that uniformly scaling all three dimensions yields higher accuracy and efficiency within a given computational budget. Additionally, the baseline network of EfficientNet was designed through neural architecture search, optimizing both model accuracy and computational efficiency. Moreover, EfficientNet has shown outstanding performance in multiple transfer learning tasks, indicating its excellent generalization capabilities and suitability for various computer vision applications. In summary, the rationale for selecting the EfficientNet network structure lies in its innovative scaling method, successful experimental validation, optimized baseline design, and broad applicability. These factors collectively demonstrate the rationale and effectiveness of using EfficientNet for spatial domain recognition.
The workflow diagram of EfficientNet is shown in Supplementary Fig. 7. In this workflow, the Mobile Inverted Residual Bottleneck Convolution (MBConv)51,52 module is a specific building block used in convolutional neural network (CNN) architectures that combines Depth-wise Separable Convolution and Residual Connection. MBConv contains key components such as the Inverted Residual Block, Bottleneck Design, Activation and Regularization. It is designed to provide efficient yet effective feature extraction, and is particularly suited to mobile and embedded devices where computational resources are limited. As the basic module of EfficientNet, MBConv modules are scaled up to create a family of models optimized for both accuracy and efficiency, achieving state-of-the-art performance under different resource constraints. First, the histological image is transformed through the first Conv3x3 layer into the input dimension required by the MBConv module. Then, the image passes through a series of MBConv modules to extract feature maps; the parameters of each MBConv module are finely adjusted to adapt to the current operating environment, and the combined scaling optimization gives the network a better receptive field. Next, an adaptive connection method based on the Fully Convolutional Network feature map is utilized: a Conv1x1 layer adapts to feature maps of various sizes and unifies them to the required dimension. Finally, classification and recognition of the image are completed from the output feature map. The corresponding mathematical formulas are as follows:
$$\mathcal{N}(d, w, r) = \bigodot_{i=1\ldots s} \hat{\mathcal{F}}_i^{\,d \cdot \hat{L}_i}\Big(X_{\langle r \cdot \hat{H}_i,\ r \cdot \hat{W}_i,\ w \cdot \hat{C}_i \rangle}\Big) \qquad (1)$$

where $d$, $w$ and $r$ are coefficients used to scale the depth, width and resolution of the network, and $\hat{\mathcal{F}}_i$, $\hat{L}_i$, $\langle \hat{H}_i, \hat{W}_i \rangle$ and $\hat{C}_i$ represent the Operator, the number of Layers, the resolution of the input and the number of Channels, where $i$ indexes the Stage.
Tan and Le found that the optimal $d$, $w$ and $r$ were interdependent, so they proposed a new uniform scaling method for $d$, $w$ and $r$, which was a necessary requirement for achieving better accuracy and efficiency25. The method uses a compound coefficient $\phi$ to scale the network width, depth and resolution uniformly in a principled manner. The corresponding mathematical formula is as follows:
$$d = \alpha^{\phi}, \qquad w = \beta^{\phi}, \qquad r = \gamma^{\phi}, \qquad \text{s.t. } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2,\ \ \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1 \qquad (2)$$

where $\alpha$, $\beta$ and $\gamma$ are coefficients (determined by a small grid search) that control the scaling of depth, width and resolution, and $\phi$ is a coefficient used to control the model complexity.
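The compound scaling rule can be illustrated numerically with the constants $\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$ reported by Tan and Le for the EfficientNet baseline:

```python
# EfficientNet compound scaling (Eq. 2): for a user-chosen phi,
#   depth d = alpha**phi, width w = beta**phi, resolution r = gamma**phi,
# with alpha * beta**2 * gamma**2 ~ 2, so total FLOPs grow by about 2**phi.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # grid-search values from Tan and Le (2019)

def compound_scale(phi):
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    flops_ratio = (ALPHA * BETA**2 * GAMMA**2) ** phi  # approximate FLOPs growth
    return depth, width, resolution, flops_ratio
```

For example, `compound_scale(0)` recovers the unscaled baseline, while each increment of `phi` roughly doubles the computational budget.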
The series of models obtained after expanding the EfficientNet baseline network are called EfficientNets. Compared to previous ConvNets, EfficientNets have higher accuracy and efficiency. In comparison with other popular deep learning architectures, such as ResNet28, Inception53, and MobileNet54, EfficientNets achieve better computational performance. In addition, EfficientNet also shows the best performance on standard image classification benchmarks, such as ImageNet55 and CIFAR-10056.
At the same time, EfficientNets are more robust to data augmentation and noise while improving accuracy and efficiency, making them well suited to practical applications in resource-constrained environments. Currently, EfficientNets are widely used in biomedical applications, such as breast cancer detection57, retinal disease screening58, and brain tumor classification59.
Data augmentation
Data augmentation involves applying a series of transformations or operations to the original data to enhance its quality. These transformations are typically reversible and do not alter the labels of the samples. In this paper, we use spatial location information corresponding to gene expression data and histological image information to augment the gene expression of neighboring spots. With reference to the previous work60, the process aims to enhance the quality of the data set for subsequent analysis.
(1) The gene expression weight between spot $i$ and spot $j$ is calculated from the correlation of their gene expression profiles, and the corresponding mathematical formula is as follows:

$$w^{gene}_{ij} = \frac{\operatorname{cov}(x_i, x_j)}{\sigma_{x_i}\,\sigma_{x_j}} \qquad (3)$$

where $x_i$ and $x_j$ are the gene expression vectors of spots $i$ and $j$.
(2) The high-dimensional image features of each spot tile are extracted with a pre-trained EfficientNet-B0 model. To better represent the image features, the first 50 principal components after PCA dimension reduction are used as latent features. Finally, cosine similarity is used to calculate the image similarity weight between spots. The specific formula is as follows:

$$w^{img}_{ij} = \frac{v_i \cdot v_j}{\lVert v_i \rVert\, \lVert v_j \rVert} \qquad (4)$$

where $v_i$ and $v_j$ are the latent image feature vectors of spots $i$ and $j$.
(3) Spatial adjacency mainly uses the spatial location information: the distance between each spot and all other spots is computed, the four (optional) nearest neighbors of each spot are selected according to the spatial coordinates, and the radius is calculated as the sum of the mean and variance of their distances. For a given spot $i$, a spot $j$ is regarded as one of its neighbors if and only if the distance between spot $i$ and spot $j$ is less than the radius, in which case $A_{ij} = 1$; otherwise, $A_{ij} = 0$.
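A minimal sketch of this neighborhood construction (the exact distance and variance conventions of the published implementation are assumptions here):

```python
import numpy as np

def spatial_adjacency(coords, k=4):
    # Pairwise Euclidean distances between spot coordinates
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    n = coords.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        knn = np.sort(dist[i])[1:k + 1]      # distances to the k nearest neighbors
        radius = knn.mean() + knn.var()      # radius = mean + variance of those distances
        A[i] = (dist[i] < radius).astype(float)
        A[i, i] = 0.0                        # a spot is not its own neighbor
    return A

coords = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])  # toy 1-D coordinates
A = spatial_adjacency(coords, k=4)
```

Note that the resulting adjacency need not be symmetric, since each spot computes its own radius.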
Finally, the enhanced gene expression of each spot $i$ is expanded to $\hat{x}_i$, and the formula for the 10x Visium platform is as follows:

$$\hat{x}_i = x_i + \frac{1}{n} \sum_{j=1}^{n} w^{gene}_{ij}\, w^{img}_{ij}\, A_{ij}\, x_j \qquad (5)$$

For the platforms osmFISH and STARmap, which provide no matched histology images, the image weight is omitted:

$$\hat{x}_i = x_i + \frac{1}{n} \sum_{j=1}^{n} w^{gene}_{ij}\, A_{ij}\, x_j \qquad (6)$$

Where $x_i$ and $x_j$ represent the original gene expression of spot $i$ and of its $n$ adjacent spots $j$.
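Putting the three weights together, a toy sketch of the augmentation on synthetic data follows; the exact normalization and the way the weights are combined here are assumptions based on the description above, and the published implementation may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.poisson(2.0, (30, 40)).astype(float)     # toy gene expression (spots x genes)
V = rng.normal(size=(30, 50))                    # toy image features (e.g. 50 PCs per spot)
A = (rng.random((30, 30)) < 0.2).astype(float)   # toy spatial adjacency matrix
np.fill_diagonal(A, 0)

eps = 1e-12  # guards against division by zero

# Gene expression weight: Pearson correlation between spot profiles
Xc = X - X.mean(axis=1, keepdims=True)
norm_x = np.linalg.norm(Xc, axis=1) + eps
W_gene = (Xc @ Xc.T) / np.outer(norm_x, norm_x)

# Image weight: cosine similarity between latent image features
norm_v = np.linalg.norm(V, axis=1) + eps
W_img = (V @ V.T) / np.outer(norm_v, norm_v)

# Combine the weights, restricted to spatial neighbors
W = W_gene * W_img * A

# Augmented expression: original profile plus the weighted neighbor average
n_neighbors = np.maximum(A.sum(axis=1, keepdims=True), 1)
X_aug = X + (W @ X) / n_neighbors
```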
Variational graph autoencoder
The VGAE26, proposed by Kipf and Welling, utilizes latent variables to learn latent representations of graph-structured data, building on the variational autoencoder (VAE). Given an unweighted undirected graph $G$ with adjacency matrix $A$, it uses the latent variables $Z$ to reconstruct the adjacency matrix:

$$q(Z \mid X, A) = \prod_{i=1}^{N} q(z_i \mid X, A), \qquad q(z_i \mid X, A) = \mathcal{N}\big(z_i \mid \mu_i, \operatorname{diag}(\sigma_i^2)\big) \qquad (7)$$
$X$ is the feature matrix of the nodes and $A$ is the adjacency matrix:

$$\mu = \mathrm{GCN}_{\mu}(X, A), \qquad \log \sigma = \mathrm{GCN}_{\sigma}(X, A) \qquad (8)$$

where $\mu_i$ is the mean of the latent vector of node $i$ and $\sigma_i$ is its standard deviation.
The defined two-layer graph convolutional network is:

$$\mathrm{GCN}(X, A) = \tilde{A}\,\mathrm{ReLU}\big(\tilde{A} X W_0\big) W_1 \qquad (9)$$

$\tilde{A} = D^{-1/2} A D^{-1/2}$ is the symmetric normalized adjacency matrix. The networks generating $\mu$ and $\log \sigma^2$ share the first-layer parameters $W_0$.
After this, the decoder reconstructs the adjacency matrix using the inner product of the latent variables:

$$p(A \mid Z) = \prod_{i=1}^{N} \prod_{j=1}^{N} p(A_{ij} \mid z_i, z_j), \qquad p(A_{ij} = 1 \mid z_i, z_j) = \sigma\big(z_i^{\top} z_j\big) \qquad (10)$$

Among them, $\sigma(\cdot)$ is the logistic sigmoid function.
The KL divergence is added to the loss function to improve the generalization ability and robustness of the model:

$$\mathcal{L} = \mathbb{E}_{q(Z \mid X, A)}\big[\log p(A \mid Z)\big] - \mathrm{KL}\big[q(Z \mid X, A) \,\|\, p(Z)\big] \qquad (11)$$

Where $p(Z) = \prod_{i} \mathcal{N}(z_i \mid 0, I)$ is the Gaussian prior.
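A minimal NumPy sketch of one VGAE forward pass following Eqs. (7)-(11); the layer sizes, weight initialization and toy graph are illustrative assumptions, and the actual model is trained by gradient descent on the loss in Eq. (11):

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_adj(A):
    # Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * np.outer(d_inv_sqrt, d_inv_sqrt)

def vgae_forward(X, A, W0, W_mu, W_logsig):
    A_norm = normalize_adj(A)
    H = np.maximum(A_norm @ X @ W0, 0.0)        # shared first GCN layer with ReLU (Eq. 9)
    mu = A_norm @ H @ W_mu                      # means of q(z_i | X, A)            (Eq. 8)
    log_sigma = A_norm @ H @ W_logsig           # log std of q(z_i | X, A)
    Z = mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)  # reparameterization
    A_rec = 1.0 / (1.0 + np.exp(-(Z @ Z.T)))    # sigmoid inner-product decoder    (Eq. 10)
    # KL(q || N(0, I)), the regularizer in the loss of Eq. (11)
    kl = 0.5 * np.sum(np.exp(2.0 * log_sigma) + mu**2 - 1.0 - 2.0 * log_sigma)
    return A_rec, kl

# Toy symmetric graph and small random weights
n_nodes, n_feat, n_hidden, n_latent = 20, 8, 16, 4
X = rng.standard_normal((n_nodes, n_feat))
A = (rng.random((n_nodes, n_nodes)) < 0.2).astype(float)
A = np.maximum(A, A.T)
np.fill_diagonal(A, 0)
W0 = 0.1 * rng.standard_normal((n_feat, n_hidden))
W_mu = 0.1 * rng.standard_normal((n_hidden, n_latent))
W_logsig = 0.1 * rng.standard_normal((n_hidden, n_latent))
A_rec, kl = vgae_forward(X, A, W0, W_mu, W_logsig)
```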
Denoising autoencoder
Denoising Autoencoder (DAE)27 is a powerful tool in the realm of unsupervised learning and data preprocessing. It is effective in learning useful representations and features from noisy data, which can improve the performance of downstream tasks such as classification or clustering. The augmented gene expression data in this paper includes various noises, such as background noise in the histological images, dropout events in gene expression, and spatial noise from the sampling process. Thereby, we used a DAE to learn a nonlinear mapping from the integrated feature space to a low-dimensional representation space. By learning to remove noise from data, the model also learns more robust features and representations of the data.
In our work, the DAE is a variation of the traditional autoencoder, a type of neural network designed to learn efficient representations of data, typically by compressing the input into a latent-space representation and then reconstructing the output from this representation. The encoder is composed of multiple stacked fully connected linear layers; the integrated gene expression $X$ is fed into the encoder, and each layer applies a linear transformation followed by a nonlinear activation to extract deep features. The final encoder layer transforms the gene expression data into a low-dimensional representation $Z$, with the specific formula as follows:
$$Z = f_{enc}(X), \qquad X \in \mathbb{R}^{N \times G},\ Z \in \mathbb{R}^{N \times R} \qquad (12)$$

Where $N$ is the total number of spots, $G$ is the number of genes, and $R$ is the dimension of the last encoder layer. Conversely, the decoder reconstructs the original input to obtain $\hat{X}$, expressed as:

$$\hat{X} = f_{dec}(Z) \qquad (13)$$
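A forward-pass sketch of such a denoising autoencoder in NumPy; the layer widths, the Poisson toy data and the Gaussian corruption are illustrative assumptions, and the real model is trained to minimize the reconstruction error between $\hat{X}$ and the clean input:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

class DenoisingAE:
    """Forward-pass sketch: one hidden layer on each side of the bottleneck."""

    def __init__(self, n_genes, n_hidden, n_latent):
        scale = 0.1
        self.We1 = scale * rng.standard_normal((n_genes, n_hidden))
        self.We2 = scale * rng.standard_normal((n_hidden, n_latent))
        self.Wd1 = scale * rng.standard_normal((n_latent, n_hidden))
        self.Wd2 = scale * rng.standard_normal((n_hidden, n_genes))

    def encode(self, X):
        # Z = f_enc(X): (N, G) -> (N, R)
        return relu(X @ self.We1) @ self.We2

    def decode(self, Z):
        # X_hat = f_dec(Z): (N, R) -> (N, G)
        return relu(Z @ self.Wd1) @ self.Wd2

X = rng.poisson(2.0, (100, 50)).astype(float)    # toy expression matrix
X_noisy = X + rng.normal(0.0, 0.5, X.shape)      # corrupt the input
ae = DenoisingAE(n_genes=50, n_hidden=32, n_latent=10)
Z = ae.encode(X_noisy)                           # low-dimensional representation
X_hat = ae.decode(Z)                             # reconstruction of the clean input
mse = np.mean((X - X_hat) ** 2)                  # reconstruction loss to minimize
```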
Dimensionality reduction and clustering
As a nonlinear technique, uniform manifold approximation and projection (UMAP)61 maps high-dimensional data to a lower-dimensional space while preserving both local and global data structures62,63, revealing clustering patterns of gene expression at different sites. The clear boundaries and high-density clusters in UMAP visualizations indicate the effectiveness of this algorithm in capturing structural information in the data, thereby allowing for indirect comparison of different algorithms’ performance.
EfNST uses Leiden64 for clustering. Regarding the number of clusters, EfNST and the algorithms compared in this paper follow the same strategy: for data sets with ground truth, we directly set the number of clusters to match the ground truth; for data sets without ground truth, we compared at the same level of granularity.
Evaluation indicators
Compared with Specificity and Sensitivity69,70, which assess the performance of classification tasks, the Adjusted Rand Index (ARI)65, Adjusted Mutual Information (AMI)66, Fowlkes-Mallows Index (FMI)67, and Normalized Mutual Information (NMI)68 provide a more comprehensive and robust assessment of clustering performance, and are better suited to evaluating the consistency between the clustering results generated by different algorithms and the benchmark truth. Therefore, these four external clustering evaluation metrics, as implemented in the Scikit-learn package71, are used to assess the clustering performance of the algorithms. Among them, ARI is usually used as the main metric of clustering ability in spatial domain identification because it accounts for the completeness, consistency, and randomness of the clustering results.
The formula for the evaluation metrics is as follows:
$$\mathrm{ARI} = \frac{2\,(TP \cdot TN - FN \cdot FP)}{(TP + FN)(FN + TN) + (TP + FP)(FP + TN)} \qquad (14)$$

Where TP, TN, FP, and FN represent the numbers of true positive, true negative, false positive and false negative sample pairs. The ARI is upper-bounded by 1; a higher ARI score indicates a stronger consistency between the clustering results and the ground truth.
$$\mathrm{AMI} = \frac{\mathrm{MI}(U, V) - E[\mathrm{MI}(U, V)]}{\frac{1}{2}\big(H(U) + H(V)\big) - E[\mathrm{MI}(U, V)]} \qquad (15)$$

MI denotes the mutual information; $H(U)$ and $H(V)$ represent the entropy of the ground truth and of the clustering result, respectively. Meanwhile, $E[\mathrm{MI}]$ denotes the expected mutual information under random assignment. A higher AMI score indicates a stronger clustering ability of the algorithm.
$$\mathrm{NMI}(U, V) = \frac{2\,\mathrm{MI}(U, V)}{H(U) + H(V)} \qquad (16)$$

The variables $U$, $V$, $H$, and $\mathrm{MI}$ represent the ground truth, the result of clustering, the entropy, and the mutual information, respectively. The mutual information is calculated as $\mathrm{MI}(U, V) = \sum_{i}\sum_{j} P(i, j)\,\log \frac{P(i, j)}{P(i)\,P'(j)}$.
Fowlkes-Mallows Index (FMI) is commonly used to measure the similarity between the clustering result and the ground truth; its score range is [0, 1].

$$\mathrm{FMI} = \frac{TP}{\sqrt{(TP + FP)(TP + FN)}} \qquad (17)$$

Where $TP$, $FP$ and $FN$ respectively represent the numbers of true positive, false positive and false negative pairs. A higher FMI score indicates a good similarity between the clusters and the ground truth.
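These pair-counting metrics can be computed directly from their definitions; a brute-force sketch follows (library implementations such as scikit-learn's use contingency tables instead of enumerating pairs):

```python
import math
from itertools import combinations

def pair_counts(truth, pred):
    # Classify every pair of samples by whether the two samples share a label
    # in the ground truth and/or in the predicted clustering.
    tp = tn = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_t = truth[i] == truth[j]
        same_p = pred[i] == pred[j]
        if same_t and same_p:
            tp += 1
        elif not same_t and not same_p:
            tn += 1
        elif same_p:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def ari(truth, pred):
    # Pair-counting form of the Adjusted Rand Index
    tp, tn, fp, fn = pair_counts(truth, pred)
    return 2 * (tp * tn - fn * fp) / ((tp + fn) * (fn + tn) + (tp + fp) * (fp + tn))

def fmi(truth, pred):
    # Fowlkes-Mallows Index: geometric mean of pairwise precision and recall
    tp, tn, fp, fn = pair_counts(truth, pred)
    return tp / math.sqrt((tp + fp) * (tp + fn))
```

For identical labelings both metrics equal 1; for the maximally discordant labeling of four samples, ARI drops below zero while FMI reaches 0.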
Ablation study
The EfNST algorithm utilizes the EfficientNet framework for image feature extraction and integrates the gene expression information and the spatial location information to establish a comprehensive feature system. To evaluate the effectiveness of EfficientNet in identifying spatial domains, a series of ablation studies was performed on three main data sets with known ground truth: DLPFC, human breast cancer, and mouse brain. Five variants of the EfNST model were developed, each replacing EfficientNet with a classic neural network architecture (ResNet50, ResNet152, VGG19, DenseNet121, or Inception_v3) for image feature extraction. To ensure the fairness of the experimental results, all other model parameter settings were kept consistent.
An ablation study on VGAE and DAE in the EfNST algorithm was conducted to assess their impact on clustering performance. The results showed that clustering performance was significantly enhanced only when both VGAE and DAE were present (Supplementary Fig. 6b).
Comparison of EfNST with other algorithms
The detailed parameter settings of EfNST and the other algorithms used for comparison (Seurat, Scanpy, SEDR, conST, stLearn, CCST, SpaGCN, ADEPT72, BASS73, SpatialPCA74, GraphST75, STAGATE) can be found in the Supplementary Materials.
Statistics and reproducibility
Data processing and analysis were conducted with Python (version 3.9.16) along with the following libraries: Pandas (version 1.5.2), numpy (version 1.22.4), scipy (version 1.8.0), and torch (version 2.0.1). For data set preprocessing, we utilized scanpy (version 1.9.1), anndata (version 0.8.0), and matplotlib (version 3.6.3). The visual representations in our research were created using scanpy (version 1.9.1) and OriginPro software.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
The authors would like to express sincere gratitude to the research team that developed DeepST (https://github.com/JiangBioLab/DeepST), which provided significant inspiration for our work. This work was financially supported by the Natural Science Foundation of Inner Mongolia of China (2024MS06027), the Basic Scientific Research Operating Expenses of Inner Mongolia (JY20230067), the National Natural Science Foundation of China (62171241, 62461046), the Key Technology Research Program of Inner Mongolia Autonomous Region (2021GG0398), and the Science and Technology Leading Talent Team in Inner Mongolia Autonomous Region (2022LJRC0009).
Author contributions
Z.L., Z.F., and Y.Z. (Yongchun Zuo) conceived and designed the study. Y.Z. (Yanan Zhao) and C.L. were responsible for data collection and preprocessing, developed the algorithm, performed all data analysis, conducted comparative evaluations, and contributed to the drafting and revision of the manuscript. W.S. and Z.S. participated in the data analysis and helped to revise the manuscript. All authors read and approved the manuscript.
Peer review
Peer review information
Communications Biology thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editors: Aylin Bircan and Mengtan Xing.
Data availability
This study utilizes six data sets from three sequencing platforms. The 10x Visium platform includes four data sets: the human dorsolateral prefrontal cortex (DLPFC) data set (http://research.libd.org/spatialLIBD/), along with data sets for human breast cancer, mouse brain sagittal anterior, and CytAssist fresh frozen mouse brain (https://www.10xgenomics.com/). The osmFISH platform provides the mouse somatosensory cortex data set (http://linnarssonlab.org/osmFISH/). The STARmap platform includes the mouse primary visual cortex data set (https://kangaroo-goby.squarespace.com/data).
Code availability
The code of this study is available at https://github.com/Zaoyanan/EfNST/ and on Zenodo at 10.5281/zenodo.14059229 (ref. 76).
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Yanan Zhao, Chunshen Long.
Contributor Information
Zhigang Liu, Email: whitexblack@163.com.
Zhenxing Feng, Email: zxfeng@imut.edu.cn.
Yongchun Zuo, Email: yczuo@imu.edu.cn.
Supplementary information
The online version contains supplementary material available at 10.1038/s42003-024-07286-z.
References
- 1.Tellez-Gabriel, M., Ory, B., Lamoureux, F., Heymann, M.-F. & Heymann, D. Tumour heterogeneity: the key advantages of single-cell analysis. Int. J. Mol. Sci.17, 2142 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Gupta, R. K. & Kuznicki, J. Biological and medical importance of cellular heterogeneity deciphered by single-cell RNA sequencing. Cells9, 1751 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Long, C. et al. Deciphering the decisive factors driving fate bifurcations in somatic cell reprogramming. Mol. Ther. -Nucleic. Acids34, 102044 (2023). [DOI] [PMC free article] [PubMed]
- 4.Li, H., Long, C., Hong, Y., Luo, L. & Zuo, Y. Characterizing cellular differentiation potency and Waddington landscape via energy indicator. Research6, 0118 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Li, H. et al. Dppa2/4 as a trigger of signaling pathways to promote zygote genome activation by binding to CG-rich region. Brief Bioinform. 10.1093/bib/bbaa342 (2021). [DOI] [PubMed]
- 6.Longo, S. K., Guo, M. G., Ji, A. L. & Khavari, P. A. Integrating single-cell and spatial transcriptomics to elucidate intercellular tissue dynamics. Nat. Rev. Genet.22, 627–644 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. & Regev, A. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol.33, 495–502 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Yang, W. et al. Deciphering cell–cell communication at single-cell resolution for spatial transcriptomics with subgraph-based graph attention network. Nat. Commun.15, 7101 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Si, Z. et al. SpaNCMG: improving spatial domains identification of spatial transcriptomics using neighborhood-complementary mixed-view graph convolutional network. Brief. Bioinforma.25, bbae259 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Ståhl, P. L. et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science353, 78–82 (2016). [DOI] [PubMed] [Google Scholar]
- 11.Rodriques, S. G. et al. Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution. Science363, 1463–1467 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Chen, A. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell185, 1777–1792. e1721 (2022). [DOI] [PubMed] [Google Scholar]
- 13.Tian, L., Chen, F. & Macosko, E. Z. The expanding vistas of spatial transcriptomics. Nat. Biotechnol.41, 773–782 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Zhao, E. et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat. Biotechnol.39, 1375–1384 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Dries, R. et al. Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome Biol.22, 1–31 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Yang, Y. et al. SC-MEB: spatial clustering with hidden Markov random field using empirical Bayes. Brief. Bioinforma.23, bbab466 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol.19, 1–5 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Stuart, T. et al. Comprehensive integration of single-cell data. Cell177, 1888–1902.e1821 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Walker, B. L., Cang, Z., Ren, H., Bourgain-Chang, E. & Nie, Q. Deciphering tissue structure and function using spatial transcriptomics. Commun. Biol.5, 220 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Xu, H. et al. Unsupervised spatially embedded deep representation of spatial transcriptomics. Genome Med.16, 12 (2024). [DOI] [PMC free article] [PubMed]
- 21.Li, J., Chen, S., Pan, X., Yuan, Y. & Shen, H.-B. Cell clustering for spatial transcriptomics data with graph neural networks. Nat. Comput. Sci.2, 399–408 (2022). [DOI] [PubMed] [Google Scholar]
- 22.Pham, D. et al. Robust mapping of spatiotemporal trajectories and cell–cell interactions in healthy and diseased tissues. Nat. Commun.14, 7739 (2023). [DOI] [PMC free article] [PubMed]
- 23.Hu, J. et al. SpaGCN: integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat. Methods18, 1342–1351 (2021). [DOI] [PubMed] [Google Scholar]
- 24.Shan, Y. et al. TIST: transcriptome and histopathological image integrative analysis for spatial transcriptomics. Genomics Proteom. Bioinforma.20, 974–988 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Tan, M. & Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In International Conference on Machine Learning. 97, 6105–6114 (2019).
- 26.Kipf, T. N. & Welling, M. Variational graph auto-encoders. Preprint at https://arxiv.org/abs/1611.07308 (2016).
- 27.Vincent, P., Larochelle, H., Bengio, Y. & Manzagol, P.-A. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning. 1096–1103 (2008).
- 28.He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778 (2016).
- 29.Simonyan, K. & Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In International Conference on Learning Representations (2015).
- 30.Huang, G., Liu, Z., Van Der Maaten, L. & Weinberger, K. Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4700–4708 (2016).
- 31.Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. & Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2818–2826 (2016).
- 32.Maynard, K. R. et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat. Neurosci. 24, 425–436 (2021).
- 33.Pardo, B. et al. spatialLIBD: an R/Bioconductor package to visualize spatially-resolved transcriptomics data. BMC Genomics 23, 434 (2022).
- 34.Liu, S.-Q. et al. Single-cell and spatially resolved analysis uncovers cell heterogeneity of breast cancer. J. Hematol. Oncol. 15, 19 (2022).
- 35.Pal, B. et al. A single-cell RNA expression atlas of normal, preneoplastic and tumorigenic states in the human breast. EMBO J. 40, e107333 (2021).
- 36.Nakshatri, H. & Badve, S. FOXA1 in breast cancer. Expert Rev. Mol. Med. 11, e8 (2009).
- 37.Chen, C., Lu, J., Li, W. & Lu, X. Circular RNA ATP2C1 (hsa_circ_0005797) sponges miR-432/miR-335 to promote breast cancer progression through regulating CCND1 expression. Am. J. Cancer Res. 13, 3433 (2023).
- 38.Børresen-Dale, A. L. TP53 and breast cancer. Hum. Mutat. 21, 292–300 (2003).
- 39.Hamy, A. et al. BIRC5 (survivin): a pejorative prognostic marker in stage II/III breast cancer with no response to neoadjuvant chemotherapy. Breast Cancer Res. Treat. 159, 499–511 (2016).
- 40.Mehraj, U. et al. Cryptolepine targets TOP2A and inhibits tumor cell proliferation in breast cancer cells: an in vitro and in silico study. Anti-Cancer Agents Med. Chem. 22, 3025–3037 (2022).
- 41.Faldoni, F. L. et al. Inflammatory breast cancer: clinical implications of genomic alterations and mutational profiling. Cancers 12, 2816 (2020).
- 42.Jung, N. & Kim, T.-K. Spatial transcriptomics in neuroscience. Exp. Mol. Med. 55, 2105–2115 (2023).
- 43.Cholia, P., Nayyar, R., Kumar, H. R. & Mantha, A. K. Understanding the multifaceted role of ectonucleotide pyrophosphatase/phosphodiesterase 2 (ENPP2) and its altered behaviour in human diseases. Curr. Mol. Med. 15, 932–943 (2015).
- 44.Yamada, S., Furukawa, R. & Sakakibara, S.-i. Identification and expression profile of novel STAND gene Nwd2 in the mouse central nervous system. Gene Expr. Patterns 46, 119284 (2022).
- 45.Reshetnikov, V. V. et al. Genes associated with cognitive performance in the Morris water maze: an RNA-seq study. Sci. Rep. 10, 22078 (2020).
- 46.Huggett, S. B. & Stallings, M. C. Cocaine'omics: genome-wide and transcriptome-wide analyses provide biological insight into cocaine use and dependence. Addiction Biol. 25, e12719 (2020).
- 47.Xia, L.-P. et al. GPR151 in nociceptors modulates neuropathic pain via regulating P2X3 function and microglial activation. Brain 144, 3405–3420 (2021).
- 48.Yang, J. et al. Wfs1 and related molecules as key candidate genes in the hippocampus of depression. Front. Genet. 11, 589370 (2021).
- 49.Laeremans, A. et al. AMIGO2 mRNA expression in hippocampal CA2 and CA3a. Brain Struct. Funct. 218, 123–130 (2013).
- 50.Dong, H. W. The Allen Reference Atlas: A Digital Color Brain Atlas of the C57Bl/6J Male Mouse (John Wiley & Sons Inc, 2008).
- 51.Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L.-C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4510–4520 (2018).
- 52.Tan, M. et al. Mnasnet: Platform-aware neural architecture search for mobile. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2820–2828 (2019).
- 53.Szegedy, C. et al. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1–9 (2015).
- 54.Howard, A. G. et al. Mobilenets: efficient convolutional neural networks for mobile vision applications. Preprint at https://arxiv.org/abs/1704.04861 (2017).
- 55.Deng, J. et al. ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. 248–255 (IEEE, 2009).
- 56.Xu, B., Wang, N., Chen, T. & Li, M. Empirical evaluation of rectified activations in convolutional network. Preprint at https://arxiv.org/abs/1505.00853 (2015).
- 57.Wang, J., Liu, Q., Xie, H., Yang, Z. & Zhou, H. Boosted EfficientNet: detection of lymph node metastases in breast cancer using convolutional neural networks. Cancers 13, 661 (2021).
- 58.Zhu, S. et al. Screening of common retinal diseases using six-category models based on EfficientNet. Front. Med. 9, 808402 (2022).
- 59.Nayak, D. R., Padhy, N., Mallick, P. K., Zymbler, M. & Kumar, S. Brain tumor classification using dense efficient-net. Axioms 11, 34 (2022).
- 60.Xu, C. et al. DeepST: identifying spatial domains in spatial transcriptomics by deep learning. Nucleic Acids Res. 50, e131 (2022).
- 61.Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat. Biotechnol. 37, 38–44 (2019).
- 62.Ding, Q. et al. Dimension reduction, cell clustering, and cell–cell communication inference for single-cell transcriptomics with DcjComm. Genome Biol. 25, 241 (2024).
- 63.Zheng, L. et al. EmAtlas: a comprehensive atlas for exploring spatiotemporal activation in mammalian embryogenesis. Nucleic Acids Res. 51, D924–D932 (2023).
- 64.Hartigan, J. A. & Wong, M. A. Algorithm AS 136: a k-means clustering algorithm. J. R. Stat. Soc. Ser. C (Appl. Stat.) 28, 100–108 (1979).
- 65.Steinley, D. Properties of the Hubert-Arabie adjusted Rand index. Psychol. Methods 9, 386 (2004).
- 66.Romano, S., Bailey, J., Nguyen, V. & Verspoor, K. Standardized mutual information for clustering comparisons: one step further in adjustment for chance. In International Conference on Machine Learning. 1143–1151 (2014).
- 67.Hubert, L. & Arabie, P. Comparing partitions. J. Classif. 2, 193–218 (1985).
- 68.Rendón, E., Abundez, I., Arizmendi, A. & Quiroz, E. M. Internal versus external cluster validation indexes. Int. J. Comput. Commun. 5, 27–34 (2011).
- 69.Sokolova, M. & Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 45, 427–437 (2009).
- 70.Hong, Y. et al. An increment of diversity method for cell state trajectory inference of time-series scRNA-seq data. Fundamental Res. 10.1016/j.fmre.2024.01.020 (2024).
- 71.Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
- 72.Hu, Y. et al. ADEPT: autoencoder with differentially expressed genes and imputation for robust spatial transcriptomics clustering. iScience 26, 106792 (2023).
- 73.Li, Z. & Zhou, X. BASS: multi-scale and multi-sample analysis enables accurate cell type clustering and spatial domain detection in spatial transcriptomic studies. Genome Biol. 23, 168 (2022).
- 74.Shang, L. & Zhou, X. Spatially aware dimension reduction for spatial transcriptomics. Nat. Commun. 13, 7203 (2022).
- 75.Long, Y. et al. Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST. Nat. Commun. 14, 1155 (2023).
- 76.Zhao, Y. et al. Source code for "A composite scaling network of EfficientNet for improving spatial domain identification performance". Zenodo 10.5281/zenodo.14059229 (2024).
Data Availability Statement
This study utilizes six data sets from three sequencing platforms. The 10x Visium platform includes four data sets: the human dorsolateral prefrontal cortex (DLPFC) data set (http://research.libd.org/spatialLIBD/), along with data sets for human breast cancer, mouse brain sagittal anterior, and CytAssist fresh frozen mouse brain (https://www.10xgenomics.com/). The osmFISH platform provides the mouse somatosensory cortex data set (http://linnarssonlab.org/osmFISH/). The STARmap platform includes the mouse primary visual cortex data set (https://kangaroo-goby.squarespace.com/data).
The code of this study is available at https://github.com/Zaoyanan/EfNST/ and at 10.5281/zenodo.14059229 (ref. 76).