Abstract
Three-dimensional Spatial Transcriptomics has revolutionized our understanding of tissue regionalization, organogenesis, and development. However, existing approaches overlook either spatial information or experiment-induced distortions, leading to significant discrepancies between reconstruction results and in vivo cell locations and, in turn, to unreliable downstream analysis. To address these challenges, we propose ST-GEARS (Spatial Transcriptomics GEospatial profile recovery system through AnchoRS). By introducing Distributive Constraints into the optimization scheme, ST-GEARS retrieves high-precision anchors that connect the spots closest to each other in vivo across sections. Guided by the anchors, it first rigidly aligns sections, then solves and denoises Elastic Fields to counteract distortions. Through a mathematically proved Bi-sectional Fields Application, it finally recovers the original spatial profile. Studying ST-GEARS across varying numbers of sections, sectional distances and sequencing platforms, we observed outstanding performance at the tissue, cell, and gene levels. ST-GEARS provides precise and well-explainable ‘gears’ between in vivo situations and in vitro analysis, supporting further biological discoveries.
Subject terms: Computational models, Software, Transcriptomics, Bioinformatics, Data processing
Existing 3D Spatial Transcriptomics reconstruction approaches often overlook spatial information or experiment-induced distortions. Here, authors propose ST-GEARS to bridge the gap between in vivo cell locations and in vitro analysis, accurately recovering spatial profiles.
Introduction
Spatial transcriptomics (ST) is an omics technology that fuels biological research by measuring gene expression at each position-recorded spot across sliced tissues1–3. A range of methods has been developed. In situ sequencing (ISS)4 platforms such as Barcoded Anatomy Resolved by Sequencing (BARseq)5 and Spatially-resolved Transcript Amplicon Readout Mapping (STARmap)6 rely on amplification, hybridization and imaging to capture gene expression information. Next Generation Sequencing (NGS)7 platforms such as Visium1, Stereo-seq8 and Slide-Seq29 use spatial barcoding and capturing in their implementations. These methods offer sequencing resolutions ranging from 100 µm10,11 to 500 nm8, and can measure thousands5 to tens of thousands8 of genes simultaneously.
Single-slice ST studies have unleashed discoveries, and facilitated our understanding in diverse biological and medical fields9,12–15. Consequently, numerous processing pipelines and analysis models have been developed for ST data on a single section16–21. However, to truly capture transcriptomics in the real-world context, three-dimensional (3D) ST was designed to recover biological states and processes in real-world dimensions, without restriction of the isolated planes in single sectional ST studies. Various research has utilized the power of 3D ST to uncover insights in homeostasis, development, and diseases. Among them, Wang et al. 22 uncovered spatial cell state dynamics of Drosophila larval testis and revealed potential regulons of transcription factors. Mohenska et al. 23 revealed complex spatial patterns in Murine heart and identified novel markers for cardiac subsections. And Vickovic et al. 24 explored cell type localizations in Human rheumatoid arthritis synovium. The vast and large variety of downstream 3D research has posted the need for a reliable and automatic recovery method of in vivo spatial profile.
However, the collection process of ST data poses significant challenges to the accurate reconstruction of 3D ST, and current approaches have not overcome them. Specifically, in 3D ST experiments, individual slices are cross-sectioned in a consistent direction, then manually placed on different chips or slides14,25. This operation gives distinct sections different geospatial reference systems, and distorts coordinates relative to their in vivo states. The distortions arise from squeezing and stretching during the picking, holding, and relocation of the sections. These differing geospatial systems and distortions complicate the recovery of the in vivo 3D profile. Among current recovery approaches, STUtility26 realizes multi-section alignment through the registration of histology images, without considering either the geospatial or the molecular profile of mRNA, which compromises accuracy. The recently published method PASTE27 and its second version PASTE228 achieve alignment using both gene expression and coordinate information, through optimization of the mapping between individual spots across sections. These methods produce inaccurate mappings and rotational misalignments because of their nonadaptive regularization factors, and because they assign a uniform sum of probability to all spots even when some spots have no actual anchors. All the above approaches consider only rigid alignment and neglect the correction of shape distortions, resulting in shape inconsistency across registered sections. The published method Gaussian Process Spatial Alignment (GPSA)29 considers shape distortions in its alignment, yet it does not include structural consistency in its loss function, which can cause the model to overfit to local gene expression similarities and to distort spatial information erroneously.
Moreover, its hypothesis space involves readout prediction in addition to coordinate alignment, causing uncertainty in the direction of gradient descent and vulnerability to input noise. Another alignment approach, Spatial-linked alignment tool (SLAT)30, also focuses on anchor construction between sections, yet it does not provide a methodology to construct a 3D transcriptomics profile. Other tools, such as Spateo31, VT3D32 and StereoPy33, focus on analysis and visualization of 3D data.
To address these limitations, we introduce ST-GEARS, a 3D geospatial profile recovery approach designed for ST experiments. By formulating the problem using the framework of Fused Gromov-Wasserstein (FGW) Optimal Transport (OT)34, ST-GEARS incorporates both gene expression and structural similarity into the Optimization process to retrieve cross-sectional mappings of spots with the same in vivo planar positions, also referred to as ‘anchors’. During this process, we introduce innovative Distributive Constraints that allow for different emphasis on distinct spot groups. The strategy addresses importance of expression consistent groups and suppresses inconsistent groups from imposing disturbances to optimization. Hence it increases anchor accuracy compared to current approaches. ST-GEARS utilizes the retrieved anchors to initially perform rigid alignment of sections. Subsequently, it introduces Elastic Field guided by the anchors to represent the deformation and knowledge to correct it according to each spot’s location. To enhance the quality of the field, Gaussian Smoothing is applied for denoising purposes. ST-GEARS then applies Bi-sectional Application to correction of each section’s spatial profile based on its denoised fields calculated with its neighboring sections. With validity proved mathematically, Bi-sectional Application eliminates distortions of sections, resulting in the successful recovery of a 3D in vivo spatial profile.
To understand the effects of ST-GEARS, we first studied its two components, anchor retrieval and elastic registration, on Human dorsolateral prefrontal cortex (DLPFC)35 and Drosophila larva22, respectively. We found that ST-GEARS retrieves anchors more accurately than other available anchor-based methods, and identified Distributive Constraints as the reason behind the improvement. We validated the effectiveness of the elastic registration process of ST-GEARS on both tissue shape smoothness and cross-sectional consistency. Then, we studied the output of ST-GEARS and other methods in reconstructing Mouse hippocampus tissues36, a Drosophila embryo individual22 and a complete Mouse brain37. The results were studied at the morphological, cell and gene levels. ST-GEARS was the only method that reconstructed all cases correctly regardless of cross-sectioning distance, number of sections, and sequencing platform, and it output the most accurate spatial information as judged by annotation or clustering information and by hybridization evidence.
Results
ST-GEARS algorithm
ST-GEARS takes ST data as its input, including mRNA expression, spatial coordinates, and approximate grouping information such as clustering or annotation of each observation. It then recovers the 3D geospatial profile in the following steps (Fig. 1).
(1) Optimization problem formulation under scheme of FGW OT with enhancement of Distributive Constraints. FGW OT formulation is established to enable solving of ‘anchors’, which are the joining of pair of spots with same in vivo planar positions. Noticeably, each solved anchor is equipped with a probability that describes its strength of connection, and each spot is solved to have zero to multiple anchors. Among each two sections, section-specific groups of spots, and genes are initially excluded from the formulation to avoid causing disturbances to anchors computing. Considering that connected spots are more spatially approximate, and more similar in gene expression because of shared cell identity38,39, FGW was adopted to combine the gene expression and structural terms in optimization, enabling highest gene expression similarity between mapped spots, at the same time keeping similar spot positions relative to their sections. Moreover, an innovative Distributive Constraints setting is designed and integrated into FGW OT’s formulation, to assign higher emphasis on spots or cells whose annotation or cluster express high similarity across section, and vice versa. Distributive Constraints leads registration to rely more on expression-consistent regions of sections, hence largely enhancing both accuracy of anchors and precision of following rigid and elastic registration.
(2) Optimization problem solving utilizing self-adaptive regularization and conditional gradient descent. Our designed Self-adaptive Regularization strategy automatically determines the relative importance between gene expression and structural terms in the optimization problem. This strategy leads to an optimal regularization factor across different section distances, spot sizes, extent of distortions, and data quality such as level of diffusion. Conditional Gradient34 is adopted as optimizer, which updates anchors iteratively towards higher expression and structural similarity with each iteration. The efficacy of Conditional Gradient has been demonstrated through its convergence to a local optimal point40, thereby ensuring the robustness and effectiveness of our approach.
(3) Rigid registration by Procrustes Analysis41. After filtering out anchors with relatively low probabilities, the optimal transformation and rotation of each section are analytically solved through Procrustes Analysis, which minimizes summed spatial distances of spots anchored to each other. With the transformation and rotation applied, sections are positionally aligned.
(4) Elastic registration guided by anchors. Based on the rigid registration result and the anchors solved by FGW OT, elastic registration is implemented through elastic field inference, 2D Gaussian denoising, and bi-sectional fields application. For each rigidly registered section, elastic fields are inferred from the location differences between its own spots and their anchored spots on the anterior and posterior neighbor sections. An elastic field is a 2D displacement distribution, describing how displacement values are distributed across different locations. Making use of the continuity of deformation at local scales, 2D Gaussian Denoising convolves over the fields to reduce noise. With the denoised fields, our designed Bi-sectional Fields Application corrects each section’s deformation according to its fields calculated with the anterior and posterior neighbor sections. The bi-sectional correction method is mathematically proved to approximately recover each section’s spatial profile to its original state.
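Steps (3) and (4) can be illustrated with a minimal sketch under several stated assumptions. Anchors are taken as already solved and given as index pairs with probabilities. The rigid fit covers rotation and translation only, with no flipping. The grid-based 2D Gaussian convolution is replaced by a Gaussian-weighted average taken directly over spot locations. The bi-sectional step is assumed to average the two neighbor-derived fields with equal weight. The function names (`rigid_align_2d`, `apply_rigid`, `infer_field`, `smooth_field`, `apply_bisectional`) are hypothetical and are not part of the ST-GEARS package.

```python
import math

def rigid_align_2d(src, dst, weights):
    """Weighted 2D Procrustes fit (rotation + translation, no reflection).

    src, dst: anchored coordinate pairs [(x, y), ...]; weights: anchor probabilities.
    Returns (theta, (tx, ty)) minimising sum_k w_k * |R(theta) src_k + t - dst_k|^2.
    """
    w_sum = sum(weights)
    # Weighted centroids of both point sets
    cs = [sum(w * p[d] for w, p in zip(weights, src)) / w_sum for d in (0, 1)]
    cd = [sum(w * p[d] for w, p in zip(weights, dst)) / w_sum for d in (0, 1)]
    dot = cross = 0.0
    for w, (xs, ys), (xd, yd) in zip(weights, src, dst):
        xs, ys, xd, yd = xs - cs[0], ys - cs[1], xd - cd[0], yd - cd[1]
        dot += w * (xs * xd + ys * yd)
        cross += w * (xs * yd - ys * xd)
    theta = math.atan2(cross, dot)  # closed-form optimal angle in 2D
    c, s = math.cos(theta), math.sin(theta)
    # Translation that carries the rotated source centroid onto the target centroid
    t = (cd[0] - (c * cs[0] - s * cs[1]), cd[1] - (s * cs[0] + c * cs[1]))
    return theta, t

def apply_rigid(points, theta, t):
    """Rotate points by theta, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]

def infer_field(coords, neighbor_coords, anchors):
    """Per-spot displacement towards the probability-weighted mean of anchored spots.

    anchors: {spot_index: [(neighbor_index, probability), ...]} (assumed layout).
    Spots without anchors get a zero displacement.
    """
    field = []
    for i, (x, y) in enumerate(coords):
        links = anchors.get(i, [])
        w = sum(p for _, p in links)
        if w == 0:
            field.append((0.0, 0.0))
            continue
        mx = sum(p * neighbor_coords[j][0] for j, p in links) / w
        my = sum(p * neighbor_coords[j][1] for j, p in links) / w
        field.append((mx - x, my - y))
    return field

def smooth_field(coords, field, sigma):
    """Denoise displacements with a Gaussian-weighted average over all spots."""
    smoothed = []
    for x, y in coords:
        k_sum = sx = sy = 0.0
        for (u, v), (dx, dy) in zip(coords, field):
            k = math.exp(-((x - u) ** 2 + (y - v) ** 2) / (2.0 * sigma ** 2))
            k_sum += k
            sx += k * dx
            sy += k * dy
        smoothed.append((sx / k_sum, sy / k_sum))
    return smoothed

def apply_bisectional(coords, field_prev, field_next):
    """Assumption: shift each spot by the equal-weight mean of both neighbor fields."""
    return [(x + 0.5 * (a[0] + b[0]), y + 0.5 * (a[1] + b[1]))
            for (x, y), a, b in zip(coords, field_prev, field_next)]
```

In such a pipeline, `rigid_align_2d` would be fitted on high-probability anchors only, and `infer_field` followed by `smooth_field` would be run once against the anterior neighbor and once against the posterior neighbor before `apply_bisectional` combines the two results.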
Enhancement of anchor retrieval accuracy through distributive constraints
As was unfolded, ST-GEARS is an algorithm flow jointly constituted of probabilistic anchor computation and spatial information recovery. Hence, to validate the effectiveness of our method and demonstrate its underlying design philosophy, we conducted comprehensive studies on the two counterparts using real-world data. To begin, we utilized the DLPFC dataset35 to study our anchors retrieving accuracy with emphasis on the effect of Distributive Constraints design.
To assess the effects of Distributive Constraints on anchor accuracy, we compared ST-GEARS with and without this setting, and with other constraints involving methods including PASTE, PASTE2 and SLAT. We investigated constraint values assigned by these methods, as well as their solved number of anchors and maximum anchor probability of each spot. Furthermore, we examined the annotation types that were considered connected based on the computed anchors to assess accuracy of anchors. Among the methods we compared, ST-GEARS with Distributive Constraints was found to assign different constraint values to spots within different neuron layers, while the others assigned uniform constraints to all layers (Fig. 2a, Supplementary Fig. 1). The results of ST-GEARS showed that both number of anchors and the anchors’ maximum probabilities for each spot were lower in Layer 2 and Layer 4 compared to the thicker layers. However, this pattern was not observed in methods without Distributive Constraints setting (Fig. 2a, Supplementary Fig. 1). To illustrate the impact of this strategy on anchor accuracy, we tagged each spot with annotation of its connected spot by anchor with highest probability. We then compared this result to the tagged spot’s original annotation (Fig. 2a, Supplementary Fig. 1). Under Distributive Constraints, ST-GEARS achieved a significantly higher proximity between annotations compared to PASTE and our method without Distributive Constraints. PASTE2 also led to approximate annotations, but it anchored multiple spots to spots from neighboring layers, particularly those near layer boundaries. SLAT also mapped multiple spots to spots from different tissue layers, particularly of spots located on layer 2, 4 and 6.
To evaluate the precision of anchors, we conducted a comparison with the Mapping accuracy index introduced by PASTE27. This index measures the weighted percentage of anchors that connect spots with same annotation. As a result, ST-GEARS outperformed PASTE2 and SLAT, and reached a score that was over 0.5 (out of 1) higher than both PASTE and our method without Distributive Constraints (Fig. 2a, b, Supplementary Fig. 1).
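As a minimal sketch of the index described above, the weighted percentage can be computed as the share of total anchor probability that connects spots with the same annotation. The anchor layout (a dictionary keyed by spot index pairs) and the function name `mapping_accuracy` are assumptions made for illustration, not the interface of PASTE or ST-GEARS.

```python
def mapping_accuracy(anchors, labels_a, labels_b):
    """Probability-weighted fraction of anchors joining same-annotation spots.

    anchors: {(i, j): probability} linking spot i of section A to spot j of section B.
    labels_a, labels_b: annotation of each spot in sections A and B.
    """
    total = sum(anchors.values())
    if total == 0:
        return 0.0
    matched = sum(p for (i, j), p in anchors.items() if labels_a[i] == labels_b[j])
    return matched / total


# Toy example: one anchor (weight 0.1) crosses a layer boundary.
toy_anchors = {(0, 0): 0.4, (0, 1): 0.1, (1, 1): 0.5}
toy_score = mapping_accuracy(toy_anchors, ["Layer1", "Layer2"], ["Layer1", "Layer2"])
```

Dividing by the total probability keeps the score in the range 0 to 1 even when the anchor probabilities do not sum to one, for example after low-probability anchors have been filtered out.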
The reason behind these observations lies in the anatomy of the cortex: as functional areas lying between thicker neocortical layers, thinner neocortical layers are transcriptomically about as similar to their adjacent layers as to their own annotation type1,35. This implies that, in contrast to thicker layers, thinner layers tend to introduce more disturbances during anchor computation. However, the Distributive Constraints suppressed these annotation types by assigning a smaller sum of probability to each of their spots. The suppression was reflected in the above results, where each spot in Layer 2 and Layer 4 has fewer assigned anchors and a lower maximum probability (Fig. 2a, Supplementary Fig. 1). Further analysis of all spots in the DLPFC reveals that a certain percentage of spots were suppressed in anchor generation due to the Distributive Constraints (Fig. 2c, Supplementary Fig. 2).
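In generic FGW notation, the role of the per-spot probability sums can be written out as follows. This is a sketch of the standard FGW formulation rather than the exact expression given in Methods, and the symbols are chosen here for illustration: C is the cross-section gene expression dissimilarity between spots, D^A and D^B are the within-section spot distance matrices, α is the regularization factor balancing the two terms, π is the anchor probability matrix, and p and q are the sums of probability assigned to each spot of the two sections.

```latex
\begin{aligned}
\min_{\pi \ge 0}\quad
  & (1-\alpha)\sum_{i,j} C_{ij}\,\pi_{ij}
    + \alpha \sum_{i,j,k,l} \bigl(D^{A}_{ik} - D^{B}_{jl}\bigr)^{2}\,\pi_{ij}\,\pi_{kl} \\
\text{s.t.}\quad
  & \sum_{j}\pi_{ij} = p_i, \qquad
    \sum_{i}\pi_{ij} = q_j, \qquad
    \sum_{i} p_i = \sum_{j} q_j = 1 .
\end{aligned}
```

Uniform constraints correspond to p_i = 1/n_A and q_j = 1/n_B for sections with n_A and n_B spots. Under Distributive Constraints, spots in expression-inconsistent groups such as Layer 2 and Layer 4 receive smaller p_i and q_j, which caps the total anchor probability they can carry.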
Recovery of in vivo shape profile through elastic registration
We then utilized Drosophila larva data to investigate the spatial profile recovery effect of ST-GEARS, with an emphasis on our innovated elastic registration. We first applied rigid registration to Drosophila larva sections and observed a visually aligned configuration of individual sections (Supplementary Fig. 3). By further mapping cell annotations back to their previous sections, according to the strongest anchors of each spot, the projected annotations are visually in match with original ones (Supplementary Fig. 4). The accuracy of the mapping matching between annotations was quantified by Mapping accuracy (Supplementary Fig. 5). The above findings validated that ST-GEARS produced reliable anchors and accurately aligned sections through rigid registration. However, when stacking the sections together, we observed an inconsistency on the edge of lateral cross-section of the rigid result (Supplementary Fig. 6). This inconsistency doesn’t conform to the knowledge of intra-tissue and overall structural continuity of Drosophila larvae.
After applying elastic registration to the rigidly-aligned larva, we observed a notable improvement in the continuity of the cross section above, indicating that spatial information closer to the real state was retrieved. To further understand the effect of the elastic operation on the dataset, we compared the changes in area of the complete body and three individual tissues (trachea, central nervous system (CNS), and fat body) on all sections. We observed an enhanced smoothness in the curves of elastically registered sections, which agrees with the continuous morphology of the larva expected from theoretical knowledge. To quantify the smoothing effect, we calculated the Scale-independent Standard Deviation of Differences (SI-STD-DI) on the curves, which measures the smoothness of area changes along the sectioning direction (Fig. 3a and Methods). A decrease of SI-STD-DI on all tissues and the body provided empirical evidence for the improved smoothness. To further investigate the recovery of internal structures, we introduced Mean Structural Similarity (MSSIM). MSSIM takes structurally consistent sections as input, and measures the pairwise internal similarity of the reconstructed result using annotations or clustering information (Supplementary Fig. 7; see Methods for details). An improved MSSIM was noticed on all 4 sections, indicating that elastic registration further recovers internal geospatial continuity on the basis of the rigid operation (Fig. 3b). By comparing the registration effect on individual sections, we also observed that the elastic process successfully rectified a bending flaw along the edge of the third section (Fig. 3c). The shape fixing highlighted that ST-GEARS not only yielded a more structurally consistent 3D volume, but also provided a more accurate morphology for single sections. The improved smoothness, the recovered structural continuity, and the shape fixing collectively demonstrate that elastic registration effectively recovers the geospatial profile.
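To make the smoothness measure concrete, here is a minimal sketch of one plausible reading of SI-STD-DI: the standard deviation of area differences between consecutive sections, divided by the mean area so that the value does not depend on scale. The authoritative definition is the one in Methods. The normalization by mean area, the use of the population standard deviation, and the function name `si_std_di` are all assumptions of this sketch.

```python
import statistics

def si_std_di(areas):
    """Assumed form: pstdev of consecutive area differences / |mean area|.

    areas: tissue (or whole-body) area on each section, in sectioning order.
    Lower values indicate smoother area changes along the sectioning direction.
    """
    if len(areas) < 2:
        raise ValueError("need at least two sections")
    scale = abs(statistics.mean(areas))
    if scale == 0:
        raise ValueError("mean area is zero; the index is undefined")
    diffs = [b - a for a, b in zip(areas, areas[1:])]
    return statistics.pstdev(diffs) / scale


# Hypothetical area curves: a jagged one and a smoother one of similar size.
jagged = [10.0, 16.0, 9.0, 17.0]
smooth = [10.0, 12.0, 13.0, 15.0]
```

Under this reading, a curve that grows by the same amount on every section scores 0, and rescaling all areas by a constant factor leaves the score unchanged.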
With elastic process validated and applied onto rigid registration result, the recovery of spatial information was completed. Stacking individual sections of the elastic result, a complete geospatial profile of the larva was generated (Supplementary Fig. 8), visualizing the ST-GEARS’ ability of in vivo spatial information recovery.
Application to sagittal sections of Mouse hippocampus
After validating the component phases of ST-GEARS, we proceeded to apply the method to multiple real-world problems to recover geospatial profiles. We first focused on two sagittal sections of Mouse hippocampus36 (Supplementary Fig. 9) that were 10 μm apart, accounting for 1–2 layers of Cornu Ammonis (CA) 1 neurons42. Considering the proximity of these sections, we assumed no structural differences between them.
To compare the registration effect among methods, we extracted CA fields and dentate gyrus (DG) beads (Supplementary Fig. 10), then stacked the two sections for a more obvious contrast (Fig. 4a). PASTE2 failed to perform the registration, leaving the sections unaligned. With GPSA, the sections’ positions were aligned, yet the 2nd section was squeezed into a narrower region than the first one, leading to a contradiction in the region’s location. The ‘narrowing’ phenomenon may be caused by overfitting of the GPSA model to gene expression similarity, since its loss function does not involve structural similarity between registered sections. The scale on the horizontal and vertical axes was distorted due to the equal scale range strategy adopted in GPSA’s preprocessing. STalign also misaligned the sections, leaving an obvious angle between the two slices in the registration result. This may be due to the method’s conversion of ST data into images, which relies completely on gene expression abundance to decide pixel intensities. On the sagittal section of Mouse hippocampus, the abundance difference between regions may not provide the structural information required by registration. Compared with PASTE, ST-GEARS produced a more accurate overlap of the centerlines of CA fields and DG. This indicated an enhanced recovery of spatial structure consistency and an improved registration effect. To quantitatively evaluate these findings, we used the MSSIM index as a measure of structural consistency and compared it among PASTE, PASTE2, GPSA, STalign and ST-GEARS (Fig. 4b). Consistent with the centerline results, ST-GEARS achieved a higher MSSIM score than GPSA and PASTE, and surpassed PASTE2 and STalign by >0.2 out of 1. In terms of memory efficiency, ST-GEARS and PASTE used ~1 GB less memory than PASTE2, GPSA and STalign, and the peak memory of ST-GEARS and PASTE was almost the same (Supplementary Fig. 11). In terms of time efficiency, registration using ST-GEARS, STalign, GPSA and PASTE was much faster than with PASTE2.
To understand the reasons behind our enhancement, we thoroughly examined the anchors generated by PASTE, PASTE2 and ST-GEARS, as well as the effects of our elastic registration. By mapping cluster information of the 2nd section to the 1st, and of the 1st to the 2nd, through anchors, we found correspondences between the projected and original annotations (Supplementary Fig. 12). Accordingly, our Mapping accuracy was over 0.25 higher than that of PASTE and over 0.45 higher than that of PASTE2 (Fig. 4a), indicating high anchor accuracy. To understand and further substantiate this advantage, we visualized the probabilistic constraints and the resulting anchor probabilities (Supplementary Fig. 13a). It is worth noting that ST-GEARS implemented Distributive Constraints, in contrast to the uniform distributions used by PASTE. As a result, a certain percentage of spots were suppressed in anchor connection by ST-GEARS (Supplementary Fig. 13b) compared to PASTE, leaving the registration to rely more on spots with higher cross-sectional similarity and fewer computational disturbances, and hence leading to a higher anchor accuracy. When we excluded Distributive Constraints from ST-GEARS, we noticed an obvious decrease of mapping accuracy on the hippocampus dataset (Supplementary Fig. 14), indicating the contribution of Distributive Constraints to anchor accuracy. In the study of the elastic effect, we found that, when overlapping CA fields and DG, the centerlines overlapped more after elastic registration than after the rigid operation only (Fig. 4b). Quantitatively, by MSSIM, the cross-sectional similarity was found to be increased by elastic registration (Supplementary Fig. 15). These findings suggest that the combination of Distributive Constraints and the elastic process contributed to the enhanced registration of the Mouse hippocampus.
To explore the potential impact of our registration on downstream analysis, we extracted region-specific annotation types from the sections, and analyzed their overlap by stacking the registered sections together (Fig. 4c). In all annotation types, including DG, Neurogenesis, subiculum, CA1, CA2 and CA3, the distribution regions from both sections were nearly identical. The overlapping result shows that ST-GEARS integrated the spatial profiles of the same cell subpopulations, enabling a convenient and accurate downstream analysis of multiple sections.
Application to 3D reconstruction of Drosophila embryo
Besides tissue level registration of Mouse hippocampus, to evaluate the performance of ST-GEARS in reconstructing individual with multiple sections, we further tested it on a Drosophila embryo. The transcriptomics of embryo was measured by Stereo-seq, with 7 μm cross-sectioning distance22. By quantifying the registration effect of spatial information recovery and comparing it to PASTE, PASTE2, GPSA and STalign, we found that ST-GEARS achieved the highest MSSIM in five out of the six structurally consistent pairs (Fig. 5a). On the pair where ST-GEARS did not result in highest MSSIM, it surpassed PASTE, and achieved a similar score to PASTE2. By comparing area changes with SI-STD-DI quantification of the complete section, and three individual tissues including epidermis, midgut and foregut, ST-GEARS yielded higher smoothness on all regions than all other approaches, both visually and quantitatively (Fig. 5b).
To compare the reconstruction effect, we studied both the registered individual sections and the reconstructed 3D volume. Among the methods compared, PASTE produced a wrong flipping on the 15th section along the A-P axis (Fig. 5c). After stacking the sections back to 3D and inspecting the dorsal view, the wrong flipping caused a false regionalization of foregut, circled in orange (Fig. 5d). From the first to the last section registered by PASTE2, a gradual rotation was witnessed (Fig. 5c), leading to over 20 degrees of angular misalignment between the first and the last section. Similar to PASTE, this misalignment also caused the wrong regionalization of foregut in the 3D map (Fig. 4d). Also induced by the rotation, sections were found to extrude in the 3D result, circled in blue, breaking the round overall morphology of the embryo. GPSA caused false distortion of 8 out of 16 sections, as pointed out by purple arrows (Fig. 5c), and the stacked sections formed a dorsal view of an isolated circle and an inner region (Fig. 5d). The phenomenon may be due to its overfitting to expression, which is caused by the contradiction between its hypothesis of consistent readout across sections and the large readout variation across 16 sections in this application. Similar to PASTE, STalign also produced a wrong flipping, on the 13th section along the A-P axis (Fig. 5c). After stacking the projections back to 3D, a mistaken regionalization of foregut caused by the wrong flipping was circled in orange (Fig. 5d). In contrast, ST-GEARS avoided all of these mistakes in its results (Fig. 5c). From the perspective of individual section profiles, noticeably in the 15th section, we observed a significant reduction in the dissecting region between two parallel lines, indicating the successful fixation of flaws in the section. By comparing time usage across all methods, ST-GEARS achieved the 2nd lowest time consumption in registration (Supplementary Fig. 11).
In terms of memory consumption, ST-GEARS, PASTE and STalign used much less memory than PASTE2 and GPSA. The three most memory-efficient methods used almost identical peak memory, with a value fluctuation of <7%.
To comprehend the rationale behind our improvement, we analyzed the anchors generated by the three methods and the impact of our elastic registration. In the investigation of anchor accuracy, we discovered that ST-GEARS achieves the highest mapping accuracy among all section pairs (Fig. 5e), suggesting its advanced ability to generate precise anchors, which forms the basis for precise spatial profile recovery. To understand this advancement, probabilistic constraints and its resulted anchors distributions (Supplementary Fig. 16, Supplementary Fig. 17) were studied. With Distributive Constraints (Supplementary Fig. 16a), ST-GEARS generated different maximum probabilities on different annotation types (Supplementary Fig. 16b), which indicates that annotation types with higher cross-sectional consistency were prioritized in anchor generation. This selection led to reduced computational disturbances, and hence higher accuracy of anchors. We also compared anchor accuracy with and without Distributive Constraints adopted, and noticed an increase of mapping accuracy on each pair of sections (Supplementary Fig. 18). In final registration result, ST-GEARS without Distributive Constraints failed to fix the experimental flaw on the 15th section (Supplementary Fig. 19), in contrast to effect upon the setting adopted (Fig. 5c). Above findings validate the contributive effect of Distributive Constraints in our method. In study of elastic registration in shape smoothness, we witnessed an increased level of smoothness of tissue epidermis, foregut, and midgut, as well as the complete section, through area changes quantified by SI-STD-DI index (Supplementary Fig. 20). In internal structure aspect, an increased MSSIM of structural consistent pairs were noticed (Supplementary Fig. 21). An experimental flaw on the 15th section was also fixed by elastic registration (Supplementary Fig. 22). 
Above findings point that the enhancement of registration accuracy on Drosophila embryo was induced by Distributive Constraints and elastic process.
By mapping spots back to 3D space, we further investigated the effect of different method on downstream analysis, in the perspective of genes expression (Fig. 5f). Cpr56F and Osi7 were selected as marker genes, which were found to respectively highly express in foregut, and foregut plus epidermis region22. Investigating Cpr56F expression by ST-GEARS from dorsal view, we noticed three highly expressing regions, at anterior end, front region, and posterior end of the embryo. The finding matches the hybridization result of stage 13-16 Drosophila embryo extracted from Berkeley Drosophila Genome Project (BDGP) database. In contrast, none of PASTE, PASTE2, GPSA and STalign presented high expression at all three locations. When analyzing the distribution of Osi7 by PASTE, PASTE2 and STalign, we noticed a sharp decrease in expression from inner region to the outer layer marked by purple arrows, contradicting the prior knowledge of high expression in the epidermis. This is probably because PASTE and PASTE2 do not consider distortion correction as part of their methods, leaving section edges un-coincided and marker genes not obviously highly expressed on the outermost region. Though involving distortion correction, STalign lost certain amount of structural information by transforming ST data to image utilizing only information of regional gene expression abundance. The registration did not adequately correct distortion without support of enough structural messages. Similarly, PASTE2 failed to capture expression in outer layers and instead revealed a high expression in one inter-connected area, which did not correspond to the separate expression regions observed in hybridization result. No spatial pattern was witnessed when analyzing distribution of Osi7 by GPSA, which forms an obvious contrast to its hybridization evidence. Comparably, none of the violations was shown in the result of ST-GEARS. 
The comparison of spatial distribution indicated our potential capability to better enhance the process of downstream gene-related analysis.
Application to Mouse brain reconstruction
The design of 3D experiments involves various levels of sectioning distances22,36,37. To further investigate the applicability of ST-GEARS on ST data with larger slice intervals, we applied the method to a complete Mouse brain hemisphere dataset, which consists of 40 coronal sections (Supplementary Fig. 23a), with a sectioning distance of 200 μm37. The transcriptomics data was measured by BARseq, which includes sequencing data and its cross-modal histology images. Each observation represents captured transcriptomics surrounded by the boundary of a cell.
Applying PASTE, PASTE2, GPSA, STalign and ST-GEARS to the dataset, we observed multiple misaligned sections produced by PASTE, PASTE2, GPSA and STalign (Supplementary Fig. 23b, Supplementary Fig. 23c, Supplementary Fig. 23d, Fig. 6a). With PASTE, these misalignments include 2 sections with ~180° angular misalignment (Supplementary Fig. 23b). With PASTE2, 4 rotational misalignments and 8 positional misalignments were noticed (Supplementary Fig. 23d). With GPSA, 12 sections were rotationally misaligned and 3 sections were mistakenly distorted (Supplementary Fig. 23b), probably due to the overfitting to expression discussed in the analysis of the Drosophila embryo. The scale on the horizontal and vertical axes was also distorted, possibly for a reason similar to the one analyzed for the Mouse hippocampus. With STalign, 7 rotational misalignments were generated (Supplementary Fig. 23e). In clear contrast, our algorithm correctly aligned all 40 sections with 200 μm intervals (Supplementary Fig. 23f). To assess our registration more accurately, we used the direction of the cutting lines induced during tissue processing37 and compared the tilt angles of these lines in the 20th, 25th, 26th, 27th, 33rd, 34th and 37th slices, where the lines are visible. Notably, neither visible angle differences nor curving of the cutting lines were observed, indicating that the sections were properly aligned by ST-GEARS (Fig. 6a, Supplementary Fig. 23f). To quantify registration accuracy in terms of structural continuity, we calculated MSSIM scores of 11 section pairs that are structurally consistent (Fig. 6b). Consistent with the visual observations, PASTE2 presented a much larger score range than the other methods, reflecting its instability across sections in this dataset, and GPSA exhibited the lowest median MSSIM score, indicating its suboptimal average performance.
By comparison, PASTE yielded a higher median score and a smaller variation, while ST-GEARS achieved the highest median score and the smallest variation among all methods. In terms of computational efficiency, ST-GEARS achieved the second-lowest time consumption and the lowest peak memory consumption among all methods (Supplementary Fig. 11).
To understand the reasons behind this improvement, we examined how anchor accuracy changes with the regularization factor during ST-GEARS computation (Supplementary Fig. 24). Out of 39 section pairs, we observed a change in mapping accuracy >0.1 (out of 1) in 12 pairs. Self-adaptive Regularization, which was designed to handle varying data characteristics including varying section distances, selected the regularization factor that leads to optimal mapping accuracy, increasing anchor accuracy in these 12 section pairs. Notably, among these 12 pairs, pairs 29th & 30th, 31st & 32nd and 32nd & 33rd were correctly aligned by ST-GEARS but misaligned by PASTE, which does not adopt any self-adaptive regularization strategy.
After validating the registration result, we investigated the distribution of the recovered cell types in 3D space to assess the effectiveness of the reconstruction and its impact on further analysis. We observed that the complete morphology of the hemisphere was recovered by ST-GEARS, with clear distinction between tissues in perspective, lateral and anterior views (Fig. 6c). We further studied the distribution of separate annotation types within the cortex layers and found that the 3D regionalization of each annotation type was recovered by ST-GEARS (Fig. 6d). The reconstructed result indicates the adaptability of ST-GEARS across various scales of sectioning intervals, and its applicability to both bin-level datasets and cell-level datasets that incorporate histology information.
Discussion
We introduce ST-GEARS, a 3D geospatial profile recovery approach for ST experiments. Leveraging the formulation of FGW OT, ST-GEARS utilizes both gene expression and structural similarities to retrieve cross-sectional mappings of spots with the same in vivo planar coordinates, referred to as ‘anchors’. It further improves anchor accuracy through our Distributive Constraints. It then rigidly aligns sections using the anchors, before finally eliminating section distortions using Gaussian-denoised Elastic Fields and their Bi-sectional Application.
We validate the components of ST-GEARS, anchor retrieval and elastic registration, on the DLPFC and Drosophila larva datasets, respectively. In the validation of anchor retrieval, evaluated by the Mapping accuracy of the retrieved anchors, ST-GEARS consistently outperformed PASTE and PASTE2 across all section pairs. We show that Distributive Constraints are the reason behind this performance: they effectively suppress the generation of anchors between spot groups with low cross-sectional similarity while enhancing their generation among groups with higher similarity. To investigate the effectiveness of the elastic registration process, we evaluate tissue area changes and cross-sectional similarity using the Drosophila larva dataset. Both smoother tissue area curves and higher similarity between structurally consistent sections confirm the efficacy of the elastic process of ST-GEARS.
We demonstrate the higher reconstruction accuracy of ST-GEARS compared to existing approaches including PASTE, PASTE2 and GPSA, and its positive impact on downstream analysis. Our evaluation encompasses diverse application cases, including registration of two adjacent sections of Mouse hippocampus tissue measured by Slide-seq, reconstruction of 16 sections of a Drosophila embryo individual measured by Stereo-seq, and reconstruction of a complete Mouse brain measured by BARseq, comprising 40 sections with a sectioning interval as large as 200 μm. Among the methods, the result registered by ST-GEARS exhibited the highest intra-structural consistency, measured by MSSIM, for two hippocampus sections separated by a single layer of neurons. On 16 sections of a Drosophila embryo individual, the accuracy of our method is indicated by both MSSIM and the smoothness of tissue area changes. Importantly, ST-GEARS provides more reliable embryo morphology, more precise tissue regionalization, and marker gene distributions that agree better with hybridization evidence than existing approaches. This suggests that ST-GEARS provides higher-quality information at the tissue, cell, and gene levels. On Mouse brain sections with large intervals of 200 μm, ST-GEARS avoided the positional and angular misalignments that occur in the results of PASTE and PASTE2. The improvement was quantified by a higher MSSIM. Both hemisphere morphology and cortex layer regionalization were reflected in the 3D reconstruction by ST-GEARS. The successful representation of important structural and functional features in these studies collectively underscores the reliability of ST-GEARS and its capability for advancing 3D downstream research, enabling more comprehensive and insightful analysis of complex biological systems.
To further enhance and extend our method, several opportunities remain to be explored. Firstly, algorithmic aspects including hyperparameter sensitivity and scalability can be studied further to improve performance. Though recommended values are provided for two of its hyperparameters, method performance is still affected by parameter values, raising potential issues of overfitting and sensitivity that deserve further study. Regarding scalability, the computational cost of ST-GEARS increases noticeably on large-scale datasets. Though the strategy of Granularity adjusting was devised to reduce complexity, improving robustness to increasing data scale remains to be explored. Secondly, tasks aimed at improving data preprocessing, including but not limited to batch effect removal and diffusion correction, are expected to be integrated into our method, because they are coupled with the registration task itself: inaccuracies in input data perturb anchor optimization, while the spatial information recovered by our method may assist data quality enhancement by providing registered sections. Thirdly, the Distributive Constraints of ST-GEARS take rough grouping information as input, which may introduce a computational burden during the reconstruction process. To address this, an automatic step is expected to be developed to reliably cluster spots while maintaining the computational efficiency of the overall process. This step can be integrated into our method either as preprocessing or as a coupled task, similarly to our expectation for data quality enhancement. Finally, we envision incorporating a wider scope of anchor applications into our existing framework, such as information integration of sections across time, across modalities and even across species.
With interpretability, robustness and accuracy provided by ST-GEARS, we anticipate its applications and extension in various areas of biological and medical research. We believe that our method can help address a multitude of questions regarding growth and development, disease mechanisms, and evolutionary processes.
Methods
FGW OT description
Fused Gromov-Wasserstein (FGW) Optimal Transport (OT) models spot-wise or cell-wise similarity between two sections in order to solve for optimal mappings between their spots or cells; these mappings are also called ‘anchors’. With FGW OT, the optimal set of mappings achieves the highest gene expression similarity between mapped spots while keeping their positions relative to their own sections similar.
The required inputs of FGW OT include gene expression, spot or cell locations before registration, and constraint values that assign different weights to different spots or cells in the optimization. For gene expression, we introduce $A \in \mathbb{R}^{n_A \times m}$ for section A to describe the normalized counts of unique molecular identifiers (UMIs) of different genes in each cell or spot, where $n_A$ denotes the number of spots in section A and $m$ denotes the number of genes captured in both sections. Similarly, we describe gene expression on section B as $B \in \mathbb{R}^{n_B \times m}$, with genes arranged in the same order as in $A$. For spot or cell locations, we introduce $X_A \in \mathbb{R}^{n_A \times 2}$ to describe the spot locations of section A, with the 1st column storing horizontal coordinates and the 2nd storing vertical coordinates. Similarly, we have $X_B \in \mathbb{R}^{n_B \times 2}$ to describe the spot locations of section B. Spots are arranged in the same order in the gene expression and location matrices. Constraint values are discussed in the section Distributive Constraints.
FGW OT solves:
$$\pi = \underset{\pi}{\arg\min}\ (1-\alpha)\,\langle M_{AB}, \pi\rangle + \alpha\,\langle L(C_A, C_B)\otimes\pi,\ \pi\rangle \quad \text{s.t.}\quad \pi\,\mathbf{1}_{n_B} = W_A,\quad \pi^{T}\mathbf{1}_{n_A} = W_B,\quad \pi \ge 0 \tag{1}$$
Thereinto, $M_{AB} \in \mathbb{R}^{n_A \times n_B}$ describes the expression dissimilarity of each pair of spots on sections A and B (lower values mean more similar spots), formulated as $M^{(AB)}_{i,j} = KL(A_i \,\|\, B_j)$, where $A_i$ and $B_j$ are the $i$-th row of $A$ and the $j$-th row of $B$. Note that $M^{(AB)}$ still indicates the spot-wise measure $M_{AB}$; the section code AB is moved to a parenthesized superscript for clarity, since the subscript position is taken by the spot indices $i, j$. KL denotes Kullback-Leibler (KL) divergence43. $C_A \in \mathbb{R}^{n_A \times n_A}$ describes spot-wise distance within section A, with $C^{(A)}_{i,j} = dis(X^{(A)}_i, X^{(A)}_j)$ and $dis$ denoting the Euclidean distance. Likewise, $X^{(A)}_i$ and $X^{(A)}_j$ still indicate spot locations $X_A$, and $C^{(A)}$ refers to the spot-wise distance $C_A$, with the section code moved to the superscript for the same reason. Similarly, $C_B \in \mathbb{R}^{n_B \times n_B}$ describes spot-wise distance within section B. $L(C_A, C_B)$ defines the difference between all spot-pair distances on sections A and B, with element $L(C_A, C_B)_{i,j,k,l}$ comparing $C^{(A)}_{i,k}$ with $C^{(B)}_{j,l}$. ⊗ denotes the tensor-matrix product, $(L \otimes \pi)_{i,j} = \sum_{k,l} L_{i,j,k,l}\,\pi_{k,l}$; $\langle\cdot,\cdot\rangle$ denotes the matrix inner product, i.e. the sum of element-wise products.
The adjacency matrix $\pi \in \mathbb{R}^{n_A \times n_B}$ to be optimized stores the strength of anchors between spots from the two sections, with the row index representing spots on section A and the column index representing spots on section B. The sum of the elements of $\pi$ is 1. With $\langle M_{AB}, \pi\rangle$, the expression dissimilarity of mapped spots is measured. With $L(C_A, C_B)\otimes\pi$, the distances between spot pairs on section A are compared with the distances between their anchored spot pairs on section B. $\langle L(C_A, C_B)\otimes\pi, \pi\rangle$ therefore describes how well spatial structures agree under the anchors’ connection. $\alpha \in [0,1]$ denotes the regularization factor, which specifies the relative importance of structural similarity compared to expression similarity. $W_A$ and $W_B$ are the constraint values introduced in the section Distributive Constraints.
With the formulation above, FGW OT solves for the optimal anchors between spots or cells, which maximize a weighted combination of gene expression similarity and position similarity of the mapped spots or cells.
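As a rough illustration of the objective in (1), the NumPy sketch below builds the expression cost and the within-section distances, then evaluates the objective for a given anchor matrix. It is not the ST-GEARS implementation: the function names are hypothetical, the structural loss is assumed to be the squared difference of distances (a common FGW choice that the text above does not confirm), and no optimization is performed.

```python
import numpy as np

def kl_cost(A, B, eps=1e-12):
    """Pairwise KL(A_i || B_j) between rows of A (n_A x m) and B (n_B x m)."""
    # Rows are renormalised to distributions; eps avoids log(0).
    P = A / A.sum(axis=1, keepdims=True) + eps
    Q = B / B.sum(axis=1, keepdims=True) + eps
    return (P * np.log(P)).sum(axis=1)[:, None] - P @ np.log(Q).T

def pairwise_dist(X):
    """Euclidean distance between every pair of rows of X (n x 2)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def fgw_objective(pi, M, C_A, C_B, alpha):
    """(1 - alpha) * <M, pi> + alpha * sum_ijkl (C_A[i,k] - C_B[j,l])^2 pi_ij pi_kl."""
    expr_term = (M * pi).sum()
    p, q = pi.sum(axis=1), pi.sum(axis=0)  # marginals of the anchors
    # Assumed squared loss, expanded so the 4-index tensor is never built.
    struct_term = (p @ (C_A ** 2) @ p + q @ (C_B ** 2) @ q
                   - 2.0 * np.sum((C_A @ pi @ C_B.T) * pi))
    return (1.0 - alpha) * expr_term + alpha * struct_term
```

Matching a section to itself with one-to-one anchors gives an objective of zero, while spreading anchors uniformly gives a strictly larger value; a solver searches between such candidates under the marginal constraints.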
Distributive constraints
To supply the constraint values used in FGW OT, we introduce Distributive Constraints, which assign different emphasis to spots or cells in the optimization. Distributive Constraints use cell type composition information to differentiate the emphasis: if an annotation or cluster shows high expression similarity across sections, its spots or cells are assigned a relatively high sum of probability, and vice versa. With a higher sum of probability, more anchors and stronger anchors are generated, while fewer anchors are produced on spots with a lower sum of probability. This operation makes registration rely more on the expression-consistent regions of the sections, largely enhancing both the accuracy of the anchors and the precision of the subsequent rigid and elastic registration.
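The required inputs of Distributive Constraints include $G_A$ and $G_B$, which store the grouping information, such as annotation type or cluster, of each spot in sections A and B. We then collect the distinct annotations or clusters from $G_A$ and $G_B$ into $G = \{g_1, g_2, \ldots, g_{n_{group}}\}$, where $n_{group}$ is the number of unique annotation types or clusters. As implemented in ST-GEARS, for each annotation type or cluster $g_i$, we calculate the average gene expression across its spots in each section:
$$\bar{A}_{g_i} = \frac{\mathbf{1}_{A,g_i}\, A_{g_i}}{n_{A,g_i}}, \qquad \bar{B}_{g_i} = \frac{\mathbf{1}_{B,g_i}\, B_{g_i}}{n_{B,g_i}}$$
where $A_{g_i}$ stacks the rows $A_{i'}$ with $G^{(A)}_{i'} = g_i$, $B_{g_i}$ stacks the rows $B_{j'}$ with $G^{(B)}_{j'} = g_i$, and $n_{A,g_i}$ and $n_{B,g_i}$ are the numbers of such rows.
Note that $G^{(A)}_{i'}$ and $G^{(B)}_{j'}$ still indicate grouping information $G_A$ and $G_B$; the section codes A and B are moved to a parenthesized superscript for clarity, since the subscript position is taken by the spot indices $i'$ and $j'$. $\mathbf{1}_{A,g_i}$ and $\mathbf{1}_{B,g_i}$ are both row vectors of ones.
With the average gene expression of each annotation type or cluster, in the form of a distribution, we measure its difference across sections by KL divergence. The calculated distance is then mapped by a logistic kernel, to further emphasize differences between relatively consistent annotations or clusters:
$$dis_{g_i} = f_{logistic}\left(KL\left(\bar{A}_{g_i} \,\|\, \bar{B}_{g_i}\right)\right)$$
where $f_{logistic}$ denotes the logistic kernel. Putting the scalar value $dis_{g_i}$ of each annotation or cluster together, we have a vector $dis \in \mathbb{R}^{n_{group}}$. Finally, we transform the distance to similarity and map the similarity result back to each spot according to its group, giving unnormalized weights $w_A \in \mathbb{R}^{n_A}$ and $w_B \in \mathbb{R}^{n_B}$.
We further apply normalization on the result:
$$W_A = \frac{w_A}{\sum_{i'} w^{(A)}_{i'}}, \qquad W_B = \frac{w_B}{\sum_{j'} w^{(B)}_{j'}}$$
$W_A$ and $W_B$ are the constraint values applied in (1). Since the values are computed from a similarity measure based on cell composition information, the weight of FGW OT is automatically redistributed, with higher emphasis on regions that are more consistent across sections and less emphasis on less consistent areas. Enhanced anchor accuracy, and hence registration accuracy, is then achieved.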
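The steps of this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the packaged implementation: the exact logistic kernel and the distance-to-similarity transform are not recoverable from the text, so the sketch assumes a standard logistic function and `similarity = 1 - distance`; groups present in only one section are assumed to receive zero weight; all function names are hypothetical.

```python
import numpy as np

def group_mean_distribution(expr, groups, g, eps=1e-12):
    """Mean expression of the spots labelled g, renormalised to sum to 1."""
    mean = expr[groups == g].mean(axis=0) + eps
    return mean / mean.sum()

def distributive_weights(A, B, G_A, G_B):
    """Per-spot constraint values (W_A, W_B), each summing to 1."""
    similarity = {}
    for g in np.union1d(G_A, G_B):
        if not ((G_A == g).any() and (G_B == g).any()):
            similarity[g] = 0.0  # assumption: unmatched groups get no weight
            continue
        p = group_mean_distribution(A, G_A, g)
        q = group_mean_distribution(B, G_B, g)
        kl = float(np.sum(p * np.log(p / q)))
        distance = 1.0 / (1.0 + np.exp(-kl))  # assumed logistic form, in [0.5, 1)
        similarity[g] = 1.0 - distance        # assumed distance-to-similarity map
    w_A = np.array([similarity[g] for g in G_A])
    w_B = np.array([similarity[g] for g in G_B])
    return w_A / w_A.sum(), w_B / w_B.sum()
```

Spots of a group whose average expression barely changes between the two sections end up with larger entries in `W_A` and `W_B`, so the optimization in (1) places more anchor mass on them.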
Self-adaptive regularization
In the FGW OT formulation, a regularization factor specifies the relative importance of structural similarity compared to expression similarity during optimization. ST-GEARS includes a self-adaptive regularization method that determines the factor value yielding the highest overall anchor accuracy under varying conditions. These conditions include, but are not limited to, section distances, spot sizes, extent of distortion, and data quality such as the level of diffusion.
In practice, our method tries factors on multiple scales: 0.8, 0.4, 0.2, 0.1, 0.05, 0.025, 0.013 and 0.006. The candidate values vary exponentially, so that ST-GEARS can find the optimal factor regardless of scale differences between the expression and structural terms in (1). The accuracy of each set of anchors optimized under each regularization factor is evaluated by measuring the weighted percentage of anchors that join spots with the same annotation type or cluster:
$$accuracy = \sum_{i,j} \pi_{i,j}\,\mathbb{1}\left(G^{(A)}_i = G^{(B)}_j\right)$$
where $\mathbb{1}(\cdot)$ equals 1 when its condition holds and 0 otherwise. Note that $G^{(A)}_i$ and $G^{(B)}_j$ still indicate grouping information $G_A$ and $G_B$, respectively; the section codes A and B are moved to a parenthesized superscript for clarity, since the subscript position is taken by the spot indices $i$ and $j$. The regularization factor value that achieves the highest accuracy is then adopted by our method.
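The selection loop described above might look like the following sketch. The candidate list is taken from the text; `solve_anchors` is a hypothetical callable standing in for an FGW OT solve at a given regularization factor, and the function names are illustrative only.

```python
import numpy as np

CANDIDATE_FACTORS = (0.8, 0.4, 0.2, 0.1, 0.05, 0.025, 0.013, 0.006)

def mapping_accuracy(pi, G_A, G_B):
    """Share of anchor strength joining spots with the same annotation."""
    same_group = np.asarray(G_A)[:, None] == np.asarray(G_B)[None, :]
    return float(pi[same_group].sum() / pi.sum())

def select_regularization(solve_anchors, G_A, G_B, factors=CANDIDATE_FACTORS):
    """Try every candidate factor and keep the one with the best accuracy.

    solve_anchors(alpha) is assumed to return an (n_A x n_B) anchor matrix.
    Returns (best_alpha, best_accuracy, best_pi).
    """
    best = None
    for alpha in factors:
        pi = solve_anchors(alpha)
        accuracy = mapping_accuracy(pi, G_A, G_B)
        if best is None or accuracy > best[1]:
            best = (alpha, accuracy, pi)
    return best
```

Because the score only needs the grouping labels already required by Distributive Constraints, the selection needs no extra input; its cost is one anchor solve per candidate factor.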
Elastic field inference
Finding spots with highest probability
After rigid registration, elastic fields are inferred based on the anchors with the highest probability for each spot or cell. The elastic field to be applied on each section is calculated using its anchors with its closest sections, as well as the spatial coordinates of the sections after rigid registration. Along the sectioning order, each section in the middle has two closest sections, on its anterior and posterior sides respectively. If a section is at the anterior or posterior end, it has only one closest section.
Specifically, for a section in the middle with N spots, we calculate $I_{pre}$ and $I_{next}$, each of length N, which store, for each of its spots, the mapped spot on the anterior and posterior neighbor section. The calculation takes as input the adjacency matrix $\pi_{pre}$, which stores the anchors with the anterior neighbor section output by FGW OT, and $\pi_{next}$, which stores the anchors with the posterior section. For each spot $n$, the neighbor spot with the highest anchor probability is selected:
$$I^{(pre)}_n = \underset{k}{\arg\max}\ \pi^{(pre)}_{n,k}, \qquad I^{(next)}_n = \underset{k}{\arg\max}\ \pi^{(next)}_{n,k}$$
where $\pi^{(pre)}_{n,k}$ and $\pi^{(next)}_{n,k}$ denote the anchor strength between spot $n$ of the current section and spot $k$ of the anterior and posterior neighbor, respectively.
Note that $\pi^{(pre)}$ and $\pi^{(next)}$ still indicate the adjacency matrices $\pi_{pre}$ and $\pi_{next}$; the direction codes pre and next are moved to a parenthesized superscript for clarity, since the subscript position is taken by the spot index $n$.
Notably, not every spot in a selected section has an anchored spot, owing to strategies including Distributive Constraints and anchor filtration; the corresponding elements of $I_{pre}$ and $I_{next}$ are then null. For the section at the posterior end, which has only an anterior neighbor, only $I_{pre}$ is applicable; for the section at the anterior end, only $I_{next}$ is applicable.
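Picking the highest-probability anchor per spot amounts to a row-wise argmax, as in the sketch below. It assumes the anchor matrix is oriented with the current section's spots on the rows (transpose it first if the neighbor is on the rows) and uses -1 as a stand-in for the null value; the function name is hypothetical.

```python
import numpy as np

def strongest_anchor(pi_rows):
    """Index of the strongest anchor for each row, or -1 if the row has none.

    pi_rows: (N x K) anchor strengths between the N spots of the current
    section (rows) and the K spots of one neighbor section (columns).
    """
    idx = pi_rows.argmax(axis=1)
    idx[pi_rows.max(axis=1) <= 0] = -1  # spot without any anchor
    return idx

pi_next = np.array([[0.0, 0.2, 0.1],
                    [0.0, 0.0, 0.0],
                    [0.3, 0.0, 0.0]])
I_next = strongest_anchor(pi_next)
```

In this toy case the second spot has no anchor at all, so it is marked as missing rather than being assigned column 0 by default.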
Elastic field establishment
After specifying the spots with the highest probability, ST-GEARS calculates location displacements between the spots, then establishes elastic fields for each section. An elastic field is a 2D displacement distribution, describing how displacement values are distributed across different locations. It is established so that ST-GEARS can apply denoising to reduce outliers in the elastic operation and improve the consistency of the elastic effect across regions.
For each section located in the middle, 4 elastic fields are generated. Two of them represent the section’s horizontal and vertical displacement distributions relative to the anterior neighbor section, denoted as 2D matrices F(x_pre) and F(y_pre), while the other two represent its horizontal and vertical displacement distributions relative to the posterior neighbor, denoted as F(x_next) and F(y_next). To initialize F(x_pre), F(y_pre), F(x_next) and F(y_next) for the section, the shape of the matrices is first decided. The height, denoted by Height, and the width, denoted by Width, are calculated by gridding the spot locations using a fixed step, and are shared across the 4 matrices.
As input, $X$ denotes the spot locations of the current section after rigid registration, and $X_{pre}$ and $X_{next}$ denote the spot locations of its anterior and posterior neighbor sections after rigid alignment, respectively. psize represents the average distance between the closest spot or cell centers, and is to be input by users. Up to this step, the matrices have no filled values.
To fill in the fields, we first transform the spot locations $X$ into the coordinate system of the field using psize, giving the row and column indices $(r_n, c_n)$ of each spot $n$.
We then calculate location displacements between each spot and its anchored spots with the highest probability on both the anterior and posterior neighbors. With $I_{pre}$ and $I_{next}$ storing the indices of those anchored spots:
$$\Delta^{pre}_n = X^{(pre)}_{I^{(pre)}_n} - X_n, \qquad \Delta^{next}_n = X^{(next)}_{I^{(next)}_n} - X_n$$
where the first and second components of each $\Delta_n$ are the horizontal ($\Delta x_n$) and vertical ($\Delta y_n$) displacements.
With the spot locations in field coordinates and the displacement values above, we fill in the corresponding elements of the elastic fields:
$$F(x\_pre)[r_n, c_n] = \Delta x^{pre}_n,\quad F(y\_pre)[r_n, c_n] = \Delta y^{pre}_n,\quad F(x\_next)[r_n, c_n] = \Delta x^{next}_n,\quad F(y\_next)[r_n, c_n] = \Delta y^{next}_n \tag{2}$$
By the end of Eq. (2), 4 elastic fields for each section in the middle are established. However, some elements in the matrices are still empty, because no spots or cells are located in the corresponding grid cells. To address this problem, the 2D nearest interpolation method44 was adopted, which fills in every empty element with the displacement value of its nearest filled element. The interpolation, denoted $f_{interp\_grid}$, is evaluated over the grid coordinates of the designed field, i.e. all $(r, c)$ with $0 \le r < Height$ and $0 \le c < Width$.
For the section at the posterior end, which has only an anterior neighbor, only F(x_pre) and F(y_pre) are applicable; for the section at the anterior end, only F(x_next) and F(y_next) are applicable.
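A compact sketch of field establishment for one section and one neighbor is given below. It is an illustration under several assumptions that the text does not fix: the grid origin is the minimum coordinate of the current section, grid indices are obtained by flooring, displacements are taken as neighbor minus current, spots without an anchor are marked by -1, and a later spot overwrites an earlier one in the same cell. The nearest interpolation is a brute-force stand-in for a library routine, and all names are hypothetical.

```python
import numpy as np

def build_field(X, X_nb, anchors, psize):
    """Return (F_x, F_y), displacement fields of one section towards one neighbor.

    X: (N x 2) rigidly aligned spot locations (x, y) of the current section.
    X_nb: (K x 2) rigidly aligned spot locations of the neighbor section.
    anchors: length-N indices into X_nb, -1 where a spot has no anchor.
    """
    origin = X.min(axis=0)
    cols, rows = np.floor((X - origin) / psize).astype(int).T
    F = np.full((2, rows.max() + 1, cols.max() + 1), np.nan)

    has = anchors >= 0
    disp = X_nb[anchors[has]] - X[has]  # neighbor minus current
    F[0, rows[has], cols[has]] = disp[:, 0]
    F[1, rows[has], cols[has]] = disp[:, 1]

    # Nearest interpolation: copy the closest filled cell into each empty one.
    filled = np.argwhere(~np.isnan(F[0]))
    for r, c in np.argwhere(np.isnan(F[0])):
        d2 = ((filled - (r, c)) ** 2).sum(axis=1)
        fr, fc = filled[d2.argmin()]
        F[:, r, c] = F[:, fr, fc]
    return F[0], F[1]
```

Calling the function once with the anterior neighbor and once with the posterior neighbor would give the four fields of a middle section; a shared grid shape across the four fields, as described above, would require computing the extent once and passing it in.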
2D Gaussian denoising
Because it is caused by exerted force, the displacement, or elastic field, is expected to have static or smoothly changing values across different locations45–47. ST-GEARS makes use of this property to smooth the field and to reduce errors in the field caused by any upstream process, such as raw data noise and inaccuracy in anchor computation. Gaussian filtering48,49 is adopted to implement the denoising, similarly to image denoising processes50,51. Denoised elastic fields are then generated.
Gaussian filtering calculates a weighted average across the neighboring region of each element to replace its value:
$$F_{denoised} = f_{gaussian\_filter}(F)$$
where $F$ stands for each of F(x_pre), F(y_pre), F(x_next) and F(y_next), and $f_{gaussian\_filter}$ denotes the method of Gaussian filtering.
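To make the weighted-average idea concrete, here is a small NumPy-only Gaussian filter for a 2D field. It is a sketch, not the filter used by ST-GEARS: the kernel width `sigma`, the truncation radius and the edge-replicating border are all assumptions, and in practice a library routine such as `scipy.ndimage.gaussian_filter` would normally be used instead.

```python
import numpy as np

def gaussian_denoise(field, sigma=1.0, truncate=3.0):
    """Smooth a 2D displacement field with a separable Gaussian kernel."""
    radius = int(truncate * sigma + 0.5)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()  # weights sum to 1
    # Replicate border values so the output keeps the input shape.
    padded = np.pad(field, radius, mode="edge")
    # Separable filtering: along rows first, then along columns.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

spike = np.zeros((9, 9))
spike[4, 4] = 1.0  # an isolated outlier displacement
smoothed = gaussian_denoise(spike, sigma=1.0)
```

An isolated outlier is spread over its neighborhood and strongly attenuated, whereas a constant or slowly varying field passes through almost unchanged, which is the behavior the denoising step relies on.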
Bi-sectional fields application
Bi-sectional fields application plan
With elastic fields generated and denoised, ST-GEARS uses the fields as guidance to correct the distortion of each section. Querying the elastic fields with the spatial location of each spot returns the displacement to be applied. For a section in the middle, the elastic fields calculated with both its anterior and posterior neighbor sections are queried, and the guidance provided by both is applied to the rigidly aligned result; this is called ‘Bi-sectional Fields Application’. After the application, the distortion of the section is corrected, and the elastic registration result is generated.
Specifically, the denoised elastic fields are first queried at the field coordinates $(r_n, c_n)$ of each spot $n$, returning the displacement to be applied:
$$u^{pre}_n = \left(F_{denoised}(x\_pre)[r_n, c_n],\ F_{denoised}(y\_pre)[r_n, c_n]\right), \qquad u^{next}_n = \left(F_{denoised}(x\_next)[r_n, c_n],\ F_{denoised}(y\_next)[r_n, c_n]\right)$$
Next, the average of the displacements returned by the anterior and posterior sections is applied to the rigid registration result $X^{rigid}_n$, leading to the final elastic registration result:
$$X^{elastic}_n = X^{rigid}_n + \frac{1}{2}\left(u^{pre}_n + u^{next}_n\right)$$
For the section at the posterior end, which has only an anterior neighbor,
$$X^{elastic}_n = X^{rigid}_n + u^{pre}_n$$
For the section at the anterior end, which has only a posterior neighbor,
$$X^{elastic}_n = X^{rigid}_n + u^{next}_n$$
The validity of this plan is proved in the section Proof of validity of Bi-sectional Fields Application.
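The query-and-average step can be sketched as follows. The function and argument names are hypothetical; the fields are assumed to be already denoised, and the grid indices of each spot are assumed to be precomputed.

```python
import numpy as np

def apply_fields(X_rigid, rows, cols, pre=None, nxt=None):
    """Shift rigidly aligned spots by the mean displacement of the available fields.

    X_rigid: (N x 2) spot locations after rigid registration.
    rows, cols: length-N grid indices of each spot in the field.
    pre, nxt: (F_x, F_y) field pairs for the anterior / posterior neighbor,
              or None when that neighbor does not exist (end sections).
    """
    shifts = []
    for fields in (pre, nxt):
        if fields is not None:
            F_x, F_y = fields
            shifts.append(np.column_stack([F_x[rows, cols], F_y[rows, cols]]))
    if not shifts:
        return X_rigid.copy()
    return X_rigid + np.mean(shifts, axis=0)

X_rigid = np.array([[0.0, 0.0], [1.0, 1.0]])
rows, cols = np.array([0, 1]), np.array([0, 1])
pre = (np.full((2, 2), 2.0), np.zeros((2, 2)))   # pulls 2 units along x
nxt = (np.zeros((2, 2)), np.full((2, 2), 4.0))   # pulls 4 units along y
X_middle = apply_fields(X_rigid, rows, cols, pre=pre, nxt=nxt)
X_end = apply_fields(X_rigid, rows, cols, pre=pre)
```

For the middle-section call both pulls are halved, while the end-section call applies its single available field in full.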
Proof of validity of Bi-sectional fields application
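Bi-sectional Fields Application accurately recovers the spatial profile before distortion by averaging and applying the displacements guided by both the anterior and posterior neighbor sections. This effect is proved mathematically as follows.
Take sections A, B and C as an example of a sequence of sections, with $X_A$, $X_B$ and $X_C$ denoting their spots’ spatial information after rigid alignment, and $X_{A\_insitu}$, $X_{B\_insitu}$ and $X_{C\_insitu}$ denoting their in vivo spatial information. The distortions that occurred to the slices during experiments are denoted as $X_{A\_dis}$, $X_{B\_dis}$ and $X_{C\_dis}$.
According to Bi-sectional Fields Application, the corrected spatial information of section B is:
$$X_{B\_corrected} = X_B + \frac{1}{2}\left[(X_A - X_B) + (X_C - X_B)\right] = \frac{1}{2}\left(X_A + X_C\right)$$
Thereinto,
$$X_A = X_{A\_insitu} + X_{A\_dis}, \qquad X_C = X_{C\_insitu} + X_{C\_dis}$$
Hence,
$$X_{B\_corrected} = \frac{1}{2}\left(X_{A\_insitu} + X_{C\_insitu}\right) + \frac{1}{2}\left(X_{A\_dis} + X_{C\_dis}\right) \tag{3}$$
Based on the in vivo morphological consistency across sections, the spatial information of section B can be approximated by an average of the information of A and C, written as
$$X_{B\_insitu} \approx \frac{1}{2}\left(X_{A\_insitu} + X_{C\_insitu}\right) \tag{4}$$
Given that $X_{A\_dis}$ and $X_{C\_dis}$ can be seen as independent and identically distributed sets of variables,
$$E\left[\frac{1}{2}\left(X_{A\_dis} + X_{C\_dis}\right)\right] = \mu_{ABC}, \qquad Var\left[\frac{1}{2}\left(X_{A\_dis} + X_{C\_dis}\right)\right] = \frac{1}{2}\Sigma_{ABC} \tag{5}$$
where $\mu_{ABC}$ is the universal mean, and $\Sigma_{ABC}$ is the variance of the 2D displacement information.
Inserting the terms (4) and (5) back into Eq. (3) gives
$$E\left[X_{B\_corrected}\right] \approx X_{B\_insitu} + \mu_{ABC}, \qquad Var\left[X_{B\_corrected}\right] \approx \frac{1}{2}\Sigma_{ABC}$$
indicating the proximity of corrected spatial information to in vivo spatial information.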
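The argument can be checked numerically with a small simulation. The setup below is entirely illustrative and rests on assumptions beyond the proof: anchors are taken to be perfect, the in vivo position of B is constructed as the exact average of A and C, and distortions are drawn as zero-mean Gaussian noise (so the universal mean is zero). Under these assumptions the error after correction should be roughly 1/√2 of the error before it.

```python
import numpy as np

def simulate(n=20000, sigma=1.0, seed=0):
    """Return (RMS error before correction, RMS error after correction) for section B."""
    rng = np.random.default_rng(seed)
    insitu_A = rng.uniform(0, 100, size=(n, 2))
    insitu_C = insitu_A + rng.normal(0, 0.5, size=(n, 2))  # slowly varying morphology
    insitu_B = 0.5 * (insitu_A + insitu_C)                 # consistency holds exactly here
    # Independent, identically distributed distortions on each section.
    X_A, X_B, X_C = (x + rng.normal(0, sigma, size=(n, 2))
                     for x in (insitu_A, insitu_B, insitu_C))
    X_B_corrected = X_B + 0.5 * ((X_A - X_B) + (X_C - X_B))

    def rms(err):
        return float(np.sqrt((err ** 2).mean()))

    return rms(X_B - insitu_B), rms(X_B_corrected - insitu_B)

err_before, err_after = simulate()
```

The reduction comes purely from averaging two independent distortions; it does not remove a shared systematic shift, which corresponds to a non-zero universal mean in the derivation.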
Evaluation metrics
We evaluated the accuracy of anchors by index of Mapping Accuracy, and measured the reconstruction effect by MSSIM and SI-STD-DI, in both elastic effect study and overall methodology comparison.
Mapping accuracy
Designed and adopted by PASTE27, Mapping Accuracy calculates the weighted percentage of anchors joining spots with same annotation.
MSSIM index
MSSIM measures the accuracy of registration, based on the assumption that at some sectioning positions, tissue morphology remains almost consistent across slices. The method quantifies the accuracy by measuring the similarity of the annotation type distributions of such section pairs.
To implement the quantification, structurally consistent section pairs are first selected among all sections arranged in sequence.
Next, each section of the pair is transformed from individual spots into a complete image, by gridding the rectangular area that surrounds the tissue and assigning each grid cell a value that represents the annotation type occurring most frequently in that cell. The resulting image describes the annotation type distribution of the section.
Finally, the similarity between each pair of images is measured by the MSSIM index52. The method generates a window of fixed size, slides the window simultaneously over both images, and compares the two framed parts in terms of intensity, contrast, and structure. Among those, the intensity difference is measured by the difference of average pixel values, the contrast difference is measured by comparing the variances of the two sets of framed pixel values, and the structure difference is measured by comparing their covariance. A Structural Similarity of Images (SSIM) index is calculated for each position of the window using $SSIM(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$, where $\mu_x$ and $\mu_y$ denote the average pixel values of the frames, $\sigma_x^2$ and $\sigma_y^2$ denote the variances of the frames, and $\sigma_{xy}$ denotes the covariance of the two frames. $c_1$ and $c_2$ are constants that avoid a zero divisor. Averaging the SSIM value across all windows gives the final MSSIM result of the two sections.
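The windowed computation can be sketched with plain NumPy as below. This is a simplified illustration rather than the evaluation code of the study: it uses a uniform (unweighted) square window, a window size of 7, and constants `c1 = 1e-4` and `c2 = 9e-4`, all of which are assumptions; annotation types are assumed to be encoded as numeric pixel values.

```python
import numpy as np

def ssim_window(x, y, c1=1e-4, c2=9e-4):
    """SSIM of two equally sized image patches."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def mssim(img_a, img_b, win=7):
    """Mean SSIM over all positions of a sliding win x win window."""
    height, width = img_a.shape
    scores = [ssim_window(img_a[r:r + win, c:c + win], img_b[r:r + win, c:c + win])
              for r in range(height - win + 1)
              for c in range(width - win + 1)]
    return float(np.mean(scores))

rng = np.random.default_rng(0)
image = rng.integers(0, 4, size=(12, 12)).astype(float)      # 4 annotation codes
unrelated = rng.integers(0, 4, size=(12, 12)).astype(float)
score_same = mssim(image, image)
score_unrelated = mssim(image, unrelated)
```

Identical annotation images score 1, and structurally unrelated ones score much lower, so a higher MSSIM between sections expected to be consistent indicates better registration.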
SI-STD-DI
SI-STD-DI measures the smoothness of area changes across sections along a fixed axis, by calculating the standard deviation of the area changes between each pair of adjacent sections and scaling the result by dividing it by the average area.
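Read literally, the definition above reduces to a one-line computation. The sketch below assumes signed differences between consecutive section areas and a population standard deviation; neither choice is specified in the text, and the function name is hypothetical.

```python
import numpy as np

def si_std_di(areas):
    """Standard deviation of adjacent-section area changes, scaled by the mean area."""
    areas = np.asarray(areas, dtype=float)
    return float(np.std(np.diff(areas)) / areas.mean())

smooth_score = si_std_di([10, 12, 14, 16])   # area grows steadily
jagged_score = si_std_di([10, 14, 10, 14])   # area oscillates between sections
```

A steadily growing or shrinking tissue area yields a score of zero, while oscillating areas, which are typical of poorly corrected distortion, yield a larger score; lower values therefore indicate smoother reconstructions.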
Software and code
Data analysis
All software used to analyze data in this study are open-sourced Python packages, including anndata = 0.9.2, numpy = 1.22.4, pandas = 1.4.3, scipy = 1.10.1, matplotlib = 3.5.2, k3d = 2.15.3.
Statistics and reproducibility
No statistical method was used to predetermine sample size. No data were excluded from the analyses. The experiments were not randomized. The Investigators were not blinded to allocation during experiments and outcome assessment.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
This work is part of the “SpatioTemporal Omics Consortium” (STOC) paper package. A list of STOC members is available at: http://sto-consortium.org. We acknowledge the Stomics Cloud platform (https://cloud.stomics.tech/) for providing convenient ways for analyzing spatial omics datasets. We acknowledge the CNGB Nucleotide Sequence Archive (CNSA) of China National GeneBank DataBase (CNGBdb) for maintaining the Drosophila database. This work is supported by National Natural Science Foundation of China (32300526 to S. F., 32100514 to M. X.). We thank Weizhen Xue for the inspirational discussion towards design of Distributive Constraints. We thank Yating Ren for her advice towards a more efficient code implementation. We thank Dr. Xiaojie Qiu and Dr. Yinqi Bai for the discussion on the registration topic and their advice on our work.
Author contributions
Tianyi Xia was responsible for method design, analysis design and implementation, as well as drafting of this manuscript. Dr. Luni Hu participated in the structure design of the applications. Lulu Zuo took part in the design of the 3D visualizations, and she helps maintain our online repository. Tianyi Xia, Lei Cao, Lulu Zuo and Dr. Luni Hu conducted experiments and analysis for the reply to peer review. Dr. Yunjia Zhang provided insights into the interpretation of anchor results on the DLPFC dataset, and into the accuracy analysis of the mouse brain dataset. Dr. Mengyang Xu revised this article. Lei Zhang and Bowen Ma offered numerous suggestions to enhance computational efficiency, in both memory and time. Taotao Pan and Chuan Chen provided suggestions on data preprocessing. Qin Lu, Bohan Zhang, Junfu Guo, Chang Shi and Mei Li provided suggestions for this study. Dr. Shuangsang Fang supervised this study in structure and analysis design, and she revised this article. Chao Liu, Yuxiang Li and Yong Zhang supervised this study.
Peer review
Peer review information
Nature Communications thanks Jun Ding, Xiangyu Luo and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Data availability
All data used in this research were collected from published sources. DLPFC data were obtained from the study ‘Transcriptome-scale Spatial Gene Expression in the Human Dorsolateral Prefrontal Cortex’, available at http://research.libd.org/spatialLIBD/index.html. Drosophila embryo and Drosophila larva data were collected from ‘High-resolution 3D Spatiotemporal Transcriptomic Maps of Developing Drosophila Embryos and Larvae’, available at https://db.cngb.org/stomics/datasets/STDS0000060. Mouse brain data were collected from the study ‘Modular cell type organization of cortical areas revealed by in situ sequencing’, available at https://data.mendeley.com/datasets/8bhhk7c5n9/1. All datasets were generated on Spatial Transcriptomics platforms: DLPFC data by the Visium technology of 10x Genomics, Mouse brain data by BARseq of Cold Spring Harbor Laboratory, and Drosophila embryo and larva data by the Stereo-seq technology of BGI. Source data are provided with this paper.
Code availability
ST-GEARS is packaged and distributed as an open-source, publicly available repository at https://github.com/STOmics/ST-GEARS53.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Chao Liu, Email: liuchao3@genomics.cn.
Yuxiang Li, Email: liyuxiang@genomics.cn.
Yong Zhang, Email: zhangyong2@genomics.cn.
Shuangsang Fang, Email: fangshuangsang@genomics.cn.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-024-51935-0.
References
- 1.Marx, V. Method of the year: spatially resolved transcriptomics. Nat. Methods18, 9–14 (2021). 10.1038/s41592-020-01033-y [DOI] [PubMed] [Google Scholar]
- 2.Yue, L. et al. A guidebook of spatial transcriptomic technologies, data resources and analysis approaches. Comput. Struct. Biotechnol. J. 21, 940–955 (2023) [DOI] [PMC free article] [PubMed]
- 3.Park, H.-E. et al. Spatial transcriptomics: technical aspects of recent developments and their applications in neuroscience and cancer research. Adv. Sci.10, 2206939 (2023). 10.1002/advs.202206939 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Gyllborg, D. et al. Hybridization-based in vivo sequencing (hybiss) for spatially resolved transcriptomics in human and mouse brain tissue. Nucleic acids Res.48, 112–112 (2020). 10.1093/nar/gkaa792 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Chen, X. et al. High-throughput mapping of long-range neuronal projection using in vivo sequencing. Cell179, 772–786 (2019). 10.1016/j.cell.2019.09.023 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6. Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361, 5691 (2018). 10.1126/science.aat5691
- 7. Qin, D. Next-generation sequencing and its clinical application. Cancer Biol. Med. 16, 4 (2019). 10.20892/j.issn.2095-3941.2018.0055
- 8. Chen, A. et al. Large field of view-spatially resolved transcriptomics at nanoscale resolution. bioRxiv 10.1101/2021.01.17.427004 (2021).
- 9. Stickels, R. R. et al. Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nat. Biotechnol. 39, 313–319 (2021). 10.1038/s41587-020-0739-1
- 10. Moses, L. & Pachter, L. Museum of spatial transcriptomics. Nat. Methods 19, 534–546 (2022). 10.1038/s41592-022-01409-2
- 11. Moor, A. E. & Itzkovitz, S. Spatial transcriptomics: paving the way for tissue-level systems biology. Curr. Opin. Biotechnol. 46, 126–133 (2017). 10.1016/j.copbio.2017.02.004
- 12. Zhou, R., Yang, G., Zhang, Y. & Wang, Y. Spatial transcriptomics in development and disease. Mol. Biomed. 4, 32 (2023). 10.1186/s43556-023-00144-0
- 13. Li, Z. & Peng, G. Spatial transcriptomics: new dimension of understanding biological complexity. Biophys. Rep. 8, 119 (2022). 10.52601/bpr.2021.210037
- 14. Williams, C. G., Lee, H. J., Asatsuma, T., Vento-Tormo, R. & Haque, A. An introduction to spatial transcriptomics for biomedical research. Genome Med. 14, 1–18 (2022). 10.1186/s13073-022-01075-1
- 15. Walker, B. L., Cang, Z., Ren, H., Bourgain-Chang, E. & Nie, Q. Deciphering tissue structure and function using spatial transcriptomics. Commun. Biol. 5, 220 (2022). 10.1038/s42003-022-03175-5
- 16. Atta, L. & Fan, J. Computational challenges and opportunities in spatially resolved transcriptomic data analysis. Nat. Commun. 12, 5283 (2021). 10.1038/s41467-021-25557-9
- 17. Velten, B. et al. Identifying temporal and spatial patterns of variation from multimodal data using MEFISTO. Nat. Methods 19, 179–186 (2022). 10.1038/s41592-021-01343-9
- 18. Townes, F. W. & Engelhardt, B. E. Nonnegative spatial factorization applied to spatial genomics. Nat. Methods 20, 229–238 (2023). 10.1038/s41592-022-01687-w
- 19. Verma, A. & Engelhardt, B. A Bayesian nonparametric semi-supervised model for integration of multiple single-cell experiments. bioRxiv 10.1101/2020.01.14.906313 (2020).
- 20. Svensson, V., Teichmann, S. A. & Stegle, O. SpatialDE: identification of spatially variable genes. Nat. Methods 15, 343–346 (2018). 10.1038/nmeth.4636
- 21. Dries, R. et al. Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome Biol. 22, 1–31 (2021). 10.1186/s13059-021-02286-2
- 22. Wang, M. et al. High-resolution 3D spatiotemporal transcriptomic maps of developing Drosophila embryos and larvae. Dev. Cell 57, 1271–1283 (2022). 10.1016/j.devcel.2022.04.006
- 23. Mohenska, M. et al. 3D-cardiomics: a spatial transcriptional atlas of the mammalian heart. J. Mol. Cell. Cardiol. 163, 20–32 (2022). 10.1016/j.yjmcc.2021.09.011
- 24. Vickovic, S. et al. Three-dimensional spatial transcriptomics uncovers cell type localizations in the human rheumatoid arthritis synovium. Commun. Biol. 5, 129 (2022). 10.1038/s42003-022-03050-3
- 25. Rao, A., Barkley, D., França, G. S. & Yanai, I. Exploring tissue architecture using spatial transcriptomics. Nature 596, 211–220 (2021). 10.1038/s41586-021-03634-9
- 26. Bergenstråhle, J., Larsson, L. & Lundeberg, J. Seamless integration of image and molecular analysis for spatial transcriptomics workflows. BMC Genom. 21, 1–7 (2020). 10.1186/s12864-020-06832-3
- 27. Zeira, R., Land, M., Strzalkowski, A. & Raphael, B. J. Alignment and integration of spatial transcriptomics data. Nat. Methods 19, 567–575 (2022). 10.1038/s41592-022-01459-6
- 28. Liu, X., Zeira, R. & Raphael, B. PASTE2: partial alignment of multi-slice spatially resolved transcriptomics data. In Research in Computational Molecular Biology: 27th Annual International Conference, 210 (Springer Nature, 2023).
- 29. Jones, A., Townes, F. W., Li, D. & Engelhardt, B. E. Alignment of spatial genomics data using deep Gaussian processes. Nat. Methods 20, 1379–1387 (2023). 10.1038/s41592-023-01972-2
- 30. Xia, C.-R., Cao, Z.-J., Tu, X.-M. & Gao, G. Spatial-linked alignment tool (SLAT) for aligning heterogenous slices properly. bioRxiv 10.1101/2023.04.07.535976 (2023).
- 31. Qiu, X. et al. Spateo: multidimensional spatiotemporal modeling of single-cell spatial transcriptomics. bioRxiv 10.1101/2022.12.07.519417 (2022).
- 32. Guo, L. et al. VT3D: a visualization toolbox for 3D transcriptomic data. J. Genetics Genom. 50, 713–719 (2023).
- 33. Fang, S. et al. Stereopy: modeling comparative and spatiotemporal cellular heterogeneity via multi-sample spatial transcriptomics. bioRxiv 10.1101/2023.12.04.569485 (2023).
- 34. Titouan, V., Courty, N., Tavenard, R. & Flamary, R. Optimal transport for structured data with application on graphs. Int. Conf. Mach. Learn. 91, 6275–6284 (2019).
- 35. Maynard, K. R. et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat. Neurosci. 24, 425–436 (2021). 10.1038/s41593-020-00787-0
- 36. Rodriques, S. G. et al. Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution. Science 363, 1463–1467 (2019). 10.1126/science.aaw1219
- 37. Chen, X., Fischer, S., Zhang, A., Gillis, J. & Zador, A. Modular cell type organization of cortical areas revealed by in situ sequencing. bioRxiv 10.1101/2022.11.06.515380 (2022).
- 38. Abdolhosseini, F. et al. Cell identity codes: understanding cell identity from gene expression profiles using deep neural networks. Sci. Rep. 9, 2342 (2019). 10.1038/s41598-019-38798-y
- 39. Efroni, I., Ip, P.-L., Nawy, T., Mello, A. & Birnbaum, K. D. Quantification of cell identity from single-cell gene expression profiles. Genome Biol. 16, 1–12 (2015). 10.1186/s13059-015-0580-x
- 40. Lacoste-Julien, S. Convergence rate of Frank-Wolfe for non-convex objectives. arXiv 10.48550/arXiv.1607.00345 (2016).
- 41. Wahba, G. A least squares estimate of satellite attitude. SIAM Rev. 7, 409–409 (1965). 10.1137/1007077
- 42. Lanjakornsiripan, D. et al. Layer-specific morphological and molecular differences in neocortical astrocytes and their dependence on neuronal layers. Nat. Commun. 9, 1623 (2018). 10.1038/s41467-018-03940-3
- 43. Csiszár, I. I-divergence geometry of probability distributions and minimization problems. Ann. Probab. 3, 146–158 (1975).
- 44. Schoenberg, I. J. Contributions to the problem of approximation of equidistant data by analytic functions. In I. J. Schoenberg Selected Papers. Contemporary Mathematicians (ed. de Boor, C.) 3–57 (Birkhäuser, Boston, 1988).
- 45. Zhou, H. & Jayender, J. Smooth deformation field-based mismatch removal in real-time. arXiv 10.1101/7.08553 (2020).
- 46. Li, X. & Hu, Z. Rejecting mismatches by correspondence function. Int. J. Comput. Vis. 89, 1–17 (2010). 10.1007/s11263-010-0318-x
- 47. Li, X., Larson, M. & Hanjalic, A. Pairwise geometric matching for large-scale object retrieval. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 5153–5161 (IEEE, 2015).
- 48. Bergholm, F. Edge focusing. In IEEE Transactions on Pattern Analysis and Machine Intelligence, 726–741 (IEEE, 1987).
- 49. Marr, D. & Hildreth, E. Theory of edge detection. Proc. R. Soc. Lond. Ser. B. Biol. Sci. 207, 187–217 (1980).
- 50. Mafi, M. et al. A comprehensive survey on impulse and Gaussian denoising filters for digital images. Signal Process. 157, 236–260 (2019). 10.1016/j.sigpro.2018.12.006
- 51. Saxena, C. & Kourav, D. Noises and image denoising techniques: a brief survey. Int. J. Emerg. Technol. Adv. Eng. 4, 14878–14885 (2014).
- 52. Wang, Z., Bovik, A. C., Sheikh, H. R. & Simoncelli, E. P. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13, 600–612 (2004). 10.1109/TIP.2003.819861
- 53. Xia, T. et al. ST-GEARS: advancing 3D downstream research through accurate spatial information recovery. GitHub. 10.5281/zenodo.13131713 (2024).
Data Availability Statement
All data used in this research were collected from published sources. DLPFC data were obtained from the study "Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex" and can be downloaded from http://research.libd.org/spatialLIBD/index.html. Drosophila embryo and larva data were collected from "High-resolution 3D spatiotemporal transcriptomic maps of developing Drosophila embryos and larvae" and are available at https://db.cngb.org/stomics/datasets/STDS0000060. Mouse brain data were collected from "Modular cell type organization of cortical areas revealed by in situ sequencing" and can be downloaded from https://data.mendeley.com/datasets/8bhhk7c5n9/1. All datasets were generated on spatial transcriptomics platforms: DLPFC data by the Visium technology of 10x Genomics, mouse brain data by BARseq of Cold Spring Harbor Laboratory, and Drosophila embryo and larva data by the Stereo-seq technology of BGI. Source data are provided with this paper.
ST-GEARS is packaged and distributed as an open-source, publicly available repository at https://github.com/STOmics/ST-GEARS (ref. 53).