Abstract
Background
Elucidating the spatiotemporal dynamics of gene expression is essential for understanding complex physiological and pathological processes. Current spatial transcriptomics techniques are hindered by low read depths and limited gene detection.
Results
Here, we introduce Palette, a pipeline that infers detailed spatial gene expression patterns from bulk RNA-seq data, utilizing existing spatial transcriptomics data as the sole reference. This method identifies more precise expression patterns by smoothing, imputing and adjusting gene expression. We apply Palette to reconstruct the zebrafish SpatioTemporal Expression Profiles (zSTEP) by integrating 53-slice serial bulk RNA-seq data from three developmental stages with existing spatial transcriptomics and image references. zSTEP provides a comprehensive cartographic resource for examining gene expression and investigating developmental events within zebrafish embryos. Utilizing machine learning-based screening, we identify key morphogens and transcription factors essential for anteroposterior axis development and characterize their dynamic distribution throughout embryogenesis. In addition, among these transcription factors, Hox family genes are found to be pivotal in anteroposterior axis refinement. Their expression is closely correlated with cellular anteroposterior identities, and hoxb genes may act as central regulators in this process.
Conclusions
This study presents Palette, a pipeline for integrating bulk RNA-seq data and spatial transcriptomics data, and zSTEP, a comprehensive cartographic resource for investigating zebrafish early embryonic development. In addition, key morphogens and transcription factors essential for anteroposterior axis establishment and refinement are identified.
Supplementary Information
The online version contains supplementary material available at 10.1186/s13059-025-03917-8.
Keywords: Spatial transcriptomics, Spatiotemporal gene expression, Spatial deconvolution, Zebrafish, Primary body axis
Background
Model organisms such as zebrafish have long been valuable tools for studying developmental biology and human diseases. Understanding the spatiotemporal patterns of gene expression in these models is crucial for gaining insights into the physiological and pathological mechanisms in normal development and related diseases. Thus, great efforts are ongoing to construct gene expression maps of these models with higher resolution, depth, and comprehensiveness [1–6].
Traditional technologies, such as in situ hybridization (ISH) and immunostaining, have been widely used for investigating the spatiotemporal expression patterns of specific genes. However, these approaches are limited in their ability to simultaneously detect the expression of a large number of genes. In recent years, significant progress has been made in developing technologies for obtaining transcriptomic data with spatial information. Techniques such as laser capture microdissection/microscopy (LCM) combined with bulk RNA-seq [7, 8], Tomo-seq [1], and Geographical positional sequencing (Geo-seq) [9] have allowed the generation of spatially resolved transcriptomic data [10]. Additionally, methods like seqFISH [11], MERFISH [12, 13], Slide-seq [14], 10x Visium [15, 16], and Stereo-seq [2–4] have further improved the spatial resolution.
While these spatial transcriptomics (ST) techniques have advanced the spatial resolution of transcriptomic data, bulk RNA-seq remains the preferred choice for most studies due to limitations associated with ST techniques such as low read depth, suboptimal gene detection capability, and high cost [10, 17]. Consequently, tools have been developed to infer cell type proportions from bulk RNA-seq data, including TIMER [18], MuSiC [19], and DWLS [20]. More recently, methods such as Bulk2Space [21], SPACEL [22] and STGAT [23] have been designed to integrate bulk and scRNA-seq data with ST data to infer spatial gene expression and enhance spatial resolution, focusing on expression at the spot level. However, obtaining paired bulk, ST and scRNA-seq data is often challenging. Therefore, a tool that can directly integrate bulk and ST data to infer spatial gene expression patterns closely associated with biological functions would be highly desirable.
In this study, we introduce Palette, a pipeline designed to allocate gene expression from bulk RNA-seq data to spatial spots using ST data as the only reference. Palette has demonstrated its effectiveness in inferring spatial expression patterns in both Drosophila and zebrafish sections. We performed bulk RNA-seq on serial cryosections of zebrafish embryos along the left–right axis at three developmental stages. By applying Palette to the obtained data with the Stereo-seq data [2] as references, we inferred the spatial gene expression patterns. We then projected the constructed 3D ST maps onto the zebrafish embryo images with 3D coordinates [24] to correct the deformation during cryosectioning and construct a 3D spatial gene expression cartograph that more accurately reflects embryonic morphology. We named this cartograph the zebrafish SpatioTemporal Expression Profiles (zSTEP), which enables the visualization of gene expression patterns in the context of the 3D morphology of the zebrafish embryos. Finally, leveraging the capabilities of zSTEP, we characterized potential roles of morphogens and transcription factors (TFs) in anteroposterior (AP) refinement during the progression of the primary body axis.
Results
Design concept of Palette
The overall working pipeline of Palette is depicted in Fig. 1, illustrating the key steps involved in our approach. The pipeline firstly incorporates spatial clustering and deconvolution processes to account for differences in cluster abundances between bulk RNA-seq and ST data. Then, a variable factor is introduced to adjust expression differences between the two types of data. Subsequently, the pipeline estimates gene expression in each spot using a loop algorithm that takes into account regional gene expression, spot characteristics, and spot-spot distances. This iterative process allows for the inference of spatially resolved gene expression from bulk transcriptome data with relatively stable gene expression. The pipeline outlined in Fig. 1 represents the sequential steps employed in Palette to accurately allocate gene expression to spatial spots using the information provided by the bulk RNA-seq data.
Fig. 1.
Working pipeline of Palette. a Defining spot clusters in ST reference and estimating cluster abundances in bulk transcriptome data. Bulk transcriptome data and ST reference are taken as input. Highly expressed genes in both datasets are used for spatial clustering through BayesSpace [25]. The cluster expression matrix obtained from spatial clustering is then used as the reference for performing deconvolution on bulk transcriptome data, resulting in the estimated cluster abundances of bulk transcriptome data. b Adjusting cluster expression matrix by employing the variable factor. The variable factor represents the expression differences between bulk RNA-seq data and ST reference. c Estimating the expression in each spot through a loop algorithm. The expression of LST is adjusted and then evenly allocated to each spot of this cluster. After the looping steps, the average expression of each spot is taken as the estimated expression
The detailed procedures of Palette can be divided into the following three steps. First, spot clusters are defined in the ST data and the proportion of each defined cluster is inferred in bulk RNA-seq data (Fig. 1a). Specifically, highly expressed genes in both ST and bulk RNA-seq data are used for spatial clustering of the ST data through BayesSpace [25], and then MuSiC [19] is employed for deconvolution to estimate cluster abundances in bulk data. This step can effectively eliminate the batch effect caused by technical differences in sampling, mRNA capture, platform, etc. between the two experiments. Second, a variable factor is introduced to adjust the cluster expression matrix (Fig. 1b). To obtain the variation of each gene between ST and bulk data, a pseudo bulk vector is computed as the matrix product of the cluster expression matrix of the ST data and the cluster proportions of the bulk data, and the variable factor vector is then calculated as the ratio of the input bulk vector to the pseudo bulk vector. Consequently, stable genes and variable genes can be distinguished by the distribution of the variable factor vector. The adjusted matrix is obtained by multiplying the cluster expression matrix of the ST slice by the variable factor. This step can effectively overcome the common sparsity problem in ST technologies, and the adjusted matrix not only contains the cluster composition information but also fully retains the accuracy of the bulk transcriptome in the detection of lowly expressed genes. Third, the expression of each spot is estimated through an iterative algorithm (Fig. 1c). In each iteration, the procedure begins by selecting one random spot (i) and its nearest neighbouring spots (Local i). The expression of spots belonging to the same cluster is aggregated to form pseudo-cluster expression data called local ST (LST). Taking the local expression into account, a regional factor is defined as the ratio of the LST to the reference ST data. The regional factor is subsequently applied to the adjusted matrix to compute the adjusted LST expression, which is evenly allocated to the selected spots of this cluster. The loop then proceeds to the next iteration (iteration j), and after multiple iterations, typically thousands, the average expression of each spot becomes nearly stable and is taken as the estimated expression output.
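The variable-factor step can be illustrated with a small numerical sketch. It is not Palette's implementation: the names `cluster_expr`, `props`, `bulk` and the `eps` guard against division by zero are assumptions made for illustration, and the adjustment is interpreted here as scaling each gene's row of the cluster expression matrix by that gene's variable factor.

```python
import numpy as np

def variable_factor(cluster_expr, props, bulk, eps=1e-9):
    """Sketch of the variable-factor adjustment (illustrative names only).

    cluster_expr: genes x clusters expression matrix from the ST reference
    props:        cluster proportions estimated in the bulk sample
    bulk:         observed bulk expression, one value per gene
    """
    cluster_expr = np.asarray(cluster_expr, dtype=float)
    props = np.asarray(props, dtype=float)
    bulk = np.asarray(bulk, dtype=float)
    # Pseudo bulk: expression expected from the ST reference at the bulk proportions
    pseudo_bulk = cluster_expr @ props
    # Variable factor: ratio of observed bulk to pseudo bulk, per gene
    factor = bulk / (pseudo_bulk + eps)
    # Adjusted matrix: each gene's row is scaled by its variable factor (assumption)
    adjusted = cluster_expr * factor[:, None]
    return pseudo_bulk, factor, adjusted

cluster_expr = np.array([[4.0, 0.0],
                         [1.0, 3.0]])
props = np.array([0.5, 0.5])
bulk = np.array([4.0, 2.0])
pseudo_bulk, factor, adjusted = variable_factor(cluster_expr, props, bulk)
```

In this toy case the first gene is twice as abundant in bulk as the ST reference predicts, so its row is doubled, while the second gene is left unchanged. Under these assumptions, the adjusted matrix reproduces the bulk vector when multiplied by the cluster proportions.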
The expression patterns on Palette reconstructed ST show enhanced spatial specificity and continuity (Fig. 1c). Our algorithm incorporates spot characteristics and spot-spot distances, emphasizing cluster-specific expression, while leveraging expression from bulk data to adjust gene expression in the ST spots. Additionally, the assumption that the neighbouring spots of the same cluster share similar gene expression enables imputation based on gene expression in neighbouring spots. This strategy partially mitigates the limitation of low detected gene numbers in each spot. Overall, the Palette pipeline serves as a valuable tool for inferring spatial gene expression patterns from bulk RNA-seq data, striving to generate accurate predictions of spatial gene expression that closely resemble the expression patterns in bona fide tissues.
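The iterative allocation can be sketched as follows. This is a simplified illustration under stated assumptions, not the published algorithm: the function name `palette_like_loop`, the neighbourhood size `k`, the use of Euclidean distance, and the definition of the regional factor as the local cluster sum divided by the cluster total in the reference ST are all choices made here for clarity.

```python
import numpy as np

def palette_like_loop(coords, st_expr, clusters, adjusted, k=2, n_iter=500, seed=0):
    """Average, over random local neighbourhoods, the expression allocated to each spot.

    coords:   spots x 2 coordinates
    st_expr:  spots x genes reference ST expression
    clusters: cluster label (0..C-1) of each spot
    adjusted: clusters x genes adjusted cluster expression (assumed to be cluster totals)
    """
    rng = np.random.default_rng(seed)
    coords = np.asarray(coords, dtype=float)
    st_expr = np.asarray(st_expr, dtype=float)
    clusters = np.asarray(clusters)
    adjusted = np.asarray(adjusted, dtype=float)
    n_spots, n_genes = st_expr.shape
    # Reference ST totals per cluster, used as the denominator of the regional factor
    cluster_total = np.array([st_expr[clusters == c].sum(axis=0)
                              for c in range(adjusted.shape[0])])
    total = np.zeros((n_spots, n_genes))
    visits = np.zeros(n_spots)
    for _ in range(n_iter):
        i = rng.integers(n_spots)                      # random spot i
        dist = np.linalg.norm(coords - coords[i], axis=1)
        local = np.argsort(dist)[:k + 1]               # spot i plus its k nearest neighbours
        for c in np.unique(clusters[local]):
            members = local[clusters[local] == c]
            lst = st_expr[members].sum(axis=0)         # local ST (LST) for cluster c
            regional = np.divide(lst, cluster_total[c],
                                 out=np.zeros(n_genes),
                                 where=cluster_total[c] > 0)
            # Adjusted LST expression, split evenly among the local spots of cluster c
            total[members] += adjusted[c] * regional / len(members)
        visits[local] += 1
    return total / np.maximum(visits, 1)[:, None]

coords = [[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [5, 0]]
clusters = [0, 0, 0, 1, 1, 1]
st_expr = [[2, 0], [0, 0], [2, 0], [0, 1], [0, 1], [0, 1]]   # spot 1 has a dropout
adjusted = [[8, 0], [0, 6]]
est = palette_like_loop(coords, st_expr, clusters, adjusted)
```

In this toy example spot 1 has no detected expression in the reference, yet it receives expression of the first gene because its neighbours in the same cluster express it. Spots in the other cluster receive none of that gene.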
Palette enables the prediction of gene expression patterns with higher spatial specificity and accuracy
To assess the performance of Palette, we first utilized two consecutive slices (referred to as slices S04 and S05) from the Stereo-seq data [4] of Drosophila E14-16 (14–16 h post egg laying) serial sections (Fig. 2a). We converted slice S05 into a pseudo bulk and used it as Palette's input, with slice S04 serving as the ST reference (Fig. 2a). The resulting Palette-inferred spatial gene expression of the S05 pseudo-bulk data was then designated as Palette S04. We observed that Palette S04 did not result in considerable changes in the molecule numbers of each spot, as the gene expression levels of slice S04 and slice S05 were similar. However, there was a significant increase in the feature numbers (gene numbers) of each spot (Fig. 2b). This increase was attributed to the supplementation by Palette, which leveraged the gene expression of neighbouring spots belonging to the same cluster.
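As a minimal sketch of this benchmark setup, a slice can be collapsed into a pseudo bulk by summing counts over spots, and the per-spot feature number is the count of genes with non-zero expression. The function names and the toy matrix below are illustrative assumptions, not part of Palette.

```python
import numpy as np

def to_pseudo_bulk(spot_counts):
    """Sum a spots x genes count matrix over spots to obtain one bulk-like vector."""
    return np.asarray(spot_counts).sum(axis=0)

def genes_detected_per_spot(spot_counts):
    """Number of genes with non-zero counts in each spot (the 'feature number')."""
    return (np.asarray(spot_counts) > 0).sum(axis=1)

# Toy slice with 3 spots and 3 genes (made-up values)
s05 = np.array([[3, 0, 1],
                [0, 0, 2],
                [1, 4, 0]])
pseudo_bulk = to_pseudo_bulk(s05)
detected = genes_detected_per_spot(s05)
```

Comparing `genes_detected_per_spot` before and after inference gives the kind of per-spot gene-number comparison summarized in Fig. 2b.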
Fig. 2.
The implementation of Palette results in more specific gene expression patterns. a Schematic diagram illustrating the procedure used to assess the performance of Palette, using two adjacent slices from the Stereo-seq data [4] of E14-16 Drosophila embryo. Slice S05 was converted into a pseudo bulk, and Palette was then applied to infer the spatial gene expression, with slice S04 serving as the ST reference. The expression patterns of Palette S04 were compared to the original expression patterns of slice S05 to evaluate the performance of Palette. The orange and purple layers outside the ST data represent the spatial data from slices S05 and S04, respectively. b Boxplots showing the numbers of molecules and genes in each spot before and after implementing Palette. The substantial increase in gene number is due to the supplementation from neighbouring spots, based on the assumption that neighbouring spots within the same cluster exhibit similar gene expression patterns. c Heatmap showing the expression correlation of marker genes for each cluster before and after implementing Palette. The colour bar represents the Pearson correlation coefficient with positive correlation in red and negative correlation in blue. d Spatial expression patterns of marker genes on the Drosophila Stereo-seq slices. Intensity of colour represents the expression levels of each marker gene. For each gene, the spatial patterns from slice S05 and Palette S04 are shown on the left, and the ISH images from BDGP database are shown on the right. The intensities of signals along the AP axis, which is represented by the black dashed lines in the images, are shown below. A, anterior; P, posterior; ARI, Adjusted Rand Index; RMSE, Root Mean Square Error. e The clustering and annotation of the selected slice from the Stereo-seq data [2] of 5.25 hpf zebrafish embryo.
f Circle plot showing the expression correlation network between the serial bulk data of 6 hpf zebrafish embryo and the pseudo bulk of the Stereo-seq slice. Stroke weight indicates the strength of the Pearson correlation coefficient. g Palette inferring spatial expression patterns of 6 hpf zebrafish embryo bulk data on the 5.25 hpf zebrafish Stereo-seq slice. Since zebrafish embryos at 5.25 hpf and 6 hpf exhibited similar expression patterns, we used Palette to infer spatial gene expression from the 6 hpf zebrafish embryo bulk data using the 5.25 hpf ST data as a reference. Intensity of colour represents the gene expression levels. For each gene, the spatial patterns from the Stereo-seq S10 slice and the Palette-implemented S10 slice are shown on the left, and the correlated ISH images shown on the right are from ZFIN and published data [6, 26]
Furthermore, we observed strong correlations in the expression of top marker genes between the same annotated clusters of Palette S04 and slice S05 ST data (Fig. 2c), indicating that Palette successfully preserved the molecular characteristics of each spot. Notably, Palette S04 exhibited similar gene expression patterns to the slice S05 ST data, with these patterns being even more spatially specific and closely resembling the in vivo expression patterns observed through ISH (Fig. 2d, Additional file 1: Fig. S1a and b). These results suggest that Palette's ability of gene supplementation contributed to improved continuity in the expression patterns of the implemented slices. Moreover, since Palette considered the gene expression levels within each cluster, genes with highly differential expression among clusters exhibited more specific expression patterns.
To further evaluate Palette's performance, we applied it to two additional datasets of zebrafish embryos [1, 2]: we selected one middle slice from a Stereo-seq data as the ST reference (Fig. 2e), and the slice 10 from a true bulk RNA-seq dataset was selected as the corresponding input slice based on a correlation test (Fig. 2f). We then compared the expression pattern of genes on the original ST slice and the Palette-implemented slice (Fig. 2g, Additional file 1: Fig. S1c). It was evident that the Palette-implemented slice exhibited more spatially specific expression patterns, which were more similar to the patterns observed through ISH.
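The correlation-based choice of the input slice can be sketched as below. The text states only that a correlation test was used, so this sketch assumes a Pearson correlation across shared genes between each bulk slice and the pseudo bulk of the ST slice. The function name and the toy values are hypothetical.

```python
import numpy as np

def best_matching_slice(bulk_slices, pseudo_bulk):
    """Return the index of the bulk slice most correlated with the ST pseudo bulk.

    bulk_slices: slices x genes matrix of bulk expression
    pseudo_bulk: genes vector obtained by summing the ST slice over spots
    """
    bulk_slices = np.asarray(bulk_slices, dtype=float)
    pseudo_bulk = np.asarray(pseudo_bulk, dtype=float)
    r = np.array([np.corrcoef(row, pseudo_bulk)[0, 1] for row in bulk_slices])
    return int(np.argmax(r)), r

bulk_slices = [[1, 2, 3, 4],
               [4, 3, 2, 1],
               [1, 3, 2, 4]]
pseudo_bulk = [2, 4, 6, 8]
best, r = best_matching_slice(bulk_slices, pseudo_bulk)
```

In practice the expression values would usually be normalized or log-transformed before computing correlations. That preprocessing is omitted here.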
Overall, Palette successfully inferred spatial gene expression from the bulk data of real biological samples, generating expression patterns with improved continuity and higher spatial specificity.
Using Palette to infer spatial gene expression from bulk RNA-seq data of zebrafish serial cryosections
To generate a more precise 3D ST dataset of zebrafish embryos, we first performed serial cryosections of embryos at three developmental stages along the left–right axis and conducted high-depth bulk RNA-seq (Fig. 3a, b, Additional file 1: Fig. S2). Then, Palette was applied to create a more accurate zebrafish spatial transcriptomic atlas.
Fig. 3.
Processing the serial bulk RNA-seq data and the Stereo-seq data. a Schematic representation of the workflow for generating the serial bulk RNA-seq data of zebrafish embryos. b Cresyl violet staining of the cryosectioned slices. Each slice is 20 µm thick. The regions stained by cresyl violet correspond to cells. c, ISH images showing the expression patterns of midline genes, gsc, lft1 and tbxta, from dorsal view. d Schematic plots showing the expected expression profiles of the three midline genes along left–right axis, assuming the embryo is aligned as anticipated. The black dashed line represents the embryonic midline. e, g Expression plot showing the expression patterns of midline genes along left–right axis in the Stereo-seq data (e) and the bulk RNA-seq data (g). The slices delineated by the black dashed lines in panel (g) correspond to middle region of the embryo, and the slices within this region were extracted from the bulk RNA-seq data for slice alignment with the Stereo-seq data. f Diagram illustrating the midline of the Stereo-seq data tilted towards the right. h The adjusted Stereo-seq slices. Poor-quality and severely damaged slices were discarded. Each spot is labelled with cell type annotations. i 3D construction of the Stereo-seq slices. j Correlation heatmap between the bulk RNA-seq data and the re-segmented Stereo-seq slices. The colour bar represents the Pearson correlation coefficient. k Expression plots showing the comparison of gene expression patterns before and after slice alignment. l Schematic representation of Palette implementation after slice alignment. Scale bar: 100 µm (b), 200 µm (c)
Before implementing Palette, we aligned the ST data with the bulk RNA-seq data using three midline genes—gsc, lft1, and tbxta—as metrics of alignment accuracy (Fig. 3c). The expression peaks of these three genes are expected to be located on the same or nearby slices (Fig. 3d). However, analysis revealed that the slice cutting lines were not parallel to the embryonic midline in both the Stereo-seq data and our bulk RNA-seq data (Fig. 3e and g). The anterior midline gene gsc and the posterior midline gene tbxta appeared on distant slices, and the tilt directions differed between the Stereo-seq data and the bulk RNA-seq data, as indicated by the positional relationships of gsc and tbxta (Fig. 3e-g) along the left-to-right direction.
To align these two datasets, we first adjusted and orientated the ST slices (Fig. 3h, Additional file 1: Fig. S3). We then overlaid them sequentially at consistent intervals (Fig. 3i), creating a 3D ST dataset that could be rotated and re-segmented to facilitate alignment. The efficacy of alignment was evaluated using a correlation coefficient derived from the expression patterns of genes with known AP differentiation (see Methods). Through continuous adjustments—rotating, re-segmenting, and recalculating correlations—we identified the configuration with the highest mean correlation coefficient. This configuration was deemed optimal for aligning the re-segmented slices with those from the bulk RNA-seq data (Fig. 3j). The expression patterns of midline genes in the re-segmented Stereo-seq slices closely aligned with those in the bulk RNA-seq slices (Fig. 3k).
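The rotate, re-segment and correlate search can be sketched as a simple grid search. This is an illustration under assumptions rather than the procedure in Methods: it rotates about a single axis only, uses equal-width slices along the rotated x axis, and scores each angle by the mean per-gene Pearson correlation with the bulk slices. All function names, the axis choice and the synthetic data are hypothetical.

```python
import numpy as np

def resegment(coords, expr, angle_deg, n_slices, x_range):
    """Rotate spots about the z axis and re-bin them into slices along x."""
    theta = np.deg2rad(angle_deg)
    x_rot = coords[:, 0] * np.cos(theta) - coords[:, 1] * np.sin(theta)
    edges = np.linspace(x_range[0], x_range[1], n_slices + 1)
    idx = np.clip(np.digitize(x_rot, edges) - 1, 0, n_slices - 1)
    profile = np.zeros((n_slices, expr.shape[1]))
    np.add.at(profile, idx, expr)          # sum spot expression within each slice
    return profile

def mean_gene_correlation(profile, bulk):
    """Mean Pearson correlation across genes between two slices x genes matrices."""
    scores = []
    for g in range(bulk.shape[1]):
        a, b = profile[:, g], bulk[:, g]
        if a.std() > 0 and b.std() > 0:    # skip genes with constant profiles
            scores.append(np.corrcoef(a, b)[0, 1])
    return float(np.mean(scores)) if scores else float("nan")

def best_rotation(coords, expr, bulk, angles, x_range):
    """Grid search for the rotation angle with the highest mean correlation."""
    scores = [mean_gene_correlation(
                  resegment(coords, expr, a, bulk.shape[0], x_range), bulk)
              for a in angles]
    best = int(np.nanargmax(scores))
    return angles[best], scores[best]

# Synthetic check: the 'bulk' slices are cut from the same spots at a known tilt
rng = np.random.default_rng(0)
coords = rng.uniform(-1, 1, size=(500, 3))
expr = rng.random((500, 4)) * (1 + coords[:, [0]])
bulk = resegment(coords, expr, 20, 10, (-1.5, 1.5))
angle, score = best_rotation(coords, expr, bulk, list(range(-30, 31, 10)), (-1.5, 1.5))
```

A real alignment would search over more than one rotation axis and a finer angular grid. The single-axis search is kept here only to make the scoring loop easy to follow.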
Integrating zebrafish spatial transcriptomics data and imaging data to construct zSTEP
Palette was applied to reconstruct a 3D zebrafish ST atlas (Fig. 3l). However, the ST sections exhibited extrusion and deformation (Fig. 3h and i), resulting in spatial distortions. To generate a 3D ST atlas that enables accurate visualization of gene expression patterns within zebrafish embryos while preserving their comprehensive morphology, we projected the ST spots onto 3D zebrafish embryo imaging data [24]. This approach utilized the detailed morphological representation provided by the 3D imaging, where each cell is assigned a spatial coordinate, serving as a precise reference for the projection of ST spots.
Prior to spot projection, the ST data and the 3D imaging data were first aligned. We scaled the embryo to similar sizes in both datasets, and selected three spots located at the head, tail and middle of the midline from each dataset. These three paired spots were then utilized for alignment using the Kabsch algorithm [27, 28], which is a method for calculating the optimal rotation matrix that minimizes the root mean squared deviation (RMSD) between two paired sets of spots. This resulted in the alignment between the ST data and the imaging data (Fig. 4a).
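For reference, a compact NumPy sketch of the Kabsch algorithm is given below. It follows the standard SVD formulation with a sign correction that guarantees a proper rotation. The landmark coordinates are made up for illustration, and the scaling step described above is assumed to have been applied beforehand.

```python
import numpy as np

def kabsch(P, Q):
    """Rotation R and translation t minimising the RMSD between P @ R.T + t and Q.

    P, Q: n x 3 arrays of paired points (row k of P corresponds to row k of Q).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)      # 3 x 3 covariance of the centred points
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t

# Three hypothetical midline landmarks (head, middle, tail) in the ST data
P = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [2.0, 1.0, 0.0]])
# The same landmarks in the imaging data: rotated 90 degrees about z, then shifted
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
shift = np.array([1.0, 2.0, 3.0])
Q = P @ Rz.T + shift

R, t = kabsch(P, Q)
aligned = P @ R.T + t
```

Three non-collinear landmark pairs are enough to fix a unique rotation. Once `R` and `t` are known, the same transform can be applied to every ST spot.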
Fig. 4.
Projection of ST spots on 3D images and analysis of spatial cell–cell communication. a Diagram showing the overall alignment between the ST data and the imaging data using the Kabsch algorithm. b Diagram indicating the pairing principle for the ST coordinates and the imaging coordinates. In each iteration, the ST spot and the imaging spot closest to each other are paired, which is considered as the optimal solution of this iteration. Paired spots are removed from subsequent iterations, and the loop continues until each ST spot is paired with an imaging spot. c Flow chart showing the process of the pairing. d Lateral view of zSTEP. Each spot is coloured with cell type annotation. e Stacked area plot showing cell proportions in each stage. f Expression patterns of cdx4 on zSTEP. Intensity of colour represents the gene expression levels. g Schematic showing the in silico sections cut for spatial cell–cell communication analysis. h, i Analysis of FGF signalling pathway network in the midline (h) and tail sections (i). Each spot is coloured with cell type annotation. The stroke weights indicate the interaction strength. YSL: Yolk syncytial layer; LPM: Lateral plate mesoderm; Seg P, TB: Segmental plate, Tail bud
Following the alignment, the projection from ST spots to imaging spots was achieved using a loop algorithm inspired by the greedy algorithm [29] (Fig. 4b and c). The entire process resulted in the spatial gene expression atlas of zebrafish embryos at three developmental stages, which was named zSTEP (Fig. 4d). zSTEP encompassed zebrafish embryos at 10 hpf, 12 hpf and 16 hpf, and these stages corresponded to post-gastrulation and tail elongation processes. By integrating reconstructed ST data with imaging data, zSTEP achieves 3D visualization of cell types and gene expression patterns within a bona fide zebrafish embryo (Fig. 4d-f).
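The pairing principle described in Fig. 4b can be sketched as a greedy assignment: the globally closest unpaired ST spot and imaging spot are paired, both are removed, and the process repeats. The sketch below obtains the same result by sorting all candidate pairs by distance once. The function name and coordinates are illustrative, and the all-pairs approach is only practical for small inputs.

```python
import math

def greedy_pairing(st_coords, img_coords):
    """Pair each ST spot with a distinct imaging spot, closest pairs first.

    Returns a dict mapping ST spot index -> imaging spot index.
    """
    if len(st_coords) > len(img_coords):
        raise ValueError("need at least as many imaging spots as ST spots")
    # All candidate pairs, sorted by Euclidean distance
    candidates = sorted(
        (math.dist(s, m), i, j)
        for i, s in enumerate(st_coords)
        for j, m in enumerate(img_coords)
    )
    used_st, used_img, mapping = set(), set(), {}
    for _, i, j in candidates:
        if i in used_st or j in used_img:
            continue                      # one of the two spots is already paired
        mapping[i] = j
        used_st.add(i)
        used_img.add(j)
        if len(mapping) == len(st_coords):
            break
    return mapping

st = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
img = [(0.9, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 0.0, 0.0)]
mapping = greedy_pairing(st, img)
```

For datasets with many thousands of spots, the quadratic number of candidate pairs becomes costly, and a spatial index such as a k-d tree would be a natural substitute for the exhaustive sort.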
Exploration of spatial cell–cell interactions in zSTEP
zSTEP enables the visualization of gene expression patterns in a 3D view of zebrafish embryos, along with their comprehensive morphology (Fig. 4f and Additional file 1: Fig. S4a-d), which allows specific regions of embryos to be freely selected for spatial cell–cell interaction (CCI) analysis.
We extracted the midline and tail sections from zSTEP and employed CellChat [30, 31] for spatial cell–cell communication analysis (Fig. 4g). At 10 hpf, we observed strong interactions between tail bud/segmental plate cells and notochord cells in both sections, which suggested that the FGF signalling pathway may play a significant role in mediating this interaction (Fig. 4h and i). Additionally, tail bud/segmental plate cells were found to send FGF signals to neural cells, and these cell–cell interactions persisted at 12 hpf and 16 hpf, with tail bud/segmental plate cells continuing to send FGF signals to both notochord and neural cells (Fig. 4h and i). Notably, the strength of FGF signalling from tail bud/segmental plate cells to neural cells increased at 16 hpf. These results indicated that tail bud/segmental plate cells served as a strong FGF signalling centre regulating neighbouring cells, which can be evidenced by the reported roles of FGF signalling in somite development [32–34], caudal spinal cord development [35] and posterior notochord development [34].
Beyond FGF signalling, we also observed that other signalling pathways significantly contributed to the CCIs in the midline and tail sections. Throughout all three developmental stages, we detected strong interactions between neural cells with neighbouring cells through Wnt/β-catenin signalling (Additional file 1: Fig. S4e). Additionally, we found that tail bud/segmental plate cells consistently emitted BMP signals to adjacent cells including notochord, adaxial, and erythroid lineage cells (Additional file 1: Fig. S4f). These findings aligned with prior knowledge indicating that Wnt/β-catenin signalling was involved in regulating the neural plate patterning [36], and BMP signalling was activated in tail bud region, contributing to tail formation [37].
Our spatial cell–cell communication analysis suggests that different morphogens mediate diverse CCIs during embryonic development. The complex cellular networks formed by these CCIs may guide the formation of organ collectives and ensure the robustness of organogenesis. Furthermore, recently developed tools that enable 3D CCI analysis, such as Stereopy [38], can also be applied to zSTEP to investigate CCIs within a 3D context, potentially providing deeper insights into spatial cellular interactions during development. In summary, zSTEP proves to be an excellent zebrafish spatial atlas for visualizing gene expression patterns and investigating CCIs in specific regions of the embryo.
Investigating morphogen distributions and cell fate specification in zSTEP
During embryonic development, a group of signalling molecules known as morphogens diffuse from localized sources, forming concentration gradients that provide spatial information to responding cells and guide their differentiation [39, 40]. The intersections of different morphogens with antiparallel gradients generate diverse cell types, contributing to the formation of precise patterns and structures [41, 42] (Fig. 5a).
Fig. 5.
Morphogen gradients regulate the establishment of the AP axis. a Schematic diagram showing the role of antiparallel morphogen gradients in governing cell fate determination. b Schematic diagram showing the linearization of zSTEP. c-f Plots displaying the expression patterns and intensities of representative ligands along the AP axis in FGF (c), RA (d), noncanonical Wnt (e) and TGF-β (f) signalling. The selected ligands show differential expression patterns in Zone1 and Zone2. g Plots of gene expression intensities, cell types and cell type proportions along the AP axis. The thick dashed lines in red and blue indicate the expression trends of trunk-enriched and tail-enriched genes; the thin dashed lines separate Zone1 and Zone2 for GO enrichment analysis. h Enriched GO terms in Zone1 and Zone2, respectively. i Model diagrams showing the relationships between morphogen gradients and cell type specification in Zone1 and Zone2 at different developmental stages. Paraxial Meso: Paraxial mesoderm; LPM: Lateral plate mesoderm; Seg P, TB: Segmental plate, Tail bud; Eryth. Lineage: Erythroid Lineage; ncWnt: noncanonical Wnt
The establishment of the AP axis involves intricate interactions among morphogen gradients [26, 43, 44]. During tail elongation, morphogen gradients collectively regulate the extension and confinement of the AP axis, resulting in the precise specification and arrangement of tubular organ primordia along the body axis [43, 45, 46]. zSTEP provides an appropriate platform to comprehensively analyse the expression patterns of morphogens along the AP axis and investigate the relationships between the morphogen gradients and cell type distributions. We linearized zSTEP (Fig. 5b, see Methods) and focused on ligands involved in canonical Wnt, noncanonical Wnt, Notch, Sonic hedgehog (SHH), RA, FGF, and TGF-β signalling, visualizing their expression intensities along the linearized AP axis (Fig. 5c-f and Additional file 1: Figs. S5-S8). We observed two adjacent regions along the linearized AP axis at all three time points, each enriched for a distinct group of ligands (Fig. 5g). These regions, designated as Zone1 and Zone2, were subjected to Gene Ontology (GO) enrichment analysis using sets of differentially expressed (DE) genes to investigate the functional characteristics of cells within each zone (Fig. 5g and h, Additional file 2: Tabs. S1-S9).
At the end of gastrulation (10 hpf), tail elongation commenced, with various cell types beginning to be specified along the AP axis (Fig. 5g left). Notably, Zone1 consisted of paraxial mesoderm cells, while Zone2 predominantly comprised segmental plate/tail bud cells. GO enrichment analysis revealed terms related to somite development for both zones, such as “skeletal system”, “somite development”, and “somitogenesis” (Fig. 5h left). Furthermore, Zone2 encompassed the entire tail region, displaying GO terms associated with posterior development, such as “endoderm development” and “mesoderm morphogenesis”.
At 12 hpf and 16 hpf, as tail elongation progressed, more cell types were specified along the AP axis. The boundary between Zone1 and Zone2 shifted posteriorly. Zone1 primarily consisted of trunk region cells, such as somite cells, with GO terms related to muscle development, such as “muscle structure development”, “muscle cell development”, and “skeletal muscle tissue development”. Zone2 continued to predominantly consist of segmental plate/tail bud cells, with GO terms including “somitogenesis”, “somite development”, and “mesenchyme development”, indicating these cells’ high mobility and contribution to somitogenesis and tail elongation (Fig. 5h mid and right). The boundary between Zone1 and Zone2 coincided with the position of somite cells, highlighting the essential roles of antiparallel morphogen gradients in somitogenesis. Additionally, pronephros and erythroid lineage cells were specified at 16 hpf and distributed in both Zone1 and Zone2, with Zone1 containing a higher proportion of pronephros cells and Zone2 exhibiting a higher abundance of erythroid lineage cells (Fig. 5g).
Based on the observed transcriptional morphogen gradients and cell type distributions along the AP axis in Zone1 and Zone2 at the three developmental stages, we created a diagram to summarize these findings (Fig. 5i). Assuming that the transcriptional level of a morphogen reflects its activity level, our model indicated the presence of opposing concentration gradients, which could guide cell type specification along the AP axis. Our analysis showed that Zone1 was enriched for the aldehyde dehydrogenase gene aldh1a2 (Fig. 5d), while Zone2 showed high expression of wnt3a and fgf8a (Fig. 5c, Additional file 1: Fig. S5). These observations were consistent with previous studies [47–51] demonstrating the role of anterior RA signalling and posterior Wnt and FGF signalling in establishing the determination front of newly formed somites. In addition to these ligands, Zone2 exhibited enrichment of other FGF ligands, such as fgf10a, fgf4 and fgf13b (Fig. 5g), suggesting their collective roles in regulating zebrafish embryonic posterior development. Interestingly, Zone1 also showed enrichment of certain FGF ligands, including fgf17 and fgf18b (Fig. 5g), suggesting that another FGF signalling cascade probably participates in somite development. Moreover, Zone2 was enriched for ligands associated with TGF-β signalling, including bmp2b, bmp4, bmp7a and gdf11, in line with the well-studied roles of BMP signalling in tail development [52–55]. Zone1 and Zone2 also exhibited enrichment of different noncanonical Wnt signalling ligands, wnt11 and wnt5b, respectively, suggesting their important roles in regulating pattern formation in these zones. Another interesting observation was that the expression of the Notch signalling ligand jag1a shifted from high expression in Zone2 at 10 hpf and 12 hpf to high expression in Zone1, along with dll4, at 16 hpf, suggesting changes in the zones where Notch signalling functions during zebrafish development.
In summary, our work systematically assessed the dynamic transcriptional profiles of morphogens along the AP axis and highlighted the interactions between adjacent zones exhibiting antiparallel morphogen gradients. These findings underscored the crucial roles of these morphogens in orchestrating pattern formation during zebrafish development, laying the foundation for investigating the regulation of AP refinement in further studies.
Identification of key transcriptional regulatory cascades during the AP axis canalization
Diverse morphogen signals intricately interact to instruct the refinement of the AP axis, involving cross-regulation of their intracellular pathways and downstream TFs [56, 57]. To identify the key morphogens and downstream TFs that are essential for accurately determining the AP fate of various cell types within the embryo, we employed a random forest model (Fig. 6a), in which the expression levels of all morphogens and TFs were treated as variable factors, to determine which ones are crucial for establishing AP identities. We identified several morphogen ligands from the FGF, Wnt, RA, Notch and BMP signalling pathways as key determinants of AP identities (Fig. 6b, Additional file 1: Fig. S9). The regulatory potential of these morphogens in AP axis formation has been substantiated by previous studies [52–55, 58, 59], reinforcing the validity of our approach.
Fig. 6.
Establishment of AP axis. a Schematic diagram showing the training for the random forest model. b Heatmaps showing the expression intensities of key morphogens and transcription factors along AP axis at 10 hpf, 12 hpf and 16 hpf. c Schematic diagram illustrating the spatial distribution of hox genes in neural and somatic systems. d Schematic diagram showing the calculation of the Hox score of each cell and the assessment of correlations between Hox scores and AP positions. e The correlations between Hox scores and physical AP positions. Colour indicates the spot density, with high in red and low in blue. f Heatmaps showing the expression levels of hoxb genes along AP axis at 10 hpf, 12 hpf and 16 hpf. Intensity of colour represents z-score with high in red and low in blue. g The Hox score distributions in neural system and paraxial mesoderm. Colour scale: Expression intensity (b), Z-score (f)
Interestingly, ligands within the same signalling pathway can exhibit distinct distributions along AP axis, suggesting their dominant roles in refining AP axis at different developmental stages. For instance, fgf3, anteriorly distributed, was a key morphogen at the 10 hpf stage, while fgf17 and fgf8a, located in the posterior trunk and tail regions, respectively, were critical at both the 12 hpf and 16 hpf stages (Fig. 6b). This emphasizes the need to interpret morphogen gradients within a spatiotemporal context.
Among the identified key TFs, the Hox family genes emerged as significant regulators, underscoring their vital roles in AP axis regulation (Fig. 6b, Additional file 1: Fig. S10). The Hox genes, a subset of conserved homeobox genes, exhibit both temporal collinearity and spatial collinearity in their expression, allowing them to specify regions along the AP axis and contribute to body plan formation [60–62] (Fig. 6c). To further investigate the relationships between Hox gene expression and AP axis refinement, we defined a “Hox score”, which serves as an estimate of the most probable Hox gene expressed in each spot (Fig. 6d). We performed correlation analysis between Hox scores and physical AP identities across the three developmental stages (Fig. 6e). Our results revealed a positive correlation between the Hox score and physical AP identities, with this correlation strengthening as development progressed. This trend was consistently observed in both the neural system and paraxial mesoderm, where Hox genes were independently expressed along the AP axis.
Interestingly, the neural system exhibited a lower Hox score compared to paraxial mesoderm (Fig. 6g), suggesting a “time discrepancy” between these two systems in canalizing their AP “avenue”. These observations highlight the increasingly significant regulatory role of Hox genes in AP axis refinement. Furthermore, we examined the expression patterns of four Hox clusters: hoxa, hoxb, hoxc, and hoxd, along the AP axis (Fig. 6f, Additional file 1: Fig. S11). The hoxb cluster exhibited the most pronounced correlation with the physical AP identities, suggesting that the hoxb family genes may serve as master regulators in refining the AP identities during development.
Discussion
In this study, we present Palette, a pipeline that utilizes existing ST data as the only reference to infer precise spatial gene expression patterns from bulk RNA-seq data. The gene expression patterns predicted by Palette exhibit enhanced spatial continuity and improved spatial specificity, closely resembling experimentally observed patterns (Fig. 2d, g, Additional file 1: Fig. S1b, c).
In the Palette pipeline, spot clustering is a crucial step, and an accurate clustering result will enhance the performance of Palette. The traditional UMAP-based clustering does not consider the spatial information, and the intricate arrangement of spots belonging to different clusters may result in sparse expression patterns. Therefore, we employed BayesSpace [25] for spatial clustering, as this method considers both transcriptional similarity and neighbouring structure for clustering, which helps capture local patterns closer to the patterns in bona fide tissues. Furthermore, Palette can incorporate spot characteristics directly from histological images of tissue slices, enabling more accurate spot characterization compared to reliance on spatial clustering alone (Additional file 1: Fig. S12b).
We have assessed the inference performance of Palette using different types of ST references, including Stereo-seq [4] (Additional file 1: Fig. S2), MERFISH [63] (Additional file 1: Fig. S13) and Visium [64] (Additional file 1: Fig. S14). These results indicate that Palette significantly increases the signal-to-noise ratio and exhibits more specific expression patterns. Palette is also applicable to comparative analyses of spatial gene expression patterns in different conditions, as demonstrated using human pancreatic ductal adenocarcinoma (PDAC) data [65]. Here, Palette inferred spatial gene expression patterns from bulk RNA-seq datasets [66] of normal and tumour tissue slices, revealing a notable decrease in tumour-specific gene expression in the normal tissue slice (Additional file 1: Fig. S12). Ongoing research aims to expand Palette’s application scenarios and explore further possibilities for analysing and interpreting spatial gene expression patterns.
Leveraging the capabilities of Palette, we reconstructed a comprehensive spatial gene expression atlas, zSTEP, by integrating transcriptomics from serial sections and 3D images of zebrafish embryos [24]. zSTEP not only facilitates the visualization of gene expression patterns in the 3D morphology of zebrafish embryos, but also allows for the flexible selection of sections for spatial CCI analyses. As a 3D spatial gene expression atlas, zSTEP holds great potential for studying the intricate 3D spatial CCIs during zebrafish development.
We utilized a linearized version of zSTEP to investigate the relationships between morphogen distributions and cell type specification along the AP axis during development. We identified two adjacent zones with antiparallel morphogen gradients, with the boundary between these two zones appearing to act as the determination front for somite formation.
In addition, by employing a random forest model, we explored the correlations between morphogen/TF expression and the developing AP patterns. This analysis identified critical morphogens and downstream TFs essential for determining AP position at different developmental stages. Notably, the Hox family genes were identified as dominant TFs, with strong correlations between the expression patterns of hox genes and the cell physical AP positions. Importantly, our findings suggest that hoxb cluster likely plays a more significant role in AP axis formation compared to other hox clusters.
During the development of zSTEP, we encountered several limitations that warrant future improvements. Firstly, the manual adjustment and alignment of Stereo-seq slices during 3D ST data construction were labour-intensive and introduced potential bias. Although tools like PASTE [67, 68] were employed, their performance was unsatisfactory (Additional file 1: Fig. S15), possibly due to the hollow circle shape of ST slices leading to tilted alignments. Newly developed tools such as Spateo [69] can achieve good alignment performance (Additional file 1: Fig. S15) and may be worth exploring for assisting the alignment in future studies, although our manual alignment also achieved the desired performance (Additional file 1: Fig. S15). Users can flexibly choose alternative alignment strategies according to their specific needs. Secondly, the Palette algorithm currently excludes genes not detected in any spot of the ST data. Leveraging serial bulk RNA-seq data could enable the construction of gene co-expression networks along the cutting direction. This approach has the potential to assist in predicting the expression patterns of such genes in ST data. Thirdly, the performance of Palette and zSTEP relies heavily on the quality of the ST data. If the ST data are not of sufficient quality, low-expression genes may not be detected or may only appear in very few scattered spots, and the performance of spot clustering could also be affected. Moreover, although zSTEP shows great potential in investigating AP patterning, its performance in assessing left–right (LR) and dorsal–ventral (DV) patterning remains insufficient. Specifically, the Stereo-seq data of the 12 hpf zebrafish embryo had fewer slices on the right side (Additional file 1: Fig. S3b), resulting in more blank spots in the right part of zSTEP for the 12 hpf embryo.
Additionally, the original annotation of the ST data does not clearly distinguish all the cell types along DV axis, and stacking ST slices can lead to disordered cell positions, further diminishing the performance for DV patterning. However, with the ongoing advancements in spatial resolution and data quality, the performance of Palette and zSTEP is expected to be enhanced, unlocking even greater potential for analysing spatiotemporal gene expression.
On the other hand, compared to the pioneering strategy that infers spatial information of scRNA-seq data from well-established genes [6, 70], the Palette pipeline cannot achieve single-cell resolution. However, Palette is based on the ST reference, and thus preserves the real positional relationships between spots. Furthermore, the focus of our pipeline is to infer the gene expression patterns, which are closely correlated to biological functions and critical for embryonic development, rather than attempting to capture the sparse expression within individual spots. In this regard, Palette can be advantageous, as it allows for reconstruction of the major expression profiles, which are often more relevant for understanding developmental processes. Additionally, Palette can be applied to serial sections, enabling the construction of a 3D ST atlas.
Finally, while the current analyses demonstrated that zSTEP can serve as a valuable tool for identifying genes having specific patterns at certain developmental stages, the exploration of zSTEP is still limited. During animal development, pattern formation is always one of the most important developmental issues. As demonstrated by the reaction–diffusion (RD) model, morphogen molecules are produced at specific regions of the embryo, forming morphogen gradients to guide cell specification, while interactions between different morphogens instruct more complicated and well-choreographed pattern formation [71]. zSTEP, constructed with Palette as a comprehensive resource of transcriptomic expression patterns during development, could be leveraged to evaluate the RD model, assisting in the assessment of each morphogen gradient, interactions among morphogen gradients, and the relationship between morphogen gradients and cell type specification. Moreover, the investigation of gene expression patterns should not be limited to morphogens and TFs, and further investigation of the roles of other genes in AP patterning is desirable. Additionally, a random forest model may be sufficient here for investigating the most essential morphogens and TFs for AP axis refinement, while more sophisticated machine learning models may be required for addressing more specific biological questions. Meanwhile, a recently posted study also generated a 3D zebrafish atlas by integrating 3D weMERFISH transcriptomics data with single-cell multiomics data, providing another valuable tool for investigating embryonic development and enabling the assessment of both gene expression and chromatin accessibility [72]. We believe that these useful tools will further facilitate our in-depth exploration of the spatiotemporal regulatory mechanisms underlying early embryonic development in vertebrates.
Conclusions
In our work, we developed Palette, which infers detailed spatial gene expression patterns from bulk RNA-seq data with ST data as the only reference. By smoothing, imputing and adjusting gene expressions, Palette achieves more precise expression patterns. Since gene expression patterns are closely correlated to biological functions, Palette serves as a valuable tool for investigating the relationships between gene expression patterns and biological functions.
Leveraging the advantages of Palette, we reconstructed the zSTEP by integrating 53-slice serial bulk RNA-seq data from three developmental stages with existing ST references and 3D zebrafish embryo images. zSTEP provides a comprehensive cartographic resource for examining gene expression and spatial CCIs within zebrafish embryos. Utilizing machine learning-based screening, we identified key morphogens and TFs essential for AP axis development and characterized their dynamic distribution throughout embryogenesis.
Methods
The Palette algorithm
ST data were used as the reference for inferring spatial gene expression from bulk RNA-seq data. The input bulk expression matrix is $S \in \mathbb{R}^{n \times 1}$, which contains the expression information of n genes. BayesSpace [25] was first employed to perform spatial clustering on the ST data using the genes highly expressed in both the ST data and the bulk data. The criteria for selecting highly expressed, stable genes are demonstrated in Additional file 1: Fig. S16. Through spatial clustering, m clusters are identified, with the average expression of each cluster represented as $C \in \mathbb{R}^{n \times m}$. MuSiC [19] was then employed to obtain the proportions of each defined cluster in the bulk data, denoted as $A \in \mathbb{R}^{m \times 1}$, through deconvolution. A pseudo bulk vector, $P \in \mathbb{R}^{n \times 1}$, is constructed by taking the matrix product of the cluster expression matrix of the ST data and the cluster abundances of the bulk data.
Each gene is assigned a variable factor to adjust its expression. The variable factor vector, $K \in \mathbb{R}^{n \times 1}$, is calculated as the gene-wise ratio of the input bulk to the pseudo bulk vector.
The adjusted matrix $M \in \mathbb{R}^{n \times m}$ is generated by multiplying the expression of each gene in the cluster expression matrix of the ST slice by the variable factor of that gene.
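Read literally, the three steps above amount to $P = CA$, $K_j = S_j / P_j$ and $M_{jc} = K_j C_{jc}$ for gene j and cluster c. The following NumPy sketch illustrates this reading only; the function name, the argument layout and the zero guard `eps` are assumptions made for the example and are not taken from the Palette source code.

```python
import numpy as np

def adjust_cluster_expression(S, C, A, eps=1e-12):
    """Hypothetical helper: rescale cluster profiles so that they reproduce the bulk.

    S: bulk expression, shape (n,)
    C: mean expression per ST cluster, shape (n, m)
    A: cluster proportions in the bulk (e.g. from deconvolution), shape (m,)
    """
    S = np.asarray(S, dtype=float)
    C = np.asarray(C, dtype=float)
    A = np.asarray(A, dtype=float)
    P = C @ A                                   # pseudo bulk vector
    # Gene-wise variable factor; genes with an empty pseudo bulk get 0 (assumption).
    K = np.divide(S, P, out=np.zeros_like(P), where=P > eps)
    M = C * K[:, None]                          # adjusted cluster expression
    return P, K, M
```

Under this reading, the adjusted matrix reproduces the bulk input when recombined with the cluster proportions (`M @ A` equals `S` for every gene with a non-zero pseudo bulk), which is a convenient sanity check.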
The pseudo bulk data of the reference ST slice is $T \in \mathbb{R}^{n \times 1}$. One random spot and its nearest neighbouring spots are selected, and the expression of the selected spots belonging to cluster i is aggregated to form the pseudo-cluster expression data $L_{ST} \in \mathbb{R}^{n \times 1}$. The regional cluster factor $R \in \mathbb{R}^{n \times 1}$ is defined as the proportion of $L_{ST}$ in the entire ST slice.
The ratio of $L_{ST}$ to the reference ST data is equal to the ratio of the adjusted $L_{ST}$ to the adjusted matrix; thus, the adjusted $L_{ST}$ can be calculated as the gene-wise product of $M_i$, the expression of cluster i in the adjusted matrix, and the regional cluster factor R. The expression of each spot in cluster i, $D \in \mathbb{R}^{n \times 1}$, is obtained by evenly allocating the adjusted $L_{ST}$ among the N spots belonging to cluster i in this region.
The average expression (a vector in $\mathbb{R}^{n \times 1}$) after multiple iterations is taken as the estimated expression of the spot, where the average is taken over the p iterations in which the spot was selected.
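The iterative regional allocation can be sketched as follows. This is one possible reading of the description, not the authors' implementation: it takes R as the gene-wise ratio of the regional cluster aggregate to T, and the neighbourhood size `k`, the number of iterations `n_iter` and all names are hypothetical.

```python
import numpy as np

def allocate_regional_expression(st_expr, coords, clusters, M, k=8, n_iter=200, seed=0):
    """Hypothetical sketch of the regional allocation step.

    st_expr:  reference ST expression, shape (n_genes, n_spots)
    coords:   spot coordinates, shape (n_spots, 2)
    clusters: integer cluster label per spot, indexing the columns of M
    M:        adjusted cluster expression, shape (n_genes, m)
    """
    st_expr = np.asarray(st_expr, dtype=float)
    rng = np.random.default_rng(seed)
    n_genes, n_spots = st_expr.shape
    T = st_expr.sum(axis=1)                      # pseudo bulk of the reference slice
    total = np.zeros((n_genes, n_spots))
    hits = np.zeros(n_spots)                     # p: times each spot was selected
    for _ in range(n_iter):
        centre = rng.integers(n_spots)
        dist = np.linalg.norm(coords - coords[centre], axis=1)
        region = np.argsort(dist)[:k + 1]        # the spot and its k nearest neighbours
        for i in np.unique(clusters[region]):
            members = region[clusters[region] == i]
            L = st_expr[:, members].sum(axis=1)  # regional pseudo-cluster expression
            R = np.divide(L, T, out=np.zeros_like(L), where=T > 0)
            D = M[:, i] * R / len(members)       # even allocation among the N spots
            total[:, members] += D[:, None]
        hits[region] += 1
    # Average over the iterations in which a spot was selected; unselected spots stay 0.
    return total / np.maximum(hits, 1), hits
```

In practice the number of iterations would need to be large enough for every spot to be selected several times; the returned `hits` vector makes this easy to check.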
The impacts of the number of detected genes and spatial resolution have been addressed in Additional file 1: Fig. S17. The computer configurations for running Palette and the processing speeds under varying data conditions are listed in Additional file 1: Tab. S10. Notably, the weak, yet real expression might be removed in Palette, as it could result from the ST technique itself or the variations in samples and may not carry biological significance. Palette is designed to capture the expression patterns that are typically closely correlated with functions, taking into account spot characteristics and emphasizing cluster-specific expression, thereby yielding spatial-specific expression patterns.
Palette performance assessment on Drosophila slices
Two consecutive slices (referred to as slices 4 and 5) were taken from the Stereo-seq data [4] of Drosophila E14-16 serial sections. The data were converted into Seurat [73] objects for the following analyses. Given their adjacency, these two slices should exhibit similar gene expression levels and patterns. Slice 5 was then converted into a pseudo bulk and used as Palette's input, with slice 4 serving as the ST reference. The predicted spatial expression patterns of slice 5 were compared to the actual ST data from slice 5 and ISH images from the BDGP database (www.fruitfly.org) for evaluating Palette’s performance. The intensity plot profiles along the AP axis were generated through the following steps: the expression pattern plot images or in situ hybridization images were imported into ImageJ and converted to grayscale. The colour was then inverted, and a line of a certain width (here set as 10) was drawn from the anterior part to the posterior part (Additional file 1: Fig. S1a). The signal intensities along the width of the line were measured and imported into R for generating intensity plots. The Adjusted Rand Index (ARI) and Root Mean Square Error (RMSE) were used to evaluate the similarity of the expression patterns. The expression patterns of the in situ hybridization images were considered as the expected values, and the expression patterns of the ST data and the inferred expression patterns were each compared to the expected values. Common positions along the AP axis within all three expression profiles were used, and the RMSE was calculated based on the scaled intensity at these positions. Values greater than the threshold were set to 1; otherwise, they were set to 0, and the ARI was then calculated based on the intensity category. Higher ARI and lower RMSE indicate greater similarity.
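For illustration, the two similarity measures can be computed on a pair of intensity profiles as in the sketch below. Min-max scaling and the threshold of 0.5 are assumptions made for the example (the scaling method and threshold are not specified above), and the function names are hypothetical.

```python
from collections import Counter
from math import comb, sqrt

def min_max_scale(values):
    """Scale a profile to [0, 1]; a flat profile maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

def rmse(expected, observed):
    """Root mean square error between two equally long profiles."""
    return sqrt(sum((e - o) ** 2 for e, o in zip(expected, observed)) / len(expected))

def _pair_count(counts):
    """Number of unordered pairs that fall within the same group."""
    return sum(comb(c, 2) for c in counts)

def adjusted_rand_index(a, b):
    """Adjusted Rand Index of two labelings of the same positions."""
    together = _pair_count(Counter(zip(a, b)).values())
    in_a = _pair_count(Counter(a).values())
    in_b = _pair_count(Counter(b).values())
    expected = in_a * in_b / comb(len(a), 2)
    maximum = (in_a + in_b) / 2
    if maximum == expected:          # both labelings are trivial
        return 1.0
    return (together - expected) / (maximum - expected)

def compare_profiles(reference, candidate, threshold=0.5):
    """Return (RMSE, ARI) of a candidate profile against the ISH-derived reference."""
    ref, cand = min_max_scale(reference), min_max_scale(candidate)
    ref_on = [int(v > threshold) for v in ref]
    cand_on = [int(v > threshold) for v in cand]
    return rmse(ref, cand), adjusted_rand_index(ref_on, cand_on)
```

Both profiles must be sampled at the same AP positions before calling `compare_profiles`, mirroring the use of common positions described above.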
Palette performance assessment on zebrafish slices
One middle slice of the zebrafish Stereo-seq data was selected as the ST reference [2]. A corresponding slice was selected as the input for Palette from the serial bulk RNA-seq data [1] of the zebrafish embryo using a correlation test. The correlation was calculated based on the expression of genes showing differential expression along the DV axis. The predicted expression patterns were compared to the ISH images from ZFIN (www.zfin.org) and published data [6, 26] for evaluating Palette’s performance.
Sample preparation for bulk RNA-seq
Live embryos at the required developmental stages of 10 hpf, 12 hpf and 16 hpf were rapidly embedded in optimal cutting temperature (OCT) compound and oriented at the bottom of steel embedding cassettes. Embedded embryos were rapidly frozen at −80 °C for 10 min, and then transferred into a cryostat (Leica) at −20 °C. In the cryostat, embedded embryos were removed from the steel embedding cassettes and cryosectioned at a thickness of 20 µm. Each slice was collected and placed on PEN membrane slides (Leica) in the correct order. Membrane slides were stained using 1% (wt/vol) cresyl violet (dissolved in 70% ethanol) to roughly check cell number and ensure slice integrity. Each slice was extracted from the membrane slides using a laser capture microdissection system (Leica LMD6) and collected into a 1.5 mL Eppendorf tube containing 40 µL of PicoPure™ lysis buffer (Thermo Fisher). Collected samples were incubated at 42 °C for 30 min and then sent to Shanghai Ouyi Biology Medical Science Technology Co., Ltd. (Shanghai, China) for RNA extraction, cDNA library construction and sequencing. Paired-end sequencing at 150 bp read length was performed on a NovaSeq 6000 instrument.
Bulk RNA-seq data processing
Reads were aligned to the Danio rerio genome Ensembl Release 92 (GRCz11) using STAR v2.7.1a [74]. The aligned reads were assigned to each gene using featureCounts v1.6.0 [75]. For each embryo, the gene counts of each slice were merged into a count matrix, and the genes that received more than 0.5 counts per million reads (CPM) in at least 3 slices were retained. The count matrices with slice position information were constructed into DGEList objects using edgeR [76–78] for the following analysis.
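The gene filter described above (more than 0.5 CPM in at least 3 slices) can be illustrated with a short NumPy sketch. It computes CPM from raw library sizes without normalization factors, which is a simplifying assumption, and the function name and defaults are hypothetical; it is not a substitute for the edgeR workflow used in the study.

```python
import numpy as np

def filter_by_cpm(counts, min_cpm=0.5, min_slices=3):
    """Hypothetical helper: keep genes exceeding min_cpm in at least min_slices slices.

    counts: raw count matrix, shape (n_genes, n_slices)
    Returns the boolean keep mask and the CPM matrix.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)                # total counts per slice
    cpm = counts / lib_size * 1e6                # counts per million reads
    keep = (cpm > min_cpm).sum(axis=1) >= min_slices
    return keep, cpm
```

The mask can then be used to subset the count matrix (`counts[keep]`) before downstream analysis.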
3D embryo reconstruction
The spatiotemporal transcriptomics data used for 3D embryo reconstruction were obtained from the zebrafish Stereo-seq dataset [2] available for download at https://db.cngb.org/stomics/zesta/download/. Each section was fitted into a 2D coordinate system, with the section centre serving as the origin. Before the reconstruction, severely broken slices and outlier spots were removed from the dataset. The position of each section on the 2D coordinate system was manually adjusted and aligned based on the section shapes and spot annotations. Furthermore, the distance between neighbouring sections on the z-axis was estimated, and the corresponding z-axis coordinates were assigned to each section. By combining the spatial transcriptomics data with the 3D coordinates, reconstructed embryos were generated. In addition to manual adjustment for alignment, PASTE [67, 68] and Spateo [69] were also tested for alignment performance. Using both spatial and gene expression information from the ST data, slice alignments were performed with the morpho_align function in Spateo and the stack_slices_pairwise function in PASTE. The percentages of overlapping regions and of overlapping regions with the same cluster were then calculated, and ARI was used to assess the alignment performance. Spateo and manual adjustment exhibited similar alignment performance, and both outperformed PASTE, with more overlapping regions and more overlapping regions with the same cluster (Additional file 1: Fig. S15). Since manual adjustment provided good alignment performance and we aimed to avoid embedding too many external tools into the analyses, manual adjustment was ultimately adopted for alignment.
Alignment between bulk RNA-seq slices and re-segmented pseudo ST slices, followed by Palette implementation
Reconstructed embryos were rotated along the x-axis and y-axis separately, and then re-segmented into virtually sectioned slices. Each re-segmented slice was transformed into pseudo bulk data, which represented an aggregate expression profile for that slice. To determine embryo orientation, several genes exhibiting AP differential expression patterns were selected. The scaled expression levels of these genes across slices were generated for both the bulk RNA-seq slices and the pseudo bulk slices. To assess the alignment between bulk RNA-seq slices and pseudo bulk slices, Pearson correlation coefficients were calculated using the full expression profiles of these genes across slices. The alignment with the highest mean Pearson correlation coefficient along the slices was considered the optimal alignment. Palette was then applied to the aligned slice pairs using the overlapped genes. For the 10 hpf embryo, there were 24,658 genes in the bulk data, 18,698 genes in the Stereo-seq data, and 16,601 overlapped genes. For the 12 hpf embryo, there were 23,018 genes in the bulk data, 18,948 genes in the Stereo-seq data, and 16,401 overlapped genes. For the 16 hpf embryo, there were 24,357 genes in the bulk data, 23,110 genes in the Stereo-seq data, and 19,539 overlapped genes.
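The orientation search can be sketched as a comparison of candidate re-segmentations against the bulk slices. The sketch below assumes that each candidate yields the same number of slices as the bulk data, that genes are z-scored across slices, and that one Pearson coefficient is computed per slice pair and then averaged; these choices and all names are illustrative assumptions.

```python
import numpy as np

def zscore_rows(x):
    """Scale each gene (row) across slices; constant genes are only centred."""
    x = np.asarray(x, dtype=float)
    sd = x.std(axis=1, keepdims=True)
    return (x - x.mean(axis=1, keepdims=True)) / np.where(sd > 0, sd, 1.0)

def mean_slice_correlation(bulk, pseudo):
    """Mean Pearson correlation between matched slices (columns) over AP marker genes."""
    b, p = zscore_rows(bulk), zscore_rows(pseudo)
    r = [np.corrcoef(b[:, j], p[:, j])[0, 1] for j in range(b.shape[1])]
    return float(np.mean(r))

def best_orientation(bulk, candidates):
    """candidates: dict mapping an orientation label to a (genes x slices) pseudo bulk matrix."""
    scores = {name: mean_slice_correlation(bulk, ps) for name, ps in candidates.items()}
    return max(scores, key=scores.get), scores
```

Each key of `candidates` would correspond to one tested pair of rotation angles, so the returned label identifies the rotation to use for the subsequent Palette run.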
3D projection of ST spots to imaging spots
The live imaging data of zebrafish embryos were obtained from the study by Shah et al., 2019 [24], available for download at https://idr.openmicroscopy.org under accession code idr0068 [79]. The centres of both the ST data and the live imaging data were set as the origin, and the embryo size in the imaging data was scaled to match the embryo size in the ST data. Three specific spots (head, middle of the midline and tail) were selected from both datasets. The Kabsch algorithm [27, 28] was used to achieve the optimal alignment between the two paired sets using these three spot pairs. This algorithm determines the optimal rotation matrix through singular value decomposition (SVD). Using this rotation matrix, the coordinates of the three ST spots were rotated and transformed to align with the corresponding spots in the live imaging data. By applying the rotation matrix obtained from the optimal alignment, the entire set of ST spot coordinates was transformed. A transform matrix was obtained by calculating the difference between the coordinate of the head spot achieved from the spot alignment and the coordinate of the head spot after applying the rotation. The transform matrix was then applied to the coordinates of all the ST spots, which had undergone rotation. Through this process, the alignment between the ST data and the imaging data was achieved.
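The landmark-based alignment can be illustrated with a compact NumPy version of the Kabsch algorithm. This is a sketch under assumptions: both datasets are already centred and scaled, the landmarks are given in matching order with the head spot in row 0, and the function names are hypothetical.

```python
import numpy as np

def kabsch_rotation(src, dst):
    """Rotation R minimising sum ||R @ src_i - dst_i||^2 over paired rows of src and dst."""
    H = src.T @ dst                              # 3 x 3 covariance of the paired points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against an improper rotation
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

def align_spots(st_xyz, st_landmarks, img_landmarks):
    """Rotate all ST spots with the landmark rotation, then shift by the head-spot offset."""
    R = kabsch_rotation(st_landmarks, img_landmarks)
    rotated = st_xyz @ R.T
    shift = img_landmarks[0] - R @ st_landmarks[0]   # row 0 is the head landmark
    return rotated + shift
```

With only three landmark pairs the fit is sensitive to how the landmarks are chosen, so visual inspection of the aligned embryo remains advisable.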
The pairing between the ST spots and the imaging spots was achieved through a looping algorithm based on the Greedy algorithm [29]. In each iteration of the loop, the ST spot and the imaging spot that were closest to each other were paired, which was considered as the optimal solution of the iteration. Paired spots were removed from subsequent iterations, while unpaired spots moved on to the next loop. The looping process continued until each ST spot was paired with an imaging spot. The expression information from the ST spots was then assigned to their corresponding imaging spots. The remaining unpaired imaging spots were retained to preserve the overall morphology of the embryo. The numbers of mapped spots for the 10 hpf, 12 hpf and 16 hpf embryos are 15,379 (69.4% of the total spots), 14,697 (70.5% of the total spots) and 21,605 (77.2% of the total spots), respectively.
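A minimal sketch of such a greedy pairing is given below, assuming Euclidean distances and a dense distance matrix; the function name is hypothetical and the code is meant to illustrate the loop, not to reproduce the authors' implementation.

```python
import numpy as np

def greedy_pair(st_xyz, img_xyz):
    """Repeatedly pair the globally closest unpaired ST spot and imaging spot.

    Returns a dict mapping ST spot index -> imaging spot index.
    """
    dist = np.linalg.norm(st_xyz[:, None, :] - img_xyz[None, :, :], axis=2)
    pairs = {}
    for _ in range(min(dist.shape)):
        s, g = np.unravel_index(np.argmin(dist), dist.shape)
        pairs[int(s)] = int(g)
        dist[s, :] = np.inf                      # remove the paired ST spot
        dist[:, g] = np.inf                      # remove the paired imaging spot
    return pairs
```

The dense matrix is convenient for a small example, but for embryos with tens of thousands of spots a spatial index or blockwise search would be needed to keep memory use manageable.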
Spatial cell–cell communication analysis using CellChat
CellChat [30, 31] was employed to analyse spatial cell–cell communication based on CellChatDB, a database of known zebrafish ligand–receptor interactions. The section of interest was extracted from zSTEP for analysis, which provided 2D section ST data. In the Stereo-seq data, each spot contained 15 × 15 DNA nanoball (DNB) spots (the diameter of each spot is approximately 10 μm). Consequently, in the section ST data, the spot diameter was set as 15, and the number of pixels spanning the spot size diameter was set as 225. The expression data of the section ST data were pre-processed to identify over-expressed ligands and receptors for each cell group. With distance set as a constraint, CellChat inferred the communication probability between two interacting cell groups. This inference was based on the average gene expression of a ligand in one cell group and the average gene expression of a receptor in another cell group. The communication probabilities of all ligand–receptor interactions associated with each pathway were summarized to analyse the communication probabilities at the level of signalling pathways.
Linearizing the AP axis in zSTEP
The lateral view of zSTEP was projected onto a 2D plane. The spots were fitted to a circle. Each spot was then projected onto the circle, with the projected spot being the point on the circle closest to the original spot. By designating the most anterior spot as the origin, the AP position of each spot was determined by calculating the arc length from its projected spot to the origin. The AP value of each spot was then divided by the maximum AP value in the dataset to obtain the normalized AP value.
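The linearization can be sketched as a circle fit followed by an arc-length calculation. The sketch below uses an algebraic least-squares circle fit, which is an assumption since the fitting method is not stated, and it assumes that all spots lie on one side of the anterior origin when travelling in the chosen direction; the names and the `clockwise` flag are hypothetical.

```python
import numpy as np

def linearise_ap(xy, anterior_idx, clockwise=True):
    """Normalised AP position of each spot from its projection onto a fitted circle.

    xy: lateral-view spot coordinates, shape (n_spots, 2)
    anterior_idx: index of the most anterior spot, used as the origin
    """
    x, y = xy[:, 0], xy[:, 1]
    # Algebraic fit: x^2 + y^2 = 2*cx*x + 2*cy*y + c, with c = r^2 - cx^2 - cy^2.
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    sol, *_ = np.linalg.lstsq(A, x**2 + y**2, rcond=None)
    cx, cy, c = sol
    r = np.sqrt(c + cx**2 + cy**2)
    # The closest point on the circle has the same polar angle about the centre.
    theta = np.arctan2(y - cy, x - cx)
    delta = theta - theta[anterior_idx]
    if clockwise:
        delta = -delta
    arc = r * np.mod(delta, 2 * np.pi)           # arc length from the anterior origin
    return arc / arc.max()
```

The direction flag determines whether the axis runs clockwise or anticlockwise from the anterior origin in the chosen projection and has to be set according to the embryo orientation.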
Employing random forest model for prediction
For each cell, the normalized expression of morphogens or TFs was set as the variable factors, while the cell’s normalized physical AP position was set as the observed factor. We took 70% of the data to train a random forest model using the randomForest [80] package. The importance of the variables for predicting AP position was assessed by both the increase in mean square error (IncMSE) and the increase in node purity (IncNodePurity) [81]. Cross-validation was used to evaluate the number of variables. The top 6 important variables were selected, and the correlations between their expression and AP positions were visualized.
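An analogous screening can be sketched in Python with scikit-learn, used here as a stand-in for the R randomForest package. Permutation importance scored by mean squared error plays the role of IncMSE and the impurity-based `feature_importances_` plays the role of IncNodePurity; the two toolkits do not compute identical quantities, and the function name, the number of trees and the seed are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

def rank_ap_predictors(X, ap, names, top=6, seed=0):
    """Rank genes by how much permuting them degrades AP-position prediction.

    X:     normalised expression, shape (n_cells, n_genes)
    ap:    normalised AP position per cell, shape (n_cells,)
    names: gene names matching the columns of X
    """
    X_tr, X_te, y_tr, y_te = train_test_split(X, ap, train_size=0.7, random_state=seed)
    rf = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X_tr, y_tr)
    perm = permutation_importance(
        rf, X_te, y_te, scoring="neg_mean_squared_error", n_repeats=5, random_state=seed
    )
    order = np.argsort(perm.importances_mean)[::-1][:top]
    # (gene, increase in MSE when permuted, impurity-based importance)
    return [(names[i], perm.importances_mean[i], rf.feature_importances_[i]) for i in order]
```

Evaluating the permutation importance on the held-out 30% rather than on the training cells is a design choice of this sketch that reduces the influence of overfitting on the ranking.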
Calculating Hox score of each spot
For each hox gene, its expression in each spot was divided by the maximum expression of that gene in the dataset, indicating the expression probability of that gene in that spot. Then, the expression of hox genes in spots can be converted to a repeated representation, where the number of repetitions corresponded to the expression probability of the gene in that spot. In our analysis, we made an assumption that the expression of hox genes in each spot followed a normal distribution. This assumption enabled generating a fitting curve of the normal distribution on the density plot of the hox genes. The Hox score was determined as the hox value at the peak of the normal distribution. The Hox score of each spot was then divided by thirteen, resulting in the normalized Hox score.
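If the normal distribution is fitted by maximum likelihood, its peak coincides with the expression-weighted mean of the hox paralog numbers, so the repeated representation does not need to be built explicitly. The sketch below relies on that assumption (the fitting procedure is not specified above) and uses hypothetical names; `paralog` holds the paralog group number (1 to 13) of each hox gene.

```python
import numpy as np

def hox_scores(expr, paralog):
    """Normalised Hox score per spot as the expression-weighted mean paralog number.

    expr:    hox gene expression, shape (n_genes, n_spots)
    paralog: paralog group number (1-13) of each gene, shape (n_genes,)
    Spots without any hox expression receive NaN.
    """
    expr = np.asarray(expr, dtype=float)
    paralog = np.asarray(paralog, dtype=float)
    gene_max = expr.max(axis=1, keepdims=True)
    # Expression probability: expression relative to the gene's maximum in the dataset.
    prob = np.divide(expr, gene_max, out=np.zeros_like(expr), where=gene_max > 0)
    weight = prob.sum(axis=0)
    peak = np.divide(paralog @ prob, weight,
                     out=np.full(expr.shape[1], np.nan), where=weight > 0)
    return peak / 13.0
```

A curve fitted to a binned density plot may give a slightly different peak than the weighted mean, so values from this sketch should be regarded as an approximation of the score described above.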
In situ hybridization (ISH)
ISH was performed following the published protocol [82, 83]. Embryos of required developmental stages were fixed in 4% PFA/PBS overnight at 4 °C, and then transferred into 100% methanol (MeOH) for dehydration overnight at −20 °C. Embryos were washed through 75%, 50% and 25% MeOH/PBST for 5 min each at room temperature and then three times for 5 min in PBST. Embryos older than 10 hpf were treated with proteinase K (10 μg/mL in PBST) for 30 s and then fixed in 4% PFA/PBS for 20 min at room temperature. Proteinase K treated embryos were washed four times for 5 min each in PBST. Embryos were transferred into Hybridization Mix (HM) and incubated at 70 °C for 2–5 h, and then the buffer was replaced by HM containing digoxigenin-11-UTP (Sigma-Aldrich, 11277073910) labelled probe. After overnight incubation at 70 °C, embryos were washed through 75%, 50% and 25% HM/2 × SSC at 70 °C for 20 min each. Embryos were then washed in 2 × SSC at 70 °C for 20 min and washed in 0.2 × SSC twice at 70 °C for 40 min. The 0.2 × SSC was then progressively replaced by PBST at room temperature. Embryos were blocked in blocking buffer at 4 °C for 3 h, and incubated in blocking buffer with anti-DIG-AP antibody (1:10,000 dilution, Sigma-Aldrich, 11093274910) at 4 °C overnight on a low-speed shaker. Embryos were washed 6 times for 15 min each in PBST to remove excess antibodies. Embryos were stained in NBT/BCIP staining solution, and the staining was stopped by washing twice in PBST when the expected staining patterns were observed.
Supplementary Information
Additional file 1: Fig. S1. Evaluation of Palette performance. Fig. S2. Cresyl violet staining of zebrafish serial cryosections. Fig. S3. Adjustment of Stereo-seq slices for 3D ST data construction. Fig. S4. Application of zSTEP for visualizing gene expression patterns and analysing spatial cell-cell communication. Figs. S5-S8. Wnt ligand, Wnt antagonist, Sonic hedgehog ligand, Notch ligand, FGF ligand and TGF-β ligand distributions along AP axis. Figs. S9-S10. Investigation of the key morphogens and key TFs for determining cell AP positions using the random forest model. Fig. S11. hox gene distributions along AP axis. Fig. S12. Inferring spatial gene expression patterns from human PDAC true bulk data. Fig. S13. Inferring spatial gene expression patterns from mouse hypothalamus true bulk data using MERFISH data as the ST reference. Fig. S14. Inferring spatial gene expression patterns from melanoma pseudo bulk data using Visium data as the ST reference. Fig. S15. Comparisons of alignment performance among PASTE, Spateo and manual alignment. Fig. S16. Selection of highly expressed, stable genes. Fig. S17. Assessing the effects of the number of detected genes and spatial resolution on Palette performance. Tab. S10. Computational configurations and processing speeds under varying data conditions.
Additional file 2: Tab. S1. DE genes in Zone1 and Zone2 at 10 hpf. Tab. S2. Enriched GO terms in Zone1 at 10 hpf. Tab. S3. Enriched GO terms in Zone2 at 10 hpf. Tab. S4. DE genes in Zone1 and Zone2 at 12 hpf. Tab. S5. Enriched GO terms in Zone1 at 12 hpf. Tab. S6. Enriched GO terms in Zone2 at 12 hpf. Tab. S7. DE genes in Zone1 and Zone2 at 16 hpf. Tab. S8. Enriched GO terms in Zone1 at 16 hpf. Tab. S9. Enriched GO terms in Zone2 at 16 hpf.
Acknowledgements
We thank Dr. Si-Yu He at Stanford University and the members of Laboratory of Development and Organogenesis (LDO) at Zhejiang University for helpful suggestions and discussions.
Review history
This article was first peer reviewed at Review Commons; the reviewer reports and the authors’ point-by-point response are available online [94–97]. The rest of the review history, containing the authors’ responses and additional reviewer comments, is available as Additional file 3.
Peer review information
Veronique van den Berghe was the primary editor of this article at Genome Biology and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Authors’ contributions
P.-F.X., Y.D., T.C., X.L., J.L. and X.F. conceived and designed the research; Y.D., T.C. and J.L. designed the Palette pipeline, constructed zSTEP and performed analyses; X.L., X.-X.F., Y.H., X.-F.Y., L.-E.Y., H.-R.L. and Z.-W.B. performed experiments; Y.D. and T.C. drafted the manuscript; N.J., J.L., X.F. and P.-F.X. edited the manuscript. P.-F.X. and X.F. supervised this study. P.-F.X., Y.D., T.C., J.L. and X.F. provided funding for the study. All authors read and approved the final manuscript.
Funding
This work was supported by grants from the National Key Research and Development Program of China (2024YFA1803001), the National Natural Science Foundation of China (32300677, 32050109, 32300688, 82522092, U23A20513), the "Pioneer" and "Leading Goose" R&D Program of Zhejiang (2024C03106), and the Ningbo Top Medical and Health Research Program (No. 2022030309).
Data availability
The raw serial bulk RNA-seq data have been deposited in the Gene Expression Omnibus (GEO) under accession number “GSE262578” [84]. The published data used in this study can be accessed through the following links or accession numbers: (1) Stereo-seq data of Drosophila embryos (https://db.cngb.org/stomics/flysta3d/download/) [4, 85]; (2) Stereo-seq data of zebrafish embryos (https://db.cngb.org/stomics/zesta/download/) [2, 86]; (3) serial bulk RNA-seq data of 6-hpf zebrafish embryos: GEO accession “GSE59873” [1, 87]; (4) live imaging data of zebrafish embryos (idr0068 from https://idr.openmicroscopy.org) [24, 79]; (5) spatial transcriptomics data of human PDAC: GEO accession “GSE111672” [65, 88]; (6) bulk RNA-seq data of human PDAC: GEO accession “GSE171485” [66, 89]; (7) MERFISH data of mouse hypothalamus: GEO accession “GSE113576” [63, 90]; (8) bulk RNA-seq data of the mouse hypothalamus region: GEO accession “GSE192999” [21, 91]; (9) Visium data of melanoma tissues (https://www.spatialresearch.org/resources-published-datasets/doi-10-1158-0008-5472-can-18-0747/) [64].
The code for the Palette pipeline and bioinformatics analyses is deposited on GitHub (https://github.com/ldo2zju/zSTEP) under the GPL-3.0 license [92] and is also archived on Zenodo (https://zenodo.org/records/13952256) [93].
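For readers who wish to retrieve the GEO-hosted datasets programmatically, the following is a minimal Python sketch that maps each GEO accession listed above to its landing-page URL. The URL pattern is the one used in the data citations of this article; the dictionary layout, the short descriptions and the helper name `geo_url` are illustrative assumptions and are not part of the Palette or zSTEP code.

```python
# Accessions are copied from the Data availability statement. The dictionary
# layout and the helper name are illustrative assumptions, not zSTEP code.
GEO_ACCESSIONS = {
    "GSE262578": "serial bulk RNA-seq of zebrafish embryos (this study)",
    "GSE59873": "serial bulk RNA-seq of 6-hpf zebrafish embryos",
    "GSE111672": "spatial transcriptomics of human PDAC",
    "GSE171485": "bulk RNA-seq of human PDAC",
    "GSE113576": "MERFISH of mouse hypothalamus",
    "GSE192999": "bulk RNA-seq of mouse hypothalamus region",
}

# Landing-page pattern as it appears in the GEO data citations.
GEO_QUERY = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={acc}"


def geo_url(accession: str) -> str:
    """Return the GEO landing-page URL for a series accession (GSE + digits)."""
    if not (accession.startswith("GSE") and accession[3:].isdigit()):
        raise ValueError(f"not a GEO series accession: {accession!r}")
    return GEO_QUERY.format(acc=accession)


if __name__ == "__main__":
    # Print one tab-separated line per dataset: accession, description, URL.
    for acc, description in GEO_ACCESSIONS.items():
        print(f"{acc}\t{description}\t{geo_url(acc)}")
```

The sketch only builds URLs and performs no network access; the Stereo-seq, IDR and Visium resources are hosted outside GEO and are reached through the links given above.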
Declarations
Ethics approval and consent to participate
The wild-type zebrafish strain was maintained following standard procedures, and the experimental procedures were approved by the Institutional Review Board of Zhejiang University (approval number ZJU20220375).
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Yang Dong, Tao Cheng and Xiang Liu contributed equally to this work and should be regarded as joint first authors.
Contributor Information
Jie Liao, Email: liaojie@zju.edu.cn.
Xiaohui Fan, Email: fanxh@zju.edu.cn.
Peng-Fei Xu, Email: pengfei_xu@zju.edu.cn.
References
- 1.Junker JP, Noel ES, Guryev V, Peterson KA, Shah G, Huisken J, et al. Genome-wide RNA tomography in the zebrafish embryo. Cell. 2014;159:662–75.
- 2.Liu C, Li R, Li Y, Lin X, Zhao K, Liu Q, et al. Spatiotemporal mapping of gene expression landscapes and developmental trajectories during zebrafish embryogenesis. Dev Cell. 2022;57:1284–1298.e5.
- 3.Chen A, Liao S, Cheng M, Ma K, Wu L, Lai Y, et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell. 2022;185:1777–1792.e21.
- 4.Wang M, Hu Q, Lv T, Wang Y, Lan Q, Xiang R, et al. High-resolution 3D spatiotemporal transcriptomic maps of developing Drosophila embryos and larvae. Dev Cell. 2022;57:1271–1283.e4.
- 5.Shi H, He Y, Zhou Y, Huang J, Maher K, Wang B, et al. Spatial atlas of the mouse central nervous system at molecular resolution. Nature. 2023;622:552–61.
- 6.Satija R, Farrell JA, Gennert D, Schier AF, Regev A. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol. 2015;33:495–502.
- 7.Nichterwitz S, Benitez JA, Hoogstraaten R, Deng Q, Hedlund E. LCM-Seq: a method for spatial transcriptomic profiling using laser capture microdissection coupled with polyA-based RNA sequencing. Methods Mol Biol. 2018;1649:95–110.
- 8.Guo W, Hu Y, Qian J, Zhu L, Cheng J, Liao J, et al. Laser capture microdissection for biomedical research: towards high-throughput, multi-omics, and single-cell resolution. J Genet Genomics. 2023;50:641–51.
- 9.Chen J, Suo S, Tam PP, Han JJ, Peng G, Jing N. Spatial transcriptomic analysis of cryosectioned tissue samples with Geo-seq. Nat Protoc. 2017;12:566–80.
- 10.Wang R, Peng G, Tam PPL, Jing N. Integration of computational analysis and spatial transcriptomics in single-cell study. Genomics Proteomics Bioinformatics. 2022.
- 11.Lubeck E, Coskun AF, Zhiyentayev T, Ahmad M, Cai L. Single-cell in situ RNA profiling by sequential hybridization. Nat Methods. 2014;11:360–1.
- 12.Xia C, Fan J, Emanuel G, Hao J, Zhuang X. Spatial transcriptome profiling by MERFISH reveals subcellular RNA compartmentalization and cell cycle-dependent gene expression. Proc Natl Acad Sci U S A. 2019;116:19490–9.
- 13.Liu J, Tran V, Vemuri VNP, Byrne A, Borja M, Kim YJ, et al. Concordance of MERFISH spatial transcriptomics with bulk and single-cell RNA sequencing. Life Sci Alliance. 2023. 10.26508/lsa.202201701.
- 14.Rodriques SG, Stickels RR, Goeva A, Martin CA, Murray E, Vanderburg CR, et al. Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution. Science. 2019;363:1463.
- 15.Wu SZ, Al-Eryani G, Roden DL, Junankar S, Harvey K, Andersson A, et al. A single-cell and spatially resolved atlas of human breast cancers. Nat Genet. 2021;53:1334–47.
- 16.Fawkner-Corbett D, Antanaviciute A, Parikh K, Jagielowicz M, Geros AS, Gupta T, et al. Spatiotemporal analysis of human intestinal development at single-cell resolution. Cell. 2021;184:810.
- 17.Cable DM, Murray E, Zou LS, Goeva A, Macosko EZ, Chen F, et al. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat Biotechnol. 2022;40:517–26.
- 18.Li B, Severson E, Pignon JC, Zhao H, Li T, Novak J, et al. Comprehensive analyses of tumor immunity: implications for cancer immunotherapy. Genome Biol. 2016;17:174.
- 19.Wang X, Park J, Susztak K, Zhang NR, Li M. Bulk tissue cell type deconvolution with multi-subject single-cell expression reference. Nat Commun. 2019;10:380.
- 20.Tsoucas D, Dong R, Chen H, Zhu Q, Guo G, Yuan GC. Accurate estimation of cell-type composition from gene expression data. Nat Commun. 2019;10:2975.
- 21.Liao J, Qian JY, Fang Y, Chen Z, Zhuang X, Zhang NY, et al. De novo analysis of bulk RNA-seq data at spatially resolved single-cell resolution. Nat Commun. 2022. 10.1038/s41467-022-34271-z.
- 22.Xu H, Wang S, Fang M, Luo S, Chen C, Wan S, et al. Spacel: deep learning-based characterization of spatial transcriptome architectures. Nat Commun. 2023;14:7603.
- 23.Baul S, Tanvir Ahmed K, Jiang Q, Wang G, Li Q, Yong J, et al. Integrating spatial transcriptomics and bulk RNA-seq: predicting gene expression with enhanced resolution through graph attention networks. Brief Bioinform. 2024. 10.1093/bib/bbae316.
- 24.Shah G, Thierbach K, Schmid B, Waschke J, Reade A, Hlawitschka M, et al. Multi-scale imaging and analysis identify pan-embryo cell dynamics of germlayer formation in zebrafish. Nat Commun. 2019;10:5753.
- 25.Zhao E, Stone MR, Ren X, Guenthoer J, Smythe KS, Pulliam T, et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat Biotechnol. 2021;39:1375.
- 26.Cheng T, Xing YY, Liu C, Li YF, Huang Y, Liu X, et al. Nodal coordinates the anterior-posterior patterning of germ layers and induces head formation in zebrafish explants. Cell Rep. 2023;42:112351.
- 27.Kabsch W. A discussion of the solution for the best rotation to relate two sets of vectors. Acta Crystallogr Sect A. 1978;34:827–8.
- 28.Kabsch W. A solution for the best rotation to relate two sets of vectors. Acta Crystallogr Sect A. 1976;32:922–3.
- 29.Jungnickel D. The Greedy Algorithm. In: Graphs, Networks and Algorithms. Berlin, Heidelberg: Springer Berlin Heidelberg; 1999. p. 129–53.
- 30.Jin SQ, Guerrero-Juarez CF, Zhang LH, Chang I, Ramos R, Kuan CH, et al. Inference and analysis of cell-cell communication using CellChat. Nat Commun. 2021. 10.1038/s41467-021-21246-9.
- 31.Jin S, Plikus MV, Nie Q. CellChat for systematic analysis of cell-cell communication from single-cell and spatially resolved transcriptomics. bioRxiv. 2023. 10.1101/2023.11.05.565674.
- 32.Groves JA, Hammond CL, Hughes SM. Fgf8 drives myogenic progression of a novel lateral fast muscle fibre population in zebrafish. Development. 2005;132:4211–22.
- 33.Reifers F, Bohli H, Walsh EC, Crossley PH, Stainier DY, Brand M. Fgf8 is mutated in zebrafish acerebellar (ace) mutants and is required for maintenance of midbrain-hindbrain boundary development and somitogenesis. Development. 1998;125:2381–95.
- 34.Draper BW, Stock DW, Kimmel CB. Zebrafish fgf24 functions with fgf8 to promote posterior mesodermal development. Development. 2003;130:4639–54.
- 35.del Corral RD, Morales AV. The multiple roles of FGF signaling in the developing spinal cord. Front Cell Dev Biol. 2017;5.
- 36.Green DG, Whitener AE, Mohanty S, Mistretta B, Gunaratne P, Yeh AT, et al. Wnt signaling regulates neural plate patterning in distinct temporal phases with dynamic transcriptional outputs. Dev Biol. 2020;462:152–64.
- 37.Yang Y, Thorpe C. BMP and non-canonical Wnt signaling are required for inhibition of secondary tail formation in zebrafish. Development. 2011;138:2601–11.
- 38.Fang S, Xu M, Cao L, Liu X, Bezulj M, Tan L, et al. Stereopy: modeling comparative and spatiotemporal cellular heterogeneity via multi-sample spatial transcriptomics. Nat Commun. 2025;16:3741.
- 39.Gurdon JB, Bourillot PY. Morphogen gradient interpretation. Nature. 2001;413:797–803.
- 40.Stathopoulos A, Iber D. Studies of morphogens: keep calm and carry on. Development. 2013;140:4119–24.
- 41.Grant PK, Szep G, Patange O, Halatek J, Coppard V, Csikasz-Nagy A, et al. Interpretation of morphogen gradients by a synthetic bistable circuit. Nat Commun. 2020;11:5545.
- 42.Zagorski M, Tabata Y, Brandenberg N, Lutolf MP, Tkacik G, Bollenbach T, et al. Decoding of position in the developing neural tube from antiparallel morphogen gradients. Science. 2017;356:1379–83.
- 43.Kimelman D, Martin BL. Anterior-posterior patterning in early development: three strategies. Wiley Interdiscip Rev Dev Biol. 2012;1:253–66.
- 44.Schilling TF. Anterior-posterior patterning and segmentation of the vertebrate head. Integr Comp Biol. 2008;48:658–67.
- 45.Anand GM, Megale HC, Murphy SH, Weis T, Lin Z, He Y, et al. Controlling organoid symmetry breaking uncovers an excitable system underlying human axial elongation. Cell. 2023;186:497–512.e23.
- 46.Mongera A, Michaut A, Guillot C, Xiong F, Pourquie O. Mechanics of anteroposterior axis formation in vertebrates. Annu Rev Cell Dev Biol. 2019;35:259–83.
- 47.Mallo M. Revisiting the involvement of signaling gradients in somitogenesis. FEBS J. 2016;283:1430–7.
- 48.del Diez Corral R, Olivera-Martinez I, Goriely A, Gale E, Maden M, Storey K. Opposing FGF and retinoid pathways control ventral neural pattern, neuronal differentiation, and segmentation during body axis extension. Neuron. 2003;40:65–79.
- 49.Simsek MF, Ozbudak EM. Spatial Fold Change of FGF Signaling Encodes Positional Information for Segmental Determination in Zebrafish. Cell Rep. 2018;24:66–78.e8.
- 50.Zhang W, Scerbo P, Delagrange M, Candat V, Mayr V, Vriz S, et al. Fgf8 dynamics and critical slowing down may account for the temperature independence of somitogenesis. Commun Biol. 2022;5:113.
- 51.Bernheim S, Meilhac SM. Mesoderm patterning by a dynamic gradient of retinoic acid signalling. Philos Trans R Soc Lond B Biol Sci. 2020;375:20190556.
- 52.Pyati UJ, Webb AE, Kimelman D. Transgenic zebrafish reveal stage-specific roles for Bmp signaling in ventral and posterior mesoderm development. Development. 2005;132:2333–43.
- 53.Beck CW, Whitman M, Slack JM. The role of BMP signaling in outgrowth and patterning of the Xenopus tail bud. Dev Biol. 2001;238:303–14.
- 54.Gonzalez EM, Fekany-Lee K, Carmany-Rampey A, Erter C, Topczewski J, Wright CV, et al. Head and trunk in zebrafish arise via coinhibition of BMP signaling by bozozok and chordino. Genes Dev. 2000;14:3087–92.
- 55.Sharma R, Shafer MER, Bareke E, Tremblay M, Majewski J, Bouchard M. Bmp signaling maintains a mesoderm progenitor cell state in the mouse tailbud. Development. 2017;144:2982–93.
- 56.Takebayashi-Suzuki K, Suzuki A. Intracellular communication among morphogen signaling pathways during vertebrate body plan formation. Genes. 2020. 10.3390/genes11030341.
- 57.Briscoe J, Small S. Morphogen rules: design principles of gradient-mediated embryo patterning. Development. 2015;142:3996–4009.
- 58.Boylan M, Anderson MJ, Ornitz DM, Lewandoski M. The Fgf8 subfamily (Fgf8, Fgf17 and Fgf18) is required for closure of the embryonic ventral body wall. Development. 2020;147.
- 59.Cao Y, Zhao J, Sun Z, Zhao Z, Postlethwait J, Meng A. Fgf17b, a novel member of Fgf family, helps patterning zebrafish embryos. Dev Biol. 2004;271:130–43.
- 60.Moreau C, Caldarelli P, Rocancourt D, Roussel J, Denans N, Pourquie O, et al. Timed Collinear Activation of Hox Genes during Gastrulation Controls the Avian Forelimb Position. Curr Biol. 2019;29:35–50.e4.
- 61.Durston AJ. Vertebrate hox temporal collinearity: does it exist and what is it’s function? Cell Cycle. 2019;18:523–30.
- 62.Iimura T, Denans N, Pourquié O. Establishment of Hox vertebral identities in the embryonic spine precursors. Hox Genes. 2009;88:201.
- 63.Moffitt JR, Bambah-Mukku D, Eichhorn SW, Vaughn E, Shekhar K, Perez JD, et al. Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science. 2018. 10.1126/science.aau5324.
- 64.Thrane K, Eriksson H, Maaskola J, Hansson J, Lundeberg J. Spatially resolved transcriptomics enables dissection of genetic heterogeneity in stage III cutaneous malignant melanoma. Cancer Res. 2018;78:5970–9.
- 65.Moncada R, Barkley D, Wagner F, Chiodin M, Devlin JC, Baron M, et al. Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Nat Biotechnol. 2020;38:333–42.
- 66.Wu H, Tian W, Tai X, Li X, Li Z, Shui J, et al. Identification and functional analysis of novel oncogene DDX60L in pancreatic ductal adenocarcinoma. BMC Genomics. 2021;22:833.
- 67.Zeira R, Land M, Strzalkowski A, Raphael BJ. Alignment and integration of spatial transcriptomics data. Nat Methods. 2022;19:567–75.
- 68.Wang G, Zhao J, Yan Y, Wang Y, Wu AR, Yang C. Construction of a 3D whole organism spatial atlas by joint modelling of multiple slices with deep neural networks. Nat Mach Intell. 2023;5:1200–13.
- 69.Qiu X, Zhu DY, Yao J, Jing Z, Zuo L, Wang M, Min KH, Pan H, Wang S, Liao S, et al. Spateo: multidimensional spatiotemporal modeling of single-cell spatial transcriptomics. bioRxiv. 2022. 10.1101/2022.12.07.519417.
- 70.Sakaguchi S, Mizuno S, Okochi Y, Tanegashima C, Nishimura O, Uemura T, et al. Single-cell transcriptome atlas of Drosophila gastrula 2.0. Cell Rep. 2023;42:112707.
- 71.Kondo S, Miura T. Reaction-diffusion model as a framework for understanding biological pattern formation. Science. 2010;329:1616–20.
- 72.Wan Y, El Kholtei J, Jenie I, Colomer-Rosell M, Liu J, Acedo JN, Du LY, Codina-Tobias M, Wang M, Sawh A, et al. Whole-embryo spatial transcriptomics at subcellular resolution from gastrulation to organogenesis. bioRxiv. 2024.
- 73.Hao Y, Hao S, Andersen-Nissen E, Mauck WM 3rd, Zheng S, Butler A, et al. Integrated analysis of multimodal single-cell data. Cell. 2021;184:3573–3587.e29.
- 74.Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21.
- 75.Liao Y, Smyth GK, Shi W. Featurecounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30:923–30.
- 76.Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26:139–40.
- 77.Chen Y, Lun AT, Smyth GK. From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline. F1000Res. 2016;5:1438.
- 78.McCarthy DJ, Chen YS, Smyth GK. Differential expression analysis of multifactor RNA-seq experiments with respect to biological variation. Nucleic Acids Res. 2012;40:4288–97.
- 79.Shah G, Thierbach K, Schmid B, Waschke J, Reade A, Hlawitschka M, Roeder I, Scherf N, Huisken J: Multi-scale imaging and analysis identifies pan-embryo cell dynamics of germlayer formation in zebrafish. Image Data Resource; 2021. https://idr.openmicroscopy.org/webclient/?show=project-2152.
- 80.Liaw A, Wiener M. Classification and regression by randomForest. R News. 2002;2:18–22.
- 81.Carvajal TM, Viacrusis KM, Hernandez LFT, Ho HT, Amalin DM, Watanabe K. Machine learning methods reveal the temporal pattern of dengue incidence using meteorological factors in metropolitan Manila, Philippines. BMC Infect Dis. 2018. 10.1186/s12879-018-3066-0.
- 82.Thisse B, Thisse C. In situ hybridization on whole-mount zebrafish embryos and young larvae. Methods Mol Biol. 2014;1211:53–67.
- 83.Cheng T, Xing YY, Dong Y, Xu PF. Protocol for generation and assessment of head-like structure in zebrafish. STAR Protoc. 2023;4:102553.
- 84.Dong Y, Cheng T, Liu X: Construction of Danio rerio Asymmetrical Maps (DreAM). Gene Expression Omnibus; 2024. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE262578.
- 85.Wang M, Hu Q, Lv T, Wang Y, Xiang R, Lan Q, Tu Z, Wei Y, Han K: High-resolution spatiotemporal transcriptomic maps of developing Drosophila embryos and larvae. Spatial Transcript Omics DataBase; 2021. https://db.cngb.org/stomics/datasets/STDS0000060/.
- 86.Liu C, Li R, Li Y, Lin X, Zhao K, Liu Q, Wang S, Yang X, Shi X, Ma Y, et al: ZESTA: Zebrafish Embryogenesis Spatiotemporal Transcriptomic Atlas. Spatial Transcript Omics DataBase; 2021. https://db.cngb.org/stomics/datasets/STDS0000057/.
- 87.Junker JP, Noël ES, Guryev V, Peterson KA, Shah G, Huisken J, McMahon AP, Berezikov E, Bakkers J, van Oudenaarden A: Genome-wide RNA tomography in the zebrafish embryo. Gene Expression Omnibus; 2014. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE59873.
- 88.Moncada R: Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Gene Expression Omnibus; 2020. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE111672.
- 89.Wu H, Yu J: Identification and Functional Analysis of Novel Oncogenes in Pancreatic Ductal Adenocarcinoma. Gene Expression Omnibus; 2021. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE171485.
- 90.Zhuang X, Dulac C: Molecular, Spatial and Functional Single-Cell Profiling of the Hypothalamic Preoptic Region. Gene Expression Omnibus; 2018. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE113576.
- 91.Liao J, Qian J, Hu Y, Lu X: De novo analysis of bulk RNA-seq data at spatially resolved single-cell resolution. Gene Expression Omnibus; 2022. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE192999.
- 92.Dong Y, Cheng T, Liu X, Fu XX, Hu Y, Yang XF, Yang LE, Li HR, Bian ZW, Jing N, et al: zebrafish SpatioTemperal Expression Profiles (zSTEP). GitHub; 2024. https://github.com/ldo2zju/zSTEP.
- 93.Dong Y, Cheng T, Liu X, Fu XX, Hu Y, Yang XF, Yang LE, Li HR, Bian ZW, Jing N, et al: Unravelling the progression of the zebrafish primary body axis with reconstructed spatiotemporal transcriptomics. Zenodo; 2024. https://zenodo.org/records/13952256.
- 94.Review Commons Report 1. Early Evidence Base. 2025. 10.15252/rc.2025880555.
- 95.Review Commons Report 2. Early Evidence Base. 2025. 10.15252/rc.2025459343.
- 96.Review Commons Report 3. Early Evidence Base. 2025. 10.15252/rc.2025867909.
- 97.Review Commons Authors Response. Early Evidence Base. 2025. 10.15252/rc.2025092690.