Abstract
Obtaining single-cell spatial information remains a challenge in spatial transcriptomics. Here we develop SEU-TCA, a method that leverages transfer component analysis to improve single-cell spatial mapping accuracy. Application to multiple single-cell and spatial transcriptomic datasets shows superior performance in spatial deconvolution and cell mapping. Using SEU-TCA, we explore spatial gene expression and regulon activity during mouse gastrulation and identify anterior second heart field progenitors regulated by Irx1. Functional experiments reveal that Irx1 deletion disrupts anterior second heart field development and causes ventricular septal defects, underscoring SEU-TCA’s potential for advancing developmental biology research.
Supplementary Information
The online version contains supplementary material available at 10.1186/s13059-025-03633-3.
Keywords: Single-cell RNA sequencing, Spatial transcriptomic, Cardiac progenitors, Regulatory factors
Background
Understanding the precise spatial positions of individual cells with transcriptomic signatures during early developmental stages is instrumental in bridging cellular functions and their spatial contributions with developmental processes. Recently, numerous single-cell transcriptomic atlases [1, 2] and spatial transcriptomic (ST) maps [3–5] have been independently reported to delve into the early developmental processes. Nevertheless, a key limitation of single-cell RNA-sequencing (scRNA-seq) analysis lies in its requirement for tissue dissociation, which inevitably leads to the loss of spatial position information. In contrast, current ST technologies typically capture in situ gene expression within spots containing multiple cells, inherently precluding the achievement of single-cell resolution. Therefore, there is an urgent need for computational methodologies that precisely predict the associations between scRNA-seq profiled “cells” and spatially resolved “spots” from ST data.
To date, numerous statistical approaches have emerged to integrate scRNA-seq datasets with ST data. These methodologies can be categorized into two primary groups, deconvolution methods and mapping methods, distinguished by their integration strategies. Deconvolution methods primarily disentangle the mixture of cells within each spatial spot, leveraging a reference scRNA-seq dataset [6–14]. Examples include cell2location [9] and CARD [10], which specialize in estimating the proportions of various cell types or states within each spot. Nevertheless, the precise spatial positioning of cells within the scRNA-seq dataset remains unaddressed by these methods. Conversely, mapping methods employ reference ST data to infer and assign spatial position information to individual cells within the scRNA-seq dataset [15–20]. Widely used methods in this category, including Tangram [15], SpaGE [19], and Seurat [18], construct mathematical models to integrate scRNA-seq datasets with ST data. Tangram, a deep-learning framework, correlates single-cell gene expression profiles with ST data, enabling accurate spatial mapping of scRNA-seq data using a reference ST dataset [15]. SpaGE aligns ST and scRNA-seq data through domain adaptation using PRECISE [19]. Seurat, while primarily focusing on batch correction among scRNA-seq datasets from different experiments, employs anchoring and transfer learning techniques to project single-cell data onto spatial coordinates [18]. Although significant progress has been made, these existing methods are still limited in their ability to simultaneously minimize distributional disparities between single-cell and spatial data, extract spatially relevant feature representations, and identify gene regulatory networks at the individual cell level within their respective spatial locations.
There is an urgent need to precisely map single cells from scRNA-seq data to ST spots, as both spatial information and single-cell resolution are indispensable for elucidating the highly dynamic cellular interactions and developmental events. The gastrula stage, for example, represents a critical juncture in embryonic development where the three germ layers (endoderm, mesoderm, and ectoderm) are established, setting the foundation for subsequent organogenesis. However, the finest resolution of current gastrula ST data is at the scale of 20–40 cells, as demonstrated by studies such as Geo-seq, which cover embryonic stages E5.5 to E7.5 [4, 5]. This resolution is insufficient to reconstruct the intricate cell type distribution observed during this pivotal developmental period. On the other hand, while the single-cell transcriptomic atlas of mouse early gastrulation (scRNA-seq, E6.5-E8.5) has been systematically annotated [1], it lacks the crucial spatial context needed to fully understand cellular dynamics within the gastrula.
Here we introduced Spatial Expression Utility—Transfer Component Analysis (SEU-TCA), an integration approach leveraging TCA [21] to extract shared features in a shared latent space of scRNA-seq and ST data. By applying SEU-TCA to four distinct biological systems—mouse gastrulation, human heart, mouse olfactory bulb, and pancreatic ductal adenocarcinoma—we demonstrated its superior performance over existing methods in deconvolving the cellular composition of ST spots and predicting spatial locations for single cells from scRNA-seq data. To further verify the practical utility of SEU-TCA, we conducted an in-depth exploration of the results obtained from the E7.5 gastrulation ST maps and scRNA-seq datasets, extending our analysis to infer the spatial distribution of regulon activities for mesoderm. Building upon this analysis, we further dissected cardiac development, the earliest organogenesis program initiated during mesodermal development, at both the single-cell and spatial levels. We identified a series of spatially-specific transcription factors (TFs) and found that IRX1 specifically regulates the development of the anterior second heart field (aSHF). Genetic lineage tracing and gene knockout experiments revealed that Irx1-positive progenitors contributed to the aSHF lineage and its derivatives, and the deletion of Irx1 led to ventricular septal defects (VSDs) in mice. Taken together, these results demonstrate that SEU-TCA is capable of precisely deconvolving the spatial organization and dynamics of cells, providing a robust framework for integrating single-cell and spatial transcriptomics to study developmental processes.
Results
SEU-TCA: a TCA-based method for spatial mapping
To simultaneously decipher the spatial heterogeneity and cellular organization within complex tissues, we developed SEU-TCA to establish virtual connections between single cells and spatially resolved spots by applying the TCA approach [21]. The primary motivation of SEU-TCA is to identify the optimal nonlinear transformation (ϕ) that maps both reference data (XR, ST) and query (XQ, scRNA-seq) data into a shared latent space, where the Maximum Mean Discrepancy (MMD) between the latent representations of the reference (TCR) and query (TCQ) is minimal (Fig. 1; Methods). Pearson correlation coefficient (PCC) between TCR and TCQ is calculated to evaluate the spot-cell similarity. To extract meaningful insights, SEU-TCA can be further extended to incorporate downstream analysis focusing on three aspects: (1) spot deconvolution, aimed at resolving cell heterogeneity of spots; (2) inferring the spatial location of target cells to explore spatial heterogeneity of single cells; and (3) identifying spatial regulon to construct spatially informed gene regulatory networks at single-cell resolution.
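For orientation, the objective described above can be written out in the standard TCA notation of ref. [21]. This is a sketch of the generic formulation, not a transcription of the SEU-TCA Methods: the symbols n_R, n_Q (numbers of spots and cells), K (kernel matrix over all samples), L (MMD coefficient matrix), H (centering matrix), W (projection), and μ (regularization weight) are assumed here and do not appear in the text above.

```latex
\[
\mathrm{MMD}^2(X_R, X_Q)
  = \left\| \frac{1}{n_R}\sum_{i=1}^{n_R}\phi\!\left(x^{R}_{i}\right)
          - \frac{1}{n_Q}\sum_{j=1}^{n_Q}\phi\!\left(x^{Q}_{j}\right) \right\|_{\mathcal{H}}^{2}
  = \operatorname{tr}(KL),
\qquad
L_{ij} =
\begin{cases}
  1/n_R^{2}      & x_i, x_j \in X_R \\
  1/n_Q^{2}      & x_i, x_j \in X_Q \\
  -1/(n_R n_Q)   & \text{otherwise}
\end{cases}
\]
\[
\min_{W}\; \operatorname{tr}\!\left(W^{\top} K L K W\right) + \mu\,\operatorname{tr}\!\left(W^{\top} W\right)
\quad \text{s.t.} \quad W^{\top} K H K W = I
\]
```

Under this formulation, the latent representations TCR and TCQ correspond to the rows of KW belonging to the reference spots and the query cells, respectively.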
Fig. 1.
TCA-based method for single-cell spatial mapping. Overview of methodology (pink area) and downstream application (blue area) of SEU-TCA. Methodology: Step 1) Given spatial transcriptome as reference (XR) and single-cell transcriptome as query (XQ), SEU-TCA applies TCA analysis and finds common latent representation (TCR and TCQ) of the two datasets; Step 2) PCC between TCR and TCQ is then computed to evaluate the similarity between spots in the spatial transcriptome and cells in the single-cell transcriptome, enabling spatial mapping. The output of SEU-TCA is shown using the example of E7.5 mouse embryo’s Geo-seq (spatial transcriptome, each spot contains 5–40 cells) and scRNA-seq. Downstream applications: 1) resolving cell heterogeneity of spots; 2) inferring spatial information of single-cells; and 3) identifying spatially variable genes
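The two methodological steps in the legend above (TCA alignment, then spot-cell PCC) can be illustrated with a minimal NumPy sketch. It assumes a linear kernel, a regularization weight `mu`, and random toy count matrices; the function names `tca_linear` and `spot_cell_pcc` are hypothetical and this is not the SEU-TCA implementation.

```python
import numpy as np

def tca_linear(X_ref, X_query, n_components=5, mu=1.0):
    """Minimal TCA with a linear kernel (rows = samples, columns = shared genes).

    Assumption: generic TCA from the literature, not the SEU-TCA code base.
    """
    n_r, n_q = X_ref.shape[0], X_query.shape[0]
    n = n_r + n_q
    X = np.vstack([X_ref, X_query])
    K = X @ X.T                                    # linear kernel over all samples
    # MMD coefficient matrix L = e e^T with e = (+1/n_r, ..., -1/n_q, ...)
    e = np.concatenate([np.full(n_r, 1.0 / n_r), np.full(n_q, -1.0 / n_q)])
    L = np.outer(e, e)
    H = np.eye(n) - np.full((n, n), 1.0 / n)       # centering matrix
    # Projection W = leading eigenvectors of (K L K + mu I)^-1 K H K
    A = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
    eigvals, eigvecs = np.linalg.eig(A)
    order = np.argsort(-eigvals.real)[:n_components]
    W = eigvecs[:, order].real
    Z = K @ W                                      # latent representation of all samples
    return Z[:n_r], Z[n_r:]

def spot_cell_pcc(TC_ref, TC_query):
    """Pearson correlation between every spot (row) and every cell (column)."""
    a = TC_ref - TC_ref.mean(axis=1, keepdims=True)
    b = TC_query - TC_query.mean(axis=1, keepdims=True)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Toy data: 6 spots and 20 cells over 30 shared genes (random, for illustration only)
rng = np.random.default_rng(0)
X_ref = np.log1p(rng.poisson(5.0, size=(6, 30)))
X_query = np.log1p(rng.poisson(8.0, size=(20, 30)))

TC_ref, TC_query = tca_linear(X_ref, X_query, n_components=5)
pcc = spot_cell_pcc(TC_ref, TC_query)
best_spot = pcc.argmax(axis=0)                     # most similar spot for each cell
```

More than two latent components are used in the toy call because a Pearson correlation over only two values is always ±1 and therefore uninformative.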
SEU-TCA demonstrates superior performance to the existing methods
We systematically evaluated SEU-TCA’s performance across four key dimensions: accuracy in simulation data, robustness to parameter variations, computational efficiency, and generalizability across diverse datasets.
Accuracy validation using simulation data
To generate benchmark data for accuracy evaluation, we applied a grid-based approach on a single-cell spatial transcriptomics dataset of human heart [22] (Fig. 2A; Methods). Specifically, the average expression of all cardiomyocytes (CMs) within each grid was used as the pseudo-bulk expression for each spot, while the dominant cell type within each grid was taken as the ground truth for each spot. We applied SEU-TCA to the pseudo-spot-level data and the corresponding single-cell CM data from the human heart dataset to predict cell types and spot composition. The performance was evaluated using multiple metrics, including PCC (to assess the consistency of expression levels between spot-cell pairs), Accuracy (ACC) and F1 score (to provide a balanced evaluation of prediction performance), Sensitivity and Specificity (to measure the method’s ability to correctly identify true positive and true negative cell types), and Adjusted Rand Index (ARI, to evaluate the similarity between predicted clustering and ground truth).
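The grid-based construction of pseudo-bulk spots can be sketched as follows. This is an illustration under stated assumptions: a square grid of side `grid_size`, cells given as plain dictionaries, and invented coordinates, cell type labels, and expression values. The function name `make_pseudo_spots` is hypothetical.

```python
from collections import Counter, defaultdict

def make_pseudo_spots(cells, grid_size):
    """Bin cells into square grid tiles; average expression and pick the dominant type.

    cells: list of dicts with keys 'x', 'y', 'cell_type', 'expr' (list of floats).
    """
    bins = defaultdict(list)
    for cell in cells:
        key = (int(cell["x"] // grid_size), int(cell["y"] // grid_size))
        bins[key].append(cell)

    spots = {}
    for key, members in bins.items():
        n_genes = len(members[0]["expr"])
        # Pseudo-bulk expression = mean expression of the cells in the tile
        mean_expr = [sum(m["expr"][g] for m in members) / len(members)
                     for g in range(n_genes)]
        # Ground truth label = most frequent cell type in the tile
        dominant = Counter(m["cell_type"] for m in members).most_common(1)[0][0]
        spots[key] = {"expr": mean_expr, "dominant": dominant, "n_cells": len(members)}
    return spots

# Toy cells (coordinates, labels, and counts are invented for illustration)
cells = [
    {"x": 1.0, "y": 1.0, "cell_type": "aCM", "expr": [2.0, 0.0]},
    {"x": 3.0, "y": 2.0, "cell_type": "aCM", "expr": [4.0, 0.0]},
    {"x": 4.0, "y": 4.0, "cell_type": "vCM", "expr": [0.0, 6.0]},
    {"x": 12.0, "y": 3.0, "cell_type": "vCM", "expr": [0.0, 8.0]},
]
spots = make_pseudo_spots(cells, grid_size=10.0)
```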
Fig. 2.
SEU-TCA demonstrates superior performance to existing methods. A The procedure for generating pseudo-bulk spots from the human heart dataset using a grid-based approach (Methods). The left panel displays the actual MERFISH cells in a frontal section of a developing human heart. The middle panel shows the pseudo-bulk spots created after gridding, with the cell type having the highest proportion within each spot taken as the ground truth. The right panel presents the single-cell data matched with the ST data, focusing exclusively on CMs for this analysis. The single-cell data were downsampled to 10,000 cells to create a manageable dataset for the subsequent analysis, ensuring sufficient coverage while maintaining computational efficiency. B Dominant cell types inferred using seven methods are shown, with corresponding PCC, ACC, and ARI metrics calculated by comparing predictions to the ground truth. PCC values are marked as NA for deconvolution methods, as these methods cannot obtain corresponding spot-cell pairs, making it impossible to compute the correlation of expression levels between spots from ST and cells from SC data. C–F Performance summary. The evaluation across the four datasets was conducted by assigning scores to each method based on their rankings in three key metrics: PCC, ACC, and ARI. The scoring system assigned 7 points to the top-ranked method, 6 points to the second, and so forth, decreasing by 1 point per rank. These scores were aggregated to provide a comprehensive performance summary for each method across the human heart dataset (C), the mouse olfactory bulb dataset (D), the pancreatic ductal adenocarcinoma dataset (E), and the mouse gastrulation dataset (F)
We first compared SEU-TCA with two single-cell mapping methods (Tangram [15] and SpaGE [19]) and four deconvolution methods (CARD [10], cell2location [9], STRIDE [11], and CIBERSORTx [23]) on this human heart dataset. The predictions by SEU-TCA were closely aligned with the ground truth, accurately capturing the spatial organization and boundaries of cell types (Fig. 2B). This alignment demonstrated the superior overall performance of SEU-TCA across the evaluation metrics (Fig. 2B, C; Additional file 1: Fig. S1). Among all compared methods, SEU-TCA showed the highest ARI value (0.64), followed by SpaGE (0.52), Tangram (0.49), cell2location (0.43), STRIDE (0.40), CARD (0.40), and CIBERSORTx (0.09) (Fig. 2B). We also observed that, except for CIBERSORTx, all methods successfully inferred dominant cell types with clear distinctions between the left and right atria (LA and RA). However, for the left and right ventricles (LV and RV), SEU-TCA achieved more precise predictions, particularly for ventricular cardiomyocyte (vCM)-RV-Trabecular and vCM-LV-Trabecular. Additionally, careful examination of the cell type composition for each spot further confirmed the accuracy of SEU-TCA deconvolution (Additional file 1: Fig. S1B). In terms of PCC, SEU-TCA achieved strong performance (median PCC = 0.80), matching SpaGE (median PCC = 0.80) and outperforming Tangram (median PCC = 0.73) by about 10% (Fig. 2B; Additional file 1: Fig. S1C). The other methods did not report PCC values because they are designed as spot deconvolution algorithms, providing the cell type composition for each spot rather than the relationships between spots and cells. These results highlight the capability of SEU-TCA to reconstruct the ground truth cell type distribution with high precision and to clearly distinguish critical anatomical regions.
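The rank-based scoring described in the Fig. 2 legend (7 points for the top-ranked method, decreasing by 1 per rank) can be illustrated with the ARI values reported above. How ties are scored is not stated in the text; the sketch below assumes that tied methods share the higher score. The function name `rank_scores` is hypothetical.

```python
def rank_scores(metric, top_points=7):
    """Convert metric values to points: best method gets top_points, next gets one less.

    Assumption: tied methods share the higher score (tie handling is not specified
    in the source).
    """
    ordered = sorted(metric.values(), reverse=True)
    return {method: top_points - ordered.index(value)
            for method, value in metric.items()}

# ARI values for the human heart dataset, as reported in the text
ari = {
    "SEU-TCA": 0.64, "SpaGE": 0.52, "Tangram": 0.49, "cell2location": 0.43,
    "STRIDE": 0.40, "CARD": 0.40, "CIBERSORTx": 0.09,
}
ari_points = rank_scores(ari)
```

Summing such per-metric points over PCC, ACC, and ARI would give the aggregated score shown in Fig. 2C–F.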
Robustness to parameter variations
SEU-TCA offers three kernel types (Primal, Linear, and RBF) for mapping data into the latent space, along with parameters such as the RBF kernel bandwidth (γ), to flexibly adapt to diverse data distributions. We conducted extensive evaluations of kernel performance and parameter robustness using the human heart dataset. For the alignment dimension parameter, we chose values ranging from 10 to 100 in increments of 10 under the three kernel types and assessed their performance using ACC, F1 score, Sensitivity, and Specificity (Additional file 1: Fig. S2A). The results demonstrated that increasing the dimension beyond 30 had minimal impact on performance, and that the Primal kernel consistently outperformed both the Linear and RBF kernels in this dataset. Additionally, we performed a robustness check of the bandwidth parameter (γ) for the RBF kernel. Grid search analyses showed minimal changes in performance across different γ values (Additional file 1: Fig. S2B). These results suggest that SEU-TCA is robust to parameter variations across a wide range of settings.
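To make the role of the bandwidth γ concrete, the sketch below computes linear and RBF kernel matrices on toy vectors. It assumes the common convention k(x, y) = exp(−γ‖x − y‖²); the exact parameterization used by SEU-TCA is not given in this section, and the function names are hypothetical.

```python
import math

def linear_kernel(X, Y):
    """Linear kernel: inner product between every pair of rows."""
    return [[sum(a * b for a, b in zip(x, y)) for y in Y] for x in X]

def rbf_kernel(X, Y, gamma):
    """RBF kernel, assuming the convention exp(-gamma * squared Euclidean distance)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y))) for y in Y]
            for x in X]

# Two toy samples at unit distance from each other
X = [[0.0, 0.0], [1.0, 0.0]]

# A small grid over gamma: larger gamma makes the similarity decay faster
gammas = [0.1, 0.5, 1.0, 5.0]
off_diagonal = [rbf_kernel(X, X, g)[0][1] for g in gammas]
K_lin = linear_kernel(X, X)
```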
Computational efficiency
The computational efficiency of SEU-TCA was evaluated on the human heart dataset. SEU-TCA ran faster than several other methods, substantially reducing the time required for analysis (Additional file 1: Fig. S3A). Furthermore, SEU-TCA consumed ~1 GB of memory for this task with 10,000 single cells, demonstrating its efficiency in resource utilization (Additional file 1: Fig. S3B). These findings highlight the practical advantages of SEU-TCA and establish it as a scalable solution for fine-map reconstruction in much larger datasets.
Generalizability across biological systems
To demonstrate SEU-TCA’s applicability on real ST data, we first applied it to the mouse olfactory bulb data [24], which clearly defined four main anatomic layers: the granule cell layer (GCL), the mitral cell layer (MCL), the glomerular layer (GL), and the olfactory nerve layer (ONL) (Additional file 1: Fig. S4A). Single-cell data from 10x Chromium on the same tissue [25] were utilized for the analysis. The results inferred by SEU-TCA clearly revealed the expected layered structure of the mouse olfactory bulb, as shown in the visualizations of the dominant layers (Additional file 1: Fig. S4B). In comparison, although CARD and cell2location achieved relatively accurate layering, they exhibited less clarity in defining the boundary between the GL and ONL (Additional file 1: Fig. S4C). Tangram produced blurred transitions between layers, and CIBERSORTx struggled to capture the distinct structural organization. Notably, SpaGE and STRIDE failed to differentiate the boundaries between most regions. Additionally, the evaluation metrics further confirmed the superior performance of SEU-TCA over other methods (Additional file 1: Fig. S4D-F). For example, it achieved the highest PCC, reflecting its strong ability to maintain consistent gene expression levels between spots and single cells. SEU-TCA also outperformed all other methods in terms of ACC, F1 score, and Sensitivity, while achieving ARI and Specificity comparable to CARD, demonstrating its stability and reliability. Overall, SEU-TCA provides a notably precise reconstruction of the mouse olfactory bulb’s layered architecture, particularly in distinguishing the ONL from adjacent layers.
To further explore its applicability in complex human pathological tissues, we then evaluated the performance of SEU-TCA on the pancreatic ductal adenocarcinoma dataset obtained from ST technology and its matched single-cell dataset generated via the inDrop platform [26]. The pancreatic ductal adenocarcinoma dataset consists of three regions with clear boundaries (ductal cells, acinar cells, and cancer) along with a stromal region, all labeled based on histological annotations (Additional file 1: Fig. S5A). As expected, SEU-TCA, as well as CARD and STRIDE, was able to precisely delineate the complex architecture of the pancreatic ductal adenocarcinoma (Additional file 1: Fig. S5B-C). In contrast, Tangram and cell2location produced ambiguous transitions between cancer and acinar cells, making it difficult to discern the exact cellular composition. CIBERSORTx failed to identify specific cell types within the pancreatic ductal adenocarcinoma, while SpaGE had limited capability to differentiate the distinct regions. To quantify the performance of SEU-TCA relative to other methods, we calculated six key metrics (F1 score, ACC, ARI, Sensitivity, Specificity, and PCC) for these methods on the pancreatic ductal adenocarcinoma data (Additional file 1: Fig. S5D-F). In terms of the F1 score (Additional file 1: Fig. S5D), SEU-TCA once again outperformed all competing approaches, underscoring its proficiency in providing highly accurate predictions. Furthermore, SEU-TCA ranked among the top three methods in terms of ACC, ARI, Sensitivity, and Specificity, reflecting its robustness and reliability in precisely identifying distinct cell types of the pancreatic ductal adenocarcinoma. For the PCC (Additional file 1: Fig. S5E), SEU-TCA achieved the highest value among the three methods for which PCC could be computed, demonstrating its ability to capture the intricate patterns of cell type distribution.
In summary, the comprehensive evaluation of SEU-TCA across multiple metrics confirms its robust and reliable performance in analyzing the pancreatic ductal adenocarcinoma data, consistently surpassing or matching the performance of other existing methods.
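The classification-style metrics used throughout these comparisons can be computed from predicted and ground-truth dominant labels per spot. The sketch below is an assumption-laden illustration: it treats each label one-vs-rest and macro-averages Sensitivity, Specificity, and F1, whereas the averaging scheme actually used is not stated in this section. The toy labels are invented, ARI is omitted for brevity, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """ACC plus macro-averaged Sensitivity, Specificity, and F1 (one-vs-rest).

    Assumption: macro averaging over labels; the source does not specify this.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    acc = sum(t == p for t, p in pairs) / n

    sens, spec, f1 = [], [], []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in pairs)
        fn = sum(t == lab and p != lab for t, p in pairs)
        fp = sum(t != lab and p == lab for t, p in pairs)
        tn = n - tp - fn - fp
        sens.append(tp / (tp + fn) if tp + fn else 0.0)
        spec.append(tn / (tn + fp) if tn + fp else 0.0)
        f1.append(2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0)

    k = len(labels)
    return {"ACC": acc, "Sensitivity": sum(sens) / k,
            "Specificity": sum(spec) / k, "F1": sum(f1) / k}

# Toy dominant-label vectors for four spots (invented for illustration)
y_true = ["ductal", "ductal", "acinar", "cancer"]
y_pred = ["ductal", "acinar", "acinar", "cancer"]
metrics = classification_metrics(y_true, y_pred)
```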
To overcome the inadequacy of single-cell resolution gastrula ST data and the lack of spatial information in the gastrula scRNA-seq atlas, we next applied SEU-TCA to analyze Geo-seq and scRNA-seq data for E7.5 (0B-EB stage) mouse embryos, a critical developmental period characterized by gastrulation, during which the embryonic germ layers are formed and the body axes are established [27]. SEU-TCA achieved high PCC values (mesoderm 0.84 ± 0.13; endoderm 0.85 ± 0.12; ectoderm 0.78 ± 0.13) and accurately recovered the original expression patterns (Additional file 1: Fig. S6). In comparison with other representative single-cell mapping algorithms, such as Tangram [15] and SpaGE [19], SEU-TCA yielded more accurate estimates of the expression levels and the cell type proportions across spatial locations, underscoring the importance of feature alignment for effective data integration. Moreover, SpaGE, which also aligns features, outperformed Tangram but remained less accurate than SEU-TCA (Additional file 1: Fig. S7A). In contrast, CIBERSORTx, which depends on predefined cell-type specificity, exhibited discrepancies in both spatial localization and cell type proportion estimation (Additional file 1: Fig. S7B-D). Taken together, SEU-TCA demonstrates superior accuracy in integrating spatial and single-cell transcriptomics for multiple datasets from distinct biological systems, consistently ranking among the top methods in terms of ACC, PCC, and ARI (Fig. 2C–F).
Overall, these results spotlight SEU-TCA’s outstanding accuracy, robustness, efficiency, and generalizability, as evidenced by its consistent excellence across various metrics. Through comprehensive evaluations on the human heart, the mouse olfactory bulb, the pancreatic ductal adenocarcinoma, and the mouse gastrulation datasets, SEU-TCA has proven its reliability, scalability, and practical benefits in reconstructing complex spatial and pathological tissue architectures.
SEU-TCA decodes spatial cell heterogeneity
SEU-TCA is a method capable of inferring spatial context for the single-cell transcriptomic landscape. To demonstrate its utility, we applied SEU-TCA to predict cellular locations of E7.5 mesodermal scRNA-seq using Geo-seq data as reference. Geo-seq spots were spatially divided into four zones, namely Proximal-Anterior zone (P-A), Proximal-Posterior zone (P-P), Distal-Anterior zone (D-A), and Distal-Posterior zone (D-P). Single cells with the same predicted zone identity were clustered together in the transcriptomic space, as evidenced by the space-independent UMAP (Fig. 3A and Additional file 1: Fig. S8A). Moreover, the genes associated with the top two TCs exhibited regionalized expression patterns along the P-D and A-P axes, respectively (Additional file 1: Fig. S8B). These results strongly support the association between spatial and molecular variations of the E7.5 mesodermal cellular landscape.
Fig. 3.
SEU-TCA decodes mesodermal cellular heterogeneity. A UMAP layout for the E7.5 mesodermal cells from Pijuan-Sala et al. is colored by four quadrants constructed by A-P and P-D axes (upper), predicted by SEU-TCA, and cell type (lower). The dashed lines represent cell clusters along the developmental axis. Embryonic axes: anterior–posterior, A-P; proximal–distal, P-D. NM, nascent mesoderm. MM, mixed mesoderm. IM, intermediate mesoderm. PhM, pharyngeal mesoderm. Mch, mesenchyme. PaM, paraxial mesoderm. SM, somitic mesoderm. EM, extraembryonic mesoderm. B Corn plots (top row, Geo-seq) and UMAP (bottom row, scRNA-seq) showing the mesodermal spatial expression patterns and single-cell expression levels of Krt8, Otx2, Cdx4 and Hes7 at E7.5, which mark the four quadrants constructed by A-P and P-D axes. MA, anterior mesoderm; MP, posterior mesoderm. C Fraction of cell type predicted from E7.5 mesodermal cells for each Geo-seq spot, e.g. 10MA representing the 10th slice of anterior mesoderm. D DEGs between 6MA and 8MA for E7.5 scRNA-seq data are expressed at 7MA. E UMAP layout for 7MA cells inferred from E7.5 mesodermal Geo-seq is colored by cell type. The dashed line circles all 7MA cells in E7.5 mesodermal scRNA-seq. F Corn plots showing the spatial pattern of expression of 7MA markers and UMAP showing the single-cell resolution pattern of expression of 7MA markers at E7.5. G Fraction of inferred location for each E7.5 mesodermal cell type. Each cell type is arranged from top to bottom according to its proximity to the anterior spots and distance to the posterior spots. The varying shades of color at both ends of the bidirectional arrow represent the proximity to the anterior (red) and the posterior ends (blue). H Corn plot showing the spatial pattern of inferred contributions of NM at E7.5. I UMAP showing the results of NM cells re-clustered based on the spatial feature genes. The dashed line circles all NM cells in E7.5 mesodermal scRNA-seq. Heatmap visualizing the abundance of subclusters of NM in each spot in Geo-seq data. Rows represent subclusters, columns represent spatial locations. J Dot plot showing the key markers of subclusters of NM
We further identified markers for each spatial cluster, which exhibited highly regionalized and variable expression patterns in Geo-seq and scRNA-seq data, respectively (Fig. 3B). In total, we identified 123 genes with high region-specific expression, many of which have not been previously reported, while others have been reported as region-specific or driver genes underlying embryonic pattern formation (Additional file 2: Table S1). For example, the D-A marker gene Krt8 is expressed in the extraembryonic mesoderm and plays a crucial role in the establishment of the A-P axis [28, 29]; Otx2, a V-A marker, is expressed in the anterior mesoderm (AM) [30–32]; Cdx4, a D-P marker, is expressed posteriorly and functions during embryonic axial formation [33, 34]; and Hes7, a V-P marker, is dynamically expressed in the pre-somitic mesoderm (PSM) during somitogenesis [35, 36].
By mapping single cells to ST spots representing mixtures of multiple cells, we were able to quantify the cell type composition of each spot. Along the caudal-rostral axis, following the order of 10MP-2MP-2MA-10MA (from the posterior-most end of the mesoderm to its anterior-most end), the distribution of cell types exhibited a discernible and continuous pattern. The transition of major cell types, excluding the less mature nascent mesoderm (NM) and mixed mesoderm (MM), formed the following order: extraembryonic mesoderm (EM)—intermediate mesoderm (IM)—somitic mesoderm (SM)—paraxial mesoderm (PaM)—pharyngeal mesoderm (PhM)—mesenchyme (Mch) (Fig. 3C). At most spots, the proportion of the major cell type was less than 50%, underscoring the necessity of deconvolution. In addition, we found that complex cell type compositions were more likely to occur at the junctions of multiple tissues. An example was the 7MA location, which contained 6 cell types, including relatively mature and predominant cell types such as PaM and PhM (Fig. 3C, E). Additionally, we found that 7MA simultaneously expressed marker genes associated with both 6MA and 8MA (Fig. 3D). In the 2-day-old chick embryo, PhM is primarily divided into two overlapping subdomains: the mesenchymal paraxial mesoderm (mPaM) and the medial splanchnic mesoderm (mScM) [37]. mPaM and mScM specifically express Pitx2 and Tcf21, respectively [37–40]. Interestingly, 7MA expresses both Pitx2 and Tcf21, whereas in scRNA-seq data, these two genes are specifically expressed in PaM and PhM cells, respectively (Fig. 3F). These findings suggest that the co-expression of PaM and PhM marker genes at 7MA results from the spatial overlap of the two cell types rather than the presence of intermediate cell states. Collectively, SEU-TCA provides valuable insights into cell type distribution, transitions, and spatial overlap.
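Once each cell has been mapped to a spot, quantifying the cell type composition per spot reduces to counting. The sketch below assumes that every cell is assigned to exactly one spot (for example, its most highly correlated spot); the assignment rule actually used by SEU-TCA is described in the Methods rather than here. The cell identifiers and counts are invented, and `spot_composition` is a hypothetical name.

```python
from collections import Counter, defaultdict

def spot_composition(cell_to_spot, cell_types):
    """Fraction of each cell type among the cells mapped to each spot."""
    counts = defaultdict(Counter)
    for cell, spot in cell_to_spot.items():
        counts[spot][cell_types[cell]] += 1
    return {
        spot: {ctype: n / sum(c.values()) for ctype, n in c.items()}
        for spot, c in counts.items()
    }

# Toy mapping (invented): four cells at 7MA, one cell at 10MP
cell_to_spot = {"c1": "7MA", "c2": "7MA", "c3": "7MA", "c4": "7MA", "c5": "10MP"}
cell_types = {"c1": "PaM", "c2": "PaM", "c3": "PhM", "c4": "Mch", "c5": "EM"}

composition = spot_composition(cell_to_spot, cell_types)
# Spots whose dominant type makes up at most half of the mapped cells are mixed
mixed_spots = [s for s, frac in composition.items() if max(frac.values()) <= 0.5]
```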
SEU-TCA facilitates the identification of subtypes
We further systematically analyzed the spatial distribution of different cell types, the predicted positions of which were consistent with those reported in the literature [41] (Fig. 3G; Additional file 1: Fig. S9). Most cell types exhibited a short-strip-like distribution, while others, such as IM and NM, displayed a long-strip-like pattern. NM, the first mesodermal cell type originating from the primitive streak, was located across the A-P axis and near the distal side [42] (Fig. 3H). To further investigate the spatial heterogeneity of NM, we re-clustered NM cells using spatial feature genes, identifying three distinct spatial distribution clusters (Fig. 3I). We found that the cells in cluster 1 were exclusively localized at the anterior side (3MA-2MA) and specifically expressed Lhx1, Otx2, and Sfrp1 (Fig. 3J). The expression of Lhx1 in the epiblast marks the anterior mesendoderm. Together with Otx2 and Sfrp1, Lhx1 plays a crucial role in regulating its development [43]. In contrast, cells in cluster 2 were distributed along the A-P axis (2MA-2MP-3MP) and characterized by the expression of Fst, Epcam, Pim2, and Apln, which were expressed in neuromesodermal progenitors (NMP)-fated cells [44]. However, the spatial distribution of cluster 3 was broader (2MA-2MP-3MP-4MP-5MP-6MP-7MP), suggesting the possibility of multiple distinct developmental fates. Within this cluster, Hoxb1, Aldh1a2, and Foxf1 mark the posterior second heart field (pSHF) cells, while Msx1/2 and Cited1 mark the first heart field (FHF) cells, indicating that cluster 3 is likely a common progenitor for both FHF and pSHF [45]. These findings confirm that incorporating spatial information into single-cell data can further uncover spatially specific cell populations.
SEU-TCA enables construction of the spatial regulon map
To investigate the gene regulatory network underlying the spatially patterned mesodermal cell types, we set out to estimate the regulon activity at each ST spot. We first utilized the single-cell regulatory network inference and clustering (SCENIC) pipeline [46] to assess the regulon activity score (RAS) for individual mesodermal single cells. We then generated a regulon activity heatmap, where single cells were ordered by spatial locations and regulons were ordered using hierarchical clustering (Additional file 1: Fig. S10A). Our analysis revealed distinct clusters of regulons characteristic of specific spatial spots or regions of consecutive spots. For example, Lhx1(+) represented a cluster of regulons with high RAS at 2MA and 3MA (Additional file 1: Fig. S10A, B), consistent with its functions in governing anterior mesendoderm development [43, 47]. Another regulon cluster, represented by Hoxa10(+), showed specific and strong activity in EM located at 10MA and 10MP (Additional file 1: Fig. S10A, B), consistent with its high enrichment in accessible chromatin of EM [48]. In contrast, a regulon cluster consisting of Gm10093(+) was specifically depleted in EM and exhibited high activity across the embryonic portion of mesoderm (Additional file 1: Fig. S10A). Within the embryonic part, a cluster including Zfp467(+) was highly active in the distal part spanning from 2MA to 5MP (Additional file 1: Fig. S10A). These findings suggest that the mesodermal spatial pattern could be shaped by transcriptional regulation along both embryonic-extraembryonic and rostral-caudal axes.
We further constructed a spatial regulon activity map for the mesoderm by averaging the RAS of single cells mapped to each spot and identified significantly enriched regulons in each spot (Fig. 4A; Additional file 3: Table S2). This allowed us to identify key gene regulatory networks functioning in a spatial location-dependent manner. Based on this spatial regulon map, we observed specific enrichment of the Hox family members at 10MA and 10MP (Fig. 4A), consistent with previous reports indicating that the Hox genes exhibit a temporally co-linear expression pattern during the differentiation of the posterior primitive streak, ultimately leading to EM formation [49]. In addition, we discovered that the Fox family members, which are crucial for embryonic development [50], exhibited highly specific regulon activity at 5MA and 6MA (Fig. 4A). For instance, Foxa2 has been reported to be expressed on the anterior side at E7.75 and is necessary for ventricular cell generation during cardiac development [51].
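The averaging step that produces the spatial regulon map can be sketched as below: per-cell RAS values are grouped by the spot each cell was mapped to and averaged per regulon. The RAS values and cell identifiers are invented, the one-spot-per-cell assignment is an assumption, and `spatial_regulon_map` is a hypothetical name.

```python
from collections import Counter, defaultdict

def spatial_regulon_map(ras, cell_to_spot):
    """Average regulon activity score (RAS) of the cells mapped to each spot.

    ras: {cell: {regulon: score}}; cell_to_spot: {cell: spot}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    n_cells = Counter()
    for cell, spot in cell_to_spot.items():
        n_cells[spot] += 1
        for regulon, score in ras[cell].items():
            sums[spot][regulon] += score
    return {
        spot: {regulon: total / n_cells[spot] for regulon, total in regs.items()}
        for spot, regs in sums.items()
    }

# Toy per-cell RAS (invented values)
ras = {
    "c1": {"Tbx6(+)": 0.25, "Lhx1(+)": 0.0},
    "c2": {"Tbx6(+)": 0.75, "Lhx1(+)": 0.0},
    "c3": {"Tbx6(+)": 0.0, "Lhx1(+)": 0.5},
}
cell_to_spot = {"c1": "3MP", "c2": "3MP", "c3": "2MA"}

regulon_map = spatial_regulon_map(ras, cell_to_spot)
```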
Fig. 4.
SEU-TCA enables construction of the spatial regulon maps. A Heatmap illustrates the spatial regulon map, with rows representing regulons and columns representing spatial locations. The redder the color, the higher the average RAS. B The top five regulons in 3MP are labeled on the plot. The y-axis displays the specificity score. C Corn plots showing the spatial pattern of regulon activity (left) and expression (right) of Sp5 in E7.5 mesodermal Geo-seq. D UMAP showing the single-cell resolution pattern of regulon activity (left) and expression (right) of Sp5 in E7.5 mesodermal scRNA-seq. E Corn plots showing the spatial pattern of regulon activity (left) and expression (right) of Tbx6 in E7.5 mesodermal Geo-seq. F UMAP showing the single-cell resolution pattern of regulon activity (left) and expression (right) of Tbx6 in E7.5 mesodermal scRNA-seq
After ranking regulons by the regulon specificity score (RSS) for each spot, we identified candidate TFs driving location-specific cellular specification (Additional file 1: Fig. S11). For example, the 3MP spot, primarily composed of SM and the undifferentiated NM cells (Fig. 3C), exhibited high regulon activities of Sp5 and Tbx6 (Fig. 4B). The corresponding RAS and TF expression were consistent at both spatial and transcriptomic levels (Fig. 4C–F). Previous studies have suggested that Sp5 expression is correlated with NM formation and is dynamic and restricted during somitogenesis [52, 53]. Furthermore, it has been shown that Tbx6 is highly expressed in the PSM and that loss of Tbx6 leads to severe defects in somitic development [44, 54, 55]. The spatial regulon map also revealed many previously unrecognized spot-specific regulons and TFs, which are potential candidates for further functional validation.
SEU-TCA predicts cardiac progenitor cell locations
Cardiac development is a complex process involving multiple lineages [56], yet where and when progenitor cells segregate remain unclear [57]. In our previous study, we employed Waddington-Optimal-Transport (WOT) analysis [58], a differentiation trajectory inference algorithm, to uncover the transcriptional trajectories and epigenetic determinants that specify early cardiac lineages from mesoderm [59]. To systematically trace the locations of cardiac progenitors across developmental stages, we combined SEU-TCA with WOT analysis, to explore the spatial organization of differentiation trajectories. Specifically, we first identified single cells of each cardiac lineage in the E8.5 mouse embryo, focusing on Nkx2-5-positive CMs and CM-progenitors, including Mab21l2-positive Mch [60] and Isl1-positive PhM [61] (Additional file 1: Fig. S12 A). These cells were further classified into nine distinct clusters (Fig. 5A–C). Our results suggest the existence of three major progenitor clusters at E8.5: the juxta-cardiac field (JCF), aSHF, and pSHF. The JCF is defined by the expression of Mab21l2 [60] and Hand1 [62], while the pSHF exhibits enriched expression of markers including Osr1 [63] and Nr2f2 [64], reflecting its posterior spatial identity and lineage commitment. The aSHF, on the other hand, is characterized by genes such as Isl1 [61], Tbx1 [65], Fgf8 [66], and Tcf21 [38], which collectively underscore their pivotal roles in second heart field progenitor specification and outflow tract (OFT) development (Fig. 5C; Additional file 4: Table S3). To visualize their developmental process, we performed WOT analyses on these progenitors and generated time-series tSNE maps (Fig. 5D; Additional file 1: Fig. S12B). Notably, we observed that cells of these lineages formed distinct clusters as early as E7.0. However, the lineage-specific markers did not initiate expression until E7.75 (Additional file 1: Fig. S13). 
Our findings suggest that progenitor cells from distinct spatial domains may begin to exhibit subtle transcriptomic differences, indicative of early lineage bias, even before the activation of canonical marker genes.
Fig. 5.
SEU-TCA facilitates prediction of cardiac progenitor cell locations. A Time-series tSNE layouts showing the dynamic differentiation process of mesodermal lineage from E7.0 to E8.5. Epi, epiblast. PS, primitive streak. Ant.PS, anterior primitive streak. Haem, haematoendothelial progenitors. PGC, primordial germ cell. Def.end, definitive endoderm. RN, rostral neurectoderm. SE, surface ectoderm. C.Epi, caudal epiblast. AL, allantois. B Subcluster with spatial characteristics in E8.5 cardiac progenitors and CMs. The Mab21l2-positive Mch cells, Isl1-positive PhM cells, and Nkx2-5-positive CMs are re-clustered into three key cardiac lineages. Among them, JCF mainly contributes to LV and AVC; aSHF mainly contributes to OFT and RV; pSHF mainly contributes to atria and SV. JCF, juxta-cardiac field. aSHF, anterior second heart field. pSHF, posterior second heart field. OFT, outflow tract. RV, right ventricle. LV, left ventricle. AVC, atrioventricular canal. SV, sinus venosus. C Dot plot showing the key markers of subclusters of cardiac progenitors and CMs. D Backward-tracing identifies development trajectory of JCF (blue), aSHF (yellow) and pSHF (green) during E7.0-E8.5 using the WOT analysis. E8.5 JCF, aSHF and pSHF were used as trajectory endpoints. At each time point, all cells are preliminarily screened based on a WOT score greater than 0.0001. The highest score among the three lineage contributions is selected for each cell. E The top corn plots showing the spatial pattern of inferred progenitors of JCF (left), aSHF (middle), and pSHF (right) at E7.5. In each corn plot, each diamond’s color represents the weighted estimated cell type composition. The bottom corn plots showing the spatial pattern of expression of Ahnak (JCF progenitors), Tbx1 (aSHF progenitors), and Hoxb1 (pSHF progenitors) in E7.5 mesodermal Geo-seq
To explore whether the cardiac lineages are also spatially segregated at early stages, we performed SEU-TCA analysis to map E7.5 cardiac progenitor cells onto E7.5 Geo-seq ST atlas (Fig. 5E). The mapping results revealed highly distinct spatial localization of different lineage progenitors. The JCF progenitors exhibited focused localization at 10MA, the anterior boundary between the forming cardiac crescent and extraembryonic tissue. In contrast, progenitors of aSHF and pSHF were spatially distributed across the anterior and posterior sides of the mesoderm, respectively. The predicted locations were supported by the expression patterns of the spatial genes including Ahnak for JCF progenitors [67, 68], Tbx1 for aSHF progenitors [65], and Hoxb1 for pSHF progenitors [69, 70] (Fig. 5E). The spatial predictions were also consistent with the expression patterns of TFs involved in lineage specification (Additional file 1: Fig. S14). For example, Msx2 was specifically enriched in JCF (Additional file 1: Fig. S14 A), which has been identified in the gene regulatory network of JCF progenitors using epicardioids models [67]. Foxc2 [71] and Hoxb1 [69], which are early markers for aSHF and pSHF lineages, respectively, exhibited consistency in spatial pattern and expression pattern at aSHF and pSHF progenitors (Additional file 1: Fig. S14B, C). To date, this study represents the first systematic spatiotemporal tracing of early cardiac developmental trajectories.
SEU-TCA reveals the potential function of Irx1 in the aSHF lineage
Dysregulation or abnormalities in the development of the aSHF lineage have been associated with various congenital heart diseases (CHDs), including defects in the aorta, pulmonary artery, and ventricular septum [72, 73]. However, the subclusters and key regulatory factors of aSHF require further in-depth exploration. At E7.5, aSHF progenitors spatially spanned large areas of the anterior mesoderm (Fig. 5E), indicating the heterogeneity of aSHF. To spatially dissect the aSHF and identify functional regulators in aSHF development, we re-clustered the E7.5 aSHF progenitors based on the spatial feature genes derived from the ST atlas. This analysis resulted in three clusters, referred to as C1-C3 (Fig. 6A; Methods). After mapping the spatial locations of each cluster, we observed a consecutive distal-proximal order from C1 to C3, indicating a gradual transition of cell states along the P-D axis (Fig. 6B). Differentially expressed gene (DEG) analyses supported the distinction among the subclusters of aSHF (Fig. 6C). In the C1 subcluster, mesodermal TFs, such as Eomes and T, were specifically expressed, suggesting that these cells could be aSHF progenitors that are just transitioning from mesoderm and lag behind the other subclusters, consistent with their posterior localization (Fig. 6B). Foxc2 [71], reported as an aSHF-specific TF gene, was enriched in C2 (Fig. 6C).
Fig. 6.
Irx1 plays a crucial role in the development of the aSHF lineage. A E7.5 aSHF progenitors inferred by WOT analysis are re-clustered using the spatial feature genes. UMAP layout for the E7.5 aSHF progenitors is colored by subclusters (top) and cell type (bottom). The dashed line circles all aSHF progenitors at E7.5. B Corn plots showing the spatial pattern of inferred contributions of C1-3 clusters at E7.5. Color represents the estimated cell density of each cell type. C Dot plot showing the key markers of subclusters of E7.5 aSHF progenitors. Genes belonging to the Iroquois homeobox (Irx) family are marked in red. These genes were first identified as top-ranking genes using the FindMarker function in Seurat, based on LogFC and adj.pval calculations, and then manually selected considering their biological relevance in the context of our study. D RNAscope analysis showing that Irx1 is highly specifically expressed in the anterior (A) region, particularly in layers 4/5/6. Scale bar: 100 µm. E Spatial positions, suggested by seqFISH data, of annotated single cells in the E8.5 heart section [77] (left) and normalized log expression counts of Irx1 and Nr2f1 (right). NC, neural crest
Interestingly, the Iroquois homeobox (Irx) family members Irx1, Irx3, and Irx5 were highly and specifically expressed in C2. Although the Irx family genes have been shown to be indispensable for heart development [74–76], their functions in early cardiac development remain unclear. Compared to Irx3 and Irx5, the expression and regulon activity of Irx1 in the mesoderm exhibited higher spatial specificity, as suggested by Geo-seq data (Additional file 1: Fig. S15 A-C). RNAscope analyses also supported the specific expression of Irx1 in the anterior mesoderm at layers 4, 5, and 6 (4MA, 5MA, and 6MA) (Fig. 6D). In previously published seqFISH sections of E8.5 mouse embryos [77], we also observed that Irx1-positive cells and Nr2f1-positive pSHF [64] cells converge into the CM from the anterior and posterior regions of the SHF, respectively (Fig. 6E). Moreover, the activity of the Irx1 regulon was also enriched in the area of C2, suggesting its function in controlling the specification of this aSHF subcluster (Additional file 1: Fig. S15B). Additionally, we found that the Irx1-positive subpopulation consistently exhibited high aSHF module scores (Additional file 1: Fig. S15 C-D), further highlighting its potential significance in driving key functional processes within the aSHF population.
Irx1 is required for the development of the aSHF lineage and its derivatives
To trace these Irx1-positive cells in cardiac development, we generated an Irx1-reporter mouse model in which the CreERT2-gapYFP or CreERT2-eGFP cassette was targeted to the endogenous Irx1 locus after the start codon (ATG) of Irx1 (Irx1-CreERT2-gapYFP or Irx1-CreERT2-eGFP). Irx1-CreERT2-gapYFP or Irx1-CreERT2-eGFP mice were bred to Rosa26-eYFP or Rosa26-tdTomato (Rosa26-tdT) mice, respectively, and the pregnant female mice were subjected to tamoxifen injection for CreERT2 activation at 6.25 ~ 6.5 d.p.c. (Additional file 1: Fig. S16 A-B). The Irx1-CreERT2; Rosa26-eYFP or Rosa26-tdT embryos were then collected at E8.75 and E9.5 for the analyses of eYFP-positive or tdT-positive cells, which represent derivatives of Irx1-positive aSHF progenitors. We then quantified the contribution of eYFP/tdT-positive cells of Irx1-CreERT2; Rosa26-eYFP or Irx1-CreERT2; Rosa26-tdT embryos in the LV, RV, OFT, and the aSHF, where cells have not yet migrated into the developing heart. The contribution of the Irx1 lineage was highest in the aSHF at E8.75 and was higher in the OFT than in the RV and LV at E8.75-E9.5 (Fig. 7A, D, and E), suggesting that Irx1-positive cells contribute more to the aSHF, whose cells migrate into the developing heart through the OFT, than to the primary heart field, which contributes to the LV. At E8.75, eYFP-positive cells colocalized with MYL7-positive CMs [78] in the OFT and RV (Fig. 7B) and with ISL1 [61] in the remaining SHF (Fig. 7C), further supporting the contribution of Irx1-positive cells to the aSHF lineage.
Fig. 7.
Deletion of Irx1 in cardiac progenitors leads to ventricular septal defects in mice. A eYFP (green) and DAPI (blue) staining in embryonic heart sections at E8.75. Arrowheads indicate the contribution region of Irx1-positive cells at E8.75, including OFT and RV. B Immunofluorescence analysis showing the expression of MYL7 (ventricle marker) in the OFT/RV, atrioventricular canal (AVC), atrium (At), and endocardium (Endo). Arrowheads indicate areas where MYL7 and eYFP co-localize, highlighting the presence of Irx1-positive cells in these regions. C Immunofluorescence and lineage analysis revealing the co-localization of ISL1 (SHF marker) and eYFP in Irx1-positive cells. The arrowhead marks the specific cells expressing both Isl1 and Irx1, highlighting their spatial distribution and potential role in cardiac development. D tdT (red) and DAPI (blue) staining in embryonic heart sections at E9.5. Arrowheads indicate the contribution region of Irx1-positive cells at E9.5, including OFT and RV. E Quantification of the contribution of the eYFP/tdT-positive cells in different heart compartments in E8.75 Irx1-CreERT2; Rosa26-eYFP embryos and E9.5 Irx1-CreERT2; Rosa26-tdT embryos. n represents the total number of embryos, with n = 5 in LV, RV and OFT at E8.75 and E9.5; and n = 3 in aSHF at E8.75. P-values were calculated using the Wilcoxon rank-sum test. Error bars are SEM. *p < 0.05, **p < 0.01. F Representative images of embryos at E13.5: one Irx1f/f embryo on the left and three Mesp1-Cre; Irx1f/f embryos on the right, with the latter exhibiting varying severities of ventricular septal defects. Red arrows indicate the locations of the ventricular septal defect in these embryos. Scale bar, 500 μm. G Stacked bar chart showing the number of VSDs in Irx1f/f, Mesp1-Cre; Irx1f/+ and Mesp1-Cre; Irx1f/f embryos at E13.5. n is the number of embryos: n = 20 for the Irx1f/f group, n = 25 for the Mesp1-Cre; Irx1f/+ group and n = 22 for the Mesp1-Cre; Irx1f/f group. The number of embryos with or without VSD is directly labeled on the chart. P-values were calculated using the Chi-square test. *p < 0.05, ***p < 0.001.
To examine the potential role of Irx1 in the aSHF lineage, we generated Irx1f/f mice and bred them with Mesp1-Cre mice, allowing mesoderm-specific Irx1 deletion (Irx1 CKO) (Additional file 1: Fig. S16 C-D). We then quantified the proportion of E13.5 Irx1f/f, Mesp1-Cre; Irx1f/+, and Mesp1-Cre; Irx1f/f embryos exhibiting a VSD, characterized by incomplete formation of the ventricular septum; representative Mesp1-Cre; Irx1f/f embryos with VSDs of varying severity are shown in Fig. 7F. Approximately 45.45% of Mesp1-Cre; Irx1f/f embryos exhibited VSDs at E13.5, a proportion significantly higher than that observed in Mesp1-Cre; Irx1f/+ embryos, while no VSD was observed in Irx1f/f embryos (Fig. 7G). These results demonstrate that deletion of Irx1 in cardiac progenitors leads to VSDs with high penetrance.
These findings suggest that Irx1 plays a pivotal role in aSHF development, significantly contributing to the formation of its derivatives, although its potential influence on broader mesodermal populations cannot be excluded. In summary, SEU-TCA serves as a powerful tool for integrating spatial and transcriptomic data, providing a foundational framework for studying development and disease.
Discussion
Here, we introduce SEU-TCA, a novel computational framework specifically designed to integrate ST data with scRNA-seq data by utilizing TCA to obtain shared feature representations. Compared to other methods, SEU-TCA exhibits higher accuracy across multiple datasets, especially in spatial deconvolution and single-cell mapping tasks. By applying SEU-TCA, we reconstructed the dynamic differentiation process of early cardiac development at the single-cell level, identified three distinct cardiac progenitor lineages (JCF, aSHF, and pSHF), and resolved the spatial localization of these early cardiac progenitor cells. This demonstrates the utility of SEU-TCA for exploring the spatial distribution of progenitor cells during complex developmental processes.
After identifying three distinct cardiac progenitor lineages, we delve deeper into the aSHF lineage and identify possible distinct differentiation states of aSHF progenitors as early as E7.5. Our spatial deconvolution analysis using SEU-TCA reveals highly detailed spatial expression patterns of the Irx family genes, Irx1, Irx3, and Irx5, within the aSHF lineage. This is consistent with a previous report showing that these genes exhibit high homology and overlapping expression patterns throughout various stages of mouse embryonic heart development [74]. In comparison to these prior experiments, our analyses suggest the co-expression of Irx1, Irx3, and Irx5 at an earlier time point and provide a more specific description of their co-expression within the aSHF lineage.
It has been well-documented that the double knockout of Irx3 and Irx5 results in abnormal orientation and arrangement of the OFT [79], yet the precise role of Irx1 in cardiac development remained enigmatic. Prior studies on Irx1 knockout mice have reported neonatal lethality primarily attributed to lung immaturity [80], and postnatal thinning of the compact layer of the ventricular wall [81], but these findings did not fully elucidate Irx1’s function in cardiac development. Building upon our SEU-TCA analysis, we have conducted genetic lineage tracing, which confirms that Irx1-positive cells contribute to aSHF’s derivatives. Furthermore, through targeted deletion of Irx1 in mesodermal cells using Mesp1-Cre, we have demonstrated that the absence of Irx1 in the mesoderm leads to the VSD. These findings shed new light on the pivotal role of Irx1 in cardiac development. Moreover, they underscore the ability of SEU-TCA to discern intricate spatial expression patterns that may have previously eluded detection due to limitations in previous methodologies, thereby enhancing our understanding of the complex developmental landscape.
The superior performance of SEU-TCA stems primarily from its innovative approach to integrating spatially resolved transcriptomic data with scRNA-seq data. The key lies in the alignment of features within a shared latent space, achieved through the utilization of TCA. This alignment not only enables high-quality deconvolution of spatially resolved data but also facilitates accurate spatial mapping of scRNA-seq data. While SEU-TCA shares the goal of feature alignment with methods like SpaGE [19], it distinguishes itself through its fundamentally different approach. SpaGE relies on the PRECISE method, which involves performing independent Principal Component Analysis (PCA) on each dataset based on shared genes, thereby aligning spatial and scRNA-seq data into a shared latent space. However, this independent PCA approach inherently assumes linear relationships, potentially overlooking complex, non-linear structures prevalent in developmental datasets. In contrast, SEU-TCA minimizes the MMD between spatial and scRNA-seq data to obtain matched shared latent feature representations. As a non-parametric method, MMD not only leverages the kernel trick to map data into a high-dimensional feature space, making distribution differences easier to measure and minimize, but also allows for the selection of different kernels (e.g., Linear, Primal, RBF). This flexibility enables SEU-TCA to be tailored to the specific characteristics of the data, particularly when dealing with complex and diverse data distributions.
While SEU-TCA demonstrates robust performance, certain limitations should be acknowledged. Data preprocessing steps, such as normalization and feature selection, may introduce biases that affect spatial mapping accuracy. For instance, the selection of highly variable genes, while capturing key transcriptional variability, may overlook less-studied genes with potential biological relevance. Additionally, parameter choices, such as the removal of cell-spot pairs with low PCCs (PCC < 0.7), improve robustness but might exclude biologically meaningful associations with higher noise levels. Furthermore, the method’s generalizability to datasets with different spatial resolutions, sequencing depths, or batch effects also requires further validation. Addressing these limitations in future studies will enhance the method’s utility and interpretability.
Emerging single-cell spatial transcriptomics technologies, such as MERFISH [82] and Stereo-seq [83], offer high-resolution insights but they are often constrained by lower gene detection rates or prohibitive costs, limiting their feasibility for large-scale studies. In contrast, SEU-TCA leverages widely available multi-cell spot-based data to provide an effective computational framework for spatial transcriptomics analysis. This highlights its importance in bridging the gap between scalability and resolution in current spatial transcriptomics research.
Conclusions
SEU-TCA presents a robust and versatile computational framework for the integrative analysis of single-cell and spatial transcriptomic data. SEU-TCA demonstrates superior performance across diverse tissues and developmental stages, underscoring its broad applicability. Applying SEU-TCA to early cardiac development identified IRX1 as a critical spatially regulated transcription factor in aSHF specification, with genetic analyses validating its essential role in proper heart morphogenesis. These findings demonstrate SEU-TCA’s capability to enhance spatial resolution in transcriptomic studies and reveal mechanistic insights into spatiotemporal gene regulation during embryogenesis.
Methods
Mice
All experiments involving animals were conducted in accordance with the NIH Guide for the Use and Care of Laboratory Animals. All animal protocols were approved by the Animal Care and Use Committee of Southeast University and the Institutional Animal Care and Use Committee (IACUC) of Tongji University. The mice were caged under SPF level conditions with 12 h light/dark cycles and given water and food and monitored daily for health.
Irx1-creERT2-gapRFP/ + mice were purchased from GemPharmatech (Nanjing, China). Irx1-creERT2-eGFP/ + mice were purchased from Cyagen Biosciences (Suzhou, China). Rosa26-eYFP/Rosa26-eYFP mice originated from the Jackson Laboratory and were a gift from Pengfei Sui’s Lab at the Center for Excellence in Molecular Cell Science. Rosa26-tdTomato/Rosa26-tdTomato mice originated from Fengchao Wang’s Lab at Third Military Medical University (Chongqing, China). Irx1-creERT2-eGFP/ + or Irx1-creERT2-gapRFP/ + male mice were caged with Rosa26-tdTomato/Rosa26-tdTomato or Rosa26-eYFP/Rosa26-eYFP female mice, respectively, to generate Irx1-creERT2-eGFP/ + ; Rosa26-tdTomato/ + (Irx1-creERT2; Rosa26-tdT) and Irx1-creERT2-gapRFP/ + ; Rosa26-eYFP/ + (Irx1-creERT2; Rosa26-eYFP) mice. Pregnant females were identified by the presence of vaginal plugs the following morning (E0.5) and were intraperitoneally injected with tamoxifen at a dose of 0.1 mg/g at 6.25 ~ 6.5 days post-coitum (d.p.c.).
Irx1f/+ mice were purchased from Cyagen Biosciences (Suzhou, China). Mesp1-Cre mice originated from RIKEN Institute (RBRC01145) and were a gift from Zhongzhou Yang’s Lab at Nanjing University. Mesp1-Cre/ + and Irx1f/+ mice were mated to generate Mesp1-Cre/ + ; Irx1f/+ and Mesp1-Cre/ + ; Irx1f/f embryos.
Mouse genomic DNA was obtained from mouse tail using alkaline lysis method. Embryonic genomic DNA was obtained from the tail and yolk sac of mouse embryos using Mouse Direct PCR Kit (Bimake, B40013; YEASEN, 10185ES70). PCR reaction was used to identify the genotype of the mice. Primer sequences used in this study are listed in Additional file 5: Table S4.
RNAscope
RNAscope in situ hybridization of E7.5 mouse embryos was performed as previously reported [5]. Briefly, RNAscope analysis of Irx1 was performed using the RNAscope® Multiplex Fluorescent Reagent Kit v2 (Advanced Cell Diagnostics, 323100) with the mm-Irx1 probe supplied by Advanced Cell Diagnostics.
Immunofluorescence
Pregnant mice were euthanized with CO2, and 8.75 and 9.5 d.p.c. embryos were dissected using a pair of fine forceps. Mouse embryos were fixed with 4% PFA for 2 h at room temperature, washed with PBS, and saturated in a 30% sucrose solution before being embedded in OCT. The embedded embryos were sectioned at 10 µm with a cryotome. For IF, the sections were washed with PBS and then permeabilized in 0.3% Triton X-100 for 40 min. After washing with PBS, the sections were blocked with ReadyProbes 2.5% Normal Goat Serum (Thermo) for 1 h and incubated overnight with diluted primary antibody at 4 °C. The sections were then washed with PBS and incubated with diluted secondary antibody at room temperature for 1–2 h. Finally, the sections were washed again with PBS, stained with DAPI at room temperature for 10 min, washed with PBS, sealed, and imaged with a confocal microscope (LSM710, Carl Zeiss).
Antibodies
Antibodies against MYL7 (sc-365255; IF for embryos: 1:100) and ISL1 (sc-390793; IF for embryos: 1:100) were purchased from Santa Cruz. Goat anti-mouse IgG (H + L) Highly Cross-Adsorbed Secondary Antibody, Alexa Fluor™ Plus 647 (A32728) was purchased from Thermo Fisher Scientific.
Hematoxylin and eosin (H&E) staining
Embryos at 13.5 d.p.c. were obtained using the same method mentioned above. Fresh embryos were fixed in 4% PFA overnight at 4 °C, dehydrated using alcohol and vitrified in dimethylbenzene. Samples were then embedded in paraffin, sectioned at 6 µm and stained with H&E (GP1031; Servicebio Biotechnology, Wuhan, China). H&E staining was conducted according to the manufacturer’s instruction. The stained slides were then imaged with a stereo microscope (Motic SMZ-171).
Overview of SEU-TCA
SEU-TCA primarily consists of two components: TCA-based feature alignment and spatial mapping.
TCA-based feature alignment
We employed TCA to align the features of scRNA-seq data with those of spatially resolved transcriptomic data. Specifically, TCA learns a feature representation shared across domains by mapping the original data into a shared latent space in which the discrepancy between the two distributions is minimized.
In the SEU-TCA model, we denote $X_R$ as the reference spatial data with spatial position labels $Y_R$, and $X_Q$ as the single-cell query data missing spatial position labels $Y_Q$. The main goal of TCA is to find a transformation $\phi$ that aligns $X_R$ with $X_Q$ in a new common latent feature space. From a statistical perspective, let $P(X_R)$ and $Q(X_Q)$ be the distributions of $X_R$ and $X_Q$, respectively; we expect that $P(\phi(X_R)) \approx Q(\phi(X_Q))$.
Following the domain adaptation literature, we then used the MMD between the transformed reference spatial data and the transformed scRNA-seq query data to measure the distance between these two distributions. The empirical estimate of the distance is written as:
$$\mathrm{Dist}(X_R, X_Q) = \left\| \frac{1}{n_R}\sum_{i=1}^{n_R}\phi(x_{R,i}) - \frac{1}{n_Q}\sum_{j=1}^{n_Q}\phi(x_{Q,j}) \right\|_{\mathcal{H}}^{2}$$
Here $\mathcal{H}$ is defined as the universal Reproducing Kernel Hilbert Space (RKHS), and $n_R$ and $n_Q$ are the numbers of spots and cells in the corresponding datasets. Therefore, by mapping the two datasets into the RKHS and comparing their means there, the distance between the two dataset distributions can be estimated.
Then, SEU-TCA seeks to minimize this distance, which is governed by the nonlinear mapping $\phi$. To find a lower-dimensional representation, SEU-TCA uses a kernel function (Linear, Primal, RBF) to map the gene expression data into a low-dimensional space. Next, the Gram matrices $K_{R,R}$, $K_{Q,Q}$, and $K_{R,Q}$ are computed on the reference, query, and cross-reference-query data, respectively, and combined into a single kernel matrix $K$:
$$K = \begin{bmatrix} K_{R,R} & K_{R,Q} \\ K_{Q,R} & K_{Q,Q} \end{bmatrix}$$
We then construct the weight matrix $L$, which satisfies $L_{ij} = 1/n_R^{2}$ if $x_i, x_j \in X_R$, else $L_{ij} = 1/n_Q^{2}$ if $x_i, x_j \in X_Q$, otherwise $L_{ij} = -1/(n_R n_Q)$, and a centering matrix $H$ defined as:
$$H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^{T}$$
where $I$ is the identity matrix, $\mathbf{1}$ is a column vector of ones, and $n = n_R + n_Q$ is the total number of cells and spots. The optimization process minimizes the MMD between the transformed reference and query datasets in a shared latent space. Mathematically, the optimization problem is:
$$\min_{W}\ \mathrm{tr}\left(W^{T} K L K W\right) + \mu\,\mathrm{tr}\left(W^{T} W\right) \quad \text{s.t.} \quad W^{T} K H K W = I$$
where $K$ is the kernel matrix computed using the selected kernel function (Primal, Linear, or RBF), $L$ is the weight matrix capturing domain discrepancy, $H$ is the centering matrix used to ensure that the kernel matrix is centered, $\mu$ is the regularization parameter, and $W$ is the transformation matrix that projects the data into a lower-dimensional shared latent space.
The optimization problem is solved through generalized eigenvalue decomposition of the matrices $KLK + \mu I$ and $KHK$. The top eigenvectors derived from this decomposition form the transformation matrix $W$, which is subsequently used to project the reference and query data into the aligned feature space:
$$\begin{bmatrix} Z_R \\ Z_Q \end{bmatrix} = K W$$
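The alignment step described above can be illustrated with a minimal NumPy sketch. It is not the SEU-TCA source code: it assumes a linear kernel, and the function name `tca_fit_transform`, the argument names, and the default values are hypothetical choices made for this example. The symbols in the comments (kernel matrix K, MMD weight matrix L, centering matrix H, regularization parameter mu, transformation matrix W) follow the usual TCA notation.

```python
import numpy as np

def tca_fit_transform(X_ref, X_query, n_components=2, mu=1.0):
    """Illustrative TCA with a linear kernel (hypothetical helper, not the SEU-TCA API).

    X_ref:   (n_R, genes) reference spot matrix
    X_query: (n_Q, genes) query cell matrix
    Returns the aligned components for spots and cells.
    """
    n_r, n_q = X_ref.shape[0], X_query.shape[0]
    n = n_r + n_q
    X = np.vstack([X_ref, X_query])

    # Combined kernel matrix K; with a linear kernel its blocks are
    # K_RR, K_RQ, K_QR and K_QQ.
    K = X @ X.T

    # MMD weight matrix L: 1/n_R^2 within the reference, 1/n_Q^2 within
    # the query, and -1/(n_R * n_Q) across the two domains.
    e = np.concatenate([np.full(n_r, 1.0 / n_r), np.full(n_q, -1.0 / n_q)])
    L = np.outer(e, e)

    # Centering matrix H = I - (1/n) 1 1^T.
    H = np.eye(n) - np.ones((n, n)) / n

    # W is formed by the leading eigenvectors of (K L K + mu I)^{-1} K H K.
    A = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
    eigvals, eigvecs = np.linalg.eig(A)
    order = np.argsort(-eigvals.real)[:n_components]
    W = eigvecs[:, order].real

    # Project both datasets into the shared latent space.
    Z = K @ W
    return Z[:n_r], Z[n_r:]
```

Called as `Z_spots, Z_cells = tca_fit_transform(X_spots, X_cells, n_components=10)`, the two outputs share one latent space and can be compared directly. For large datasets a dense eigendecomposition of an n × n matrix becomes expensive, so a practical implementation would likely need a more economical solver.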
Kernel function selection
The SEU-TCA implementation supports three kernel types, which determine how the data is mapped into the latent space:
Primal kernel: Directly uses the original data without mapping to a higher-dimensional space. Suitable for linearly separable data.
Linear kernel: Computes pairwise dot products to capture linear relationships.
RBF kernel: Maps data into a high-dimensional space to capture non-linear relationships. Its performance depends on the kernel bandwidth (γ).
The choice of kernel function significantly impacts the alignment quality. For example, the Primal kernel is computationally efficient but may not effectively align complex distributions. The RBF kernel is more flexible for non-linear distributions but requires tuning the bandwidth parameter γ.
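The three options can be sketched as a single helper. This is an assumption-laden illustration rather than the SEU-TCA implementation: the function name `compute_kernel`, the string labels, and the convention that the Primal option simply returns the input matrix unchanged are choices made for this example.

```python
import numpy as np

def compute_kernel(X, kind="rbf", gamma=1.0):
    """Illustrative kernel helper for a (samples, features) matrix X."""
    if kind == "primal":
        # No kernel mapping: the original features are used directly.
        return X
    if kind == "linear":
        # Pairwise dot products between samples.
        return X @ X.T
    if kind == "rbf":
        # exp(-gamma * ||x_i - x_j||^2), computed from squared norms.
        sq_norms = (X ** 2).sum(axis=1)
        sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
        return np.exp(-gamma * np.maximum(sq_dist, 0.0))
    raise ValueError(f"unknown kernel: {kind}")
```

A larger γ makes the RBF kernel more local, since similarity decays faster with distance, which is why this parameter usually has to be tuned per dataset.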
Spatial mapping
SEU-TCA utilizes the aligned TC representations to compute the PCC between the transformed spatial spots and the query cells. Specifically, for each query cell, the PCC is calculated with every spatial spot based on their respective TC profiles. This approach evaluates the linear relationship between the TC features of spatial spots and those of single-cell query data.
The spot-cell pair with the highest PCC value is determined to be the most likely spatial match for the query cell, indicating the closest similarity between their aligned TC profiles. To improve the reliability of the mapping and minimize noise, spot-cell pairs with PCC values below a predefined threshold (e.g., 0.7) are filtered out. This threshold acts as a quality control filter, ensuring that only mappings with sufficiently high correlation are retained.
By adopting this approach, SEU-TCA effectively identifies high-confidence spatial mappings while minimizing the inclusion of spurious or ambiguous matches, thereby providing a reliable framework for integrating spatial transcriptomics and single-cell RNA-seq data.
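The mapping rule above (highest PCC per cell, followed by a threshold filter) can be written in a few lines. The sketch below is illustrative only: the function name `map_cells_to_spots`, the use of -1 to mark unmapped cells, and the assumption that no TC profile is constant (which would give an undefined correlation) are not taken from the SEU-TCA code.

```python
import numpy as np

def map_cells_to_spots(Z_cells, Z_spots, threshold=0.7):
    """Assign each query cell to its best-correlated spot (illustrative).

    Z_cells: (n_cells, n_components) aligned TC profiles of query cells
    Z_spots: (n_spots, n_components) aligned TC profiles of spatial spots
    Returns the best spot index per cell (-1 if below threshold) and the PCC.
    """
    # Center and scale each profile so a dot product equals the PCC.
    zc = Z_cells - Z_cells.mean(axis=1, keepdims=True)
    zs = Z_spots - Z_spots.mean(axis=1, keepdims=True)
    zc = zc / np.linalg.norm(zc, axis=1, keepdims=True)
    zs = zs / np.linalg.norm(zs, axis=1, keepdims=True)

    pcc = zc @ zs.T                      # (n_cells, n_spots) correlation matrix
    best_spot = pcc.argmax(axis=1)       # most similar spot for each cell
    best_pcc = pcc.max(axis=1)

    # Discard low-confidence matches.
    best_spot = np.where(best_pcc >= threshold, best_spot, -1)
    return best_spot, best_pcc
```

Cells returned with index -1 are simply left unmapped; whether to keep them for other analyses is a separate design decision.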
Benchmark comparison among different methods
We compared SEU-TCA with two mapping methods (Tangram [15] and SpaGE [19]) and four deconvolution methods (CARD [10], cell2location [9], STRIDE [11], and CIBERSORTx [23]). For all methods, we followed the tutorials available on their corresponding GitHub repositories. Moreover, the default parameter settings provided by these tutorials were adopted to ensure the consistency and standardization of the analysis process.
Statistical analysis of performance metrics
To compare the performance of SEU-TCA with other methods, we conducted statistical analyses for the PCC metric and provided descriptive statistics for additional metrics (ACC, F1 score, Sensitivity, and Specificity).
For each method, the PCC was calculated to measure the consistency between mapped and true gene expression profiles. To demonstrate the uncertainty, we computed the mean PCC and its 95% confidence intervals (CIs) using standard methods based on the sample mean and variance. Pairwise comparisons between methods were conducted using t-tests. For each pair of methods, we calculated p-values and adjusted them for multiple testing using the False Discovery Rate (FDR).
All statistical analyses were conducted in R (version 4.1). Confidence intervals were calculated using standard error-based methods. Pairwise t-tests were conducted using the “t.test” function implemented in the stats package, and FDR-adjusted p-values were computed with the “p.adjust” function implemented in the stats package.
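For readers working in Python, the same workflow can be approximated as follows. This is a hedged translation, not the authors' R script: it assumes a t-based confidence interval, Welch's t-test (the default behaviour of R's `t.test`), and the Benjamini-Hochberg procedure (what `p.adjust(method = "fdr")` computes). The helper names `mean_ci`, `bh_adjust` and `pairwise_tests` are hypothetical, and SciPy is assumed to be installed.

```python
from itertools import combinations

import numpy as np
from scipy import stats

def mean_ci(x, level=0.95):
    """Mean and standard-error-based confidence interval (t distribution)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    se = x.std(ddof=1) / np.sqrt(x.size)
    half = stats.t.ppf(0.5 + level / 2.0, df=x.size - 1) * se
    return mean, mean - half, mean + half

def bh_adjust(pvals):
    """Benjamini-Hochberg FDR adjustment."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p-value downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted

def pairwise_tests(pcc_by_method):
    """Welch t-test for every pair of methods, with FDR-adjusted p-values."""
    pairs = list(combinations(pcc_by_method, 2))
    pvals = [
        stats.ttest_ind(pcc_by_method[a], pcc_by_method[b], equal_var=False).pvalue
        for a, b in pairs
    ]
    return dict(zip(pairs, bh_adjust(pvals)))
```

Here `pcc_by_method` is expected to be a dictionary mapping a method name to its list of PCC values; the result maps each pair of method names to its adjusted p-value.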
Pseudo-bulk reconstruction ST for the human heart dataset
To generate pseudo-bulk spots from single-cell MERFISH spatial data, we used a grid-based approach to aggregate cells into spatially defined pseudo-spots. The procedure comprised three steps: (1) Grid assignment: Each single cell was assigned to a specific grid based on its spatial coordinates. The grid size was set to 100 units, and the row and column indices of each cell’s corresponding grid were calculated. (2) Grid identification: A unique identifier was generated for each grid by combining the row and column indices. (3) Dominant cell type assignment: The cell type with the highest proportion within each grid was assigned as the ground truth label for the pseudo-spot. This process kept the pseudo-bulk spots spatially aligned with the original cells and labeled each with its dominant cell type, providing a ground truth for downstream analysis and benchmarking.
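The three steps above can be sketched as follows. This is an illustrative reconstruction, not the code used in the study: the function name `build_pseudo_spots`, the "row_col" identifier format, and the assumption that row indices derive from y and column indices from x are all hypothetical. How ties between equally frequent cell types are resolved is not specified in the text; here the type encountered first wins.

```python
import numpy as np
from collections import Counter, defaultdict

def build_pseudo_spots(x, y, cell_types, grid_size=100.0):
    """Bin cells into square grids and label each grid with its dominant cell type."""
    # (1) Grid assignment: integer row/column index of the grid containing each cell.
    cols = np.floor(np.asarray(x, dtype=float) / grid_size).astype(int)
    rows = np.floor(np.asarray(y, dtype=float) / grid_size).astype(int)
    # (2) Grid identification: combine row and column indices into one identifier.
    grid_ids = [f"{r}_{c}" for r, c in zip(rows, cols)]
    # (3) Dominant cell type: the most frequent cell type among the cells of each grid.
    members = defaultdict(list)
    for gid, cell_type in zip(grid_ids, cell_types):
        members[gid].append(cell_type)
    dominant = {gid: Counter(types).most_common(1)[0][0]
                for gid, types in members.items()}
    return grid_ids, dominant
```

The returned `grid_ids` list can also be used as a grouping key to sum the expression of the cells in each grid into a pseudo-bulk profile.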
Clustering analysis of spatial spots
Geo-seq data were processed using the Scanpy library (v1.8.2) in Python. First, PCA was employed to reduce the dimensionality of the dataset while preserving the biological information. We then selected the top 20 principal components for downstream analysis, including construction of the neighborhood graph of spots, which was used for clustering with the Leiden algorithm. The resolution parameter of the Leiden algorithm was set to 1 to balance the granularity of the clusters.
Identification of spatial markers for three germ layers for the mouse gastrulation dataset
After spatial mapping by SEU-TCA, each cell in the scRNA-seq data was assigned a spatial position label. We then identified DEGs for each spatial spot as markers using the Wilcoxon rank-sum test implemented in the function “FindAllMarkers” in the R package Seurat. We selected the top three markers for each spatial spot based on fold change to visualize the consistency of the single-cell and spatial data.
Spatial regulon analysis
Here, we applied the SCENIC pipeline to infer TFs and the gene regulatory network (regulon) in our single-cell transcriptome data with spatial position labels. The procedure comprises three main steps: (1) Gene co-expression network construction: we used the GRNBoost algorithm with default parameter settings to construct a gene co-expression network from the pre-processed gene expression matrix. (2) Regulon identification: regulons, defined as TFs and their corresponding predicted target genes, were identified by RcisTarget, which scans for enriched TF binding motifs within the co-expressed gene sets identified by GRNBoost. (3) Regulon activity scoring: we quantified the activity of each regulon for each cell in spatial spots with AUCell, which assigns an enrichment score to each regulon based on the expression levels of its target genes, allowing active regulatory programs to be identified in individual spatial locations. The regulon activity for each spatial location was defined as the average activity of the corresponding cells within that spatial location.
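The final aggregation, from per-cell regulon activity to per-location activity, amounts to a grouped mean. A minimal sketch is given below; it does not call SCENIC or AUCell, and it assumes a precomputed cells × regulons score matrix plus one spatial label per cell. The function name `spot_regulon_activity` is hypothetical.

```python
import numpy as np

def spot_regulon_activity(auc, spot_labels):
    """Average per-cell regulon activity scores within each spatial location.

    auc: (n_cells, n_regulons) activity scores, e.g. as produced by AUCell
    spot_labels: length n_cells, the spatial location assigned to each cell
    """
    auc = np.asarray(auc, dtype=float)
    labels = np.asarray(spot_labels)
    spots = np.unique(labels)  # sorted unique spatial locations
    # One row per location: the mean score of the cells mapped to that location.
    means = np.vstack([auc[labels == s].mean(axis=0) for s in spots])
    return spots, means
```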
Construction of the time-series tSNE maps for mesodermal lineage
The process of constructing a time-dependent tSNE map is divided into the following steps: (1) Input the data from two consecutive developmental stages, such as E6.75 and E7.0, using the former as reference and the latter as query. (2) Perform PCA on the reference and query data. (3) Use the “TSNE” function from the Python package openTSNE (v 1.0.1) to initialize and fit a tSNE model to the first 20 PCs of the reference data, with the parameters “initialization = pca”, “exaggeration = 4”, and “metric = cosine”. (4) Calculate the correlation between the reference and query data across the variable genes from the reference. Identify the top k-nearest neighbors (kNN) for each query data point and assign each query data point to the median position of its nearest neighbors in the reference tSNE space. (5) Set the positions of the query data in the reference tSNE space as the initialization for the “TSNE” function, and finally generate a tSNE plot for the query data that aligns with the reference data. By analogy, the data at each subsequent time point utilizes the tSNE space of the previous time point as a reference to construct a time-dependent tSNE map.
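Step (4), which places each query cell in the reference tSNE space before the query embedding is optimized, can be sketched as follows. The value of k and the type of correlation are not stated in the text, so Pearson correlation and the default `k=10` are assumptions, as is the function name `init_query_embedding`. The openTSNE fitting in steps (3) and (5) is not reproduced here.

```python
import numpy as np

def init_query_embedding(ref_expr, query_expr, ref_embedding, k=10):
    """Place each query cell at the median tSNE position of its k most correlated reference cells.

    ref_expr: (n_ref, n_genes) and query_expr: (n_query, n_genes), restricted to
    the reference variable genes; ref_embedding: (n_ref, 2) reference tSNE coordinates.
    """
    def standardize(x):
        # Row-wise centering and unit scaling, so dot products are Pearson correlations.
        x = np.asarray(x, dtype=float)
        x = x - x.mean(axis=1, keepdims=True)
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    corr = standardize(query_expr) @ standardize(ref_expr).T  # (n_query, n_ref)
    knn = np.argsort(-corr, axis=1)[:, :k]                    # k most correlated reference cells
    ref_embedding = np.asarray(ref_embedding, dtype=float)
    # Coordinate-wise median over the k neighbors of each query cell.
    return np.median(ref_embedding[knn], axis=1)              # (n_query, 2)
```

The returned array has one row per query cell and can serve as the initial coordinates for the query embedding described in step (5).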
Cell type annotation for cardiac clusters
We selected the cardiac-associated cells at E8.5, including JCF progenitors (Mab21l2-positive Mch), SHF progenitors (Isl1-positive PhM), and mature CM (Nkx2-5-positive CM). We then applied the standard Seurat workflow to re-cluster these cardiac-associated cells and assigned a cell type to each cluster according to its corresponding markers.
Tracing the JCF, aSHF and pSHF lineages
Each lineage shown in Fig. 5 was inferred from a predefined starting cell set using the WOT analysis, as implemented in the Python package wot (v1.0.8) [58]. Specifically, E8.5 JCF cells were used for tracing the JCF lineage, E8.5 aSHF cells for the aSHF lineage, and E8.5 pSHF cells for the pSHF lineage. To ensure the overall quality of the tracing, we only kept cells with a WOT score greater than 0.0001 for each lineage. To further determine the lineage of each cell, we assigned it to the lineage corresponding to its highest WOT score. WOT was run with the default parameter settings given in the Waddington-OT online tutorial (https://broadinstitute.github.io/wot/tutorial/).
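The filtering and assignment rule can be sketched as below. The sketch takes a precomputed cells × lineages matrix of WOT scores as input and does not call the wot package. It reflects one reading of the rule, namely that a cell is retained if its highest score exceeds the cutoff and is then assigned to that lineage; the function name `assign_lineage` and the use of `None` for discarded cells are hypothetical.

```python
import numpy as np

def assign_lineage(scores, lineages, min_score=1e-4):
    """Assign each cell to its highest-scoring lineage, or None if no score passes the cutoff.

    scores: (n_cells, n_lineages) WOT scores, one column per lineage
    lineages: lineage names in column order, e.g. ["JCF", "aSHF", "pSHF"]
    """
    scores = np.asarray(scores, dtype=float)
    best = scores.argmax(axis=1)             # column index of the highest score per cell
    keep = scores.max(axis=1) > min_score    # strict cutoff: score greater than 0.0001
    return [lineages[i] if k else None for i, k in zip(best, keep)]
```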
Sub-clustering of aSHF progenitors
Here, we performed sub-clustering analysis for aSHF progenitors at E7.5, which were predicted by WOT. The procedure was as follows. First, the function “FindAllMarkers” in the R package Seurat was applied to identify DEGs among aSHF progenitors with spatial position labels at E7.5. Next, we performed PCA with the function “RunPCA” implemented in the R package Seurat, with the features parameter set to the union of the top 20 DEGs for each spatial location. Finally, we performed clustering analysis with the functions “FindNeighbors” and “FindClusters” implemented in the R package Seurat on the top 30 principal components generated by PCA, with the resolution parameter set to 1. In this way, we re-clustered the E7.5 aSHF progenitors based on the spatial feature genes derived from the ST atlas and obtained five clusters. However, two clusters were characterized by SM markers and Mch markers, respectively, so we excluded them from further analyses.
Supplementary Information
Additional file 1. Supplementary figures.
Additional file 2: Table S1. Highly region-specific genes for E7.5 mesodermal cells.
Additional file 3: Table S2. Average of RAS by each spot from spatial regulon map.
Additional file 4: Table S3. Top 30 marker genes for each of the three progenitor clusters (JCF, aSHF, and pSHF) at E8.5.
Additional file 5: Table S3. Primers used in the current study.
Additional file 6: Peer review history.
Acknowledgements
The authors are grateful to the Lin & Luo lab members for helpful discussion of this study. We thank Prof. Pengfei Sui and Prof. Fengchao Wang for providing the Rosa26-eYFP/Rosa26-eYFP and Rosa26-tdTomato/Rosa26-tdTomato mice, respectively. We thank Ms. Qingyun Pan for technical assistance.
Peer review information
Shila Ghazanfar and Wenjing She were the primary editors of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team. The peer-review history is available in the online version of this article.
Authors’ contributions
C.L. and P.X. designed the research. J.H. and Y.Y. analyzed the data. R.J., H.C., Y.Z., X.J., and X.X. performed the Irx1 lineage tracing and CKO phenotype analysis. X.Y. and N.J. performed the Irx1 RNAscope experiment. Z.Y. designed the Irx1 lineage tracing mice. J.H., Y.Y., P.X., Z.L., and C.L. wrote the manuscript. C.L., P.X., K.W., and Z.L. supervised the project. All authors read and approved the final manuscript.
Funding
Studies in this manuscript were supported by funds provided by the National Key R&D Program of China (2018YFA0800100 and 2018YFA0800101 to C.L.; 2018YFA0800103 to Z.L. and P.X.; 2018YFA0800104 to K.W.) and the National Natural Science Foundation of China (32030017 to C.L.; 32100529 to P.X.; 32070823 to K.W.).
Data availability
The previously published scRNA-seq data from mouse gastrulation that were re-analyzed here are available from ArrayExpress under accession code E-MTAB-6967 [1, 84]. The previously published Geo-seq data that were re-analyzed here are available from GEO under accession code GSE120963 [4, 5, 85]. Data from the human heart dataset are available for download from the Dryad Digital Repository [22, 86]. Data from the mouse olfactory bulb dataset are available for download from the Spatial Transcriptomics Research website [24, 87]. Data from the pancreatic ductal adenocarcinoma dataset are available from GEO under accession code GSE111672 [26, 88]. All other data supporting the findings of this study are available from the corresponding author on reasonable request. The SEU-TCA algorithm and source code used in this study are available at GitHub [89] and archived at Zenodo [90]. The code is released under the MIT License. All immunofluorescence and hematoxylin and eosin (H&E) staining images used in this study have been deposited in Zenodo [91].
Declarations
Ethics approval and consent to participate
All animal experiments were approved by the Animal Care and Use Committee at Southeast University and the Institutional Animal Care and Use Committee (IACUC) of Tongji University, and performed in accordance with institutional guidelines.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Jingjing He, Yi Yang, Rui Jiang and Yanying Zheng contributed equally to this work.
Contributor Information
Hailong Cao, Email: shuqu_1982@sina.com.
Zhuojuan Luo, Email: zjluo@seu.edu.cn.
Ke Wei, Email: kewei@tongji.edu.cn.
Peng Xie, Email: pengx@seu.edu.cn.
Chengqi Lin, Email: cqlin@seu.edu.cn.
References
- 1. Pijuan-Sala B, Griffiths JA, Guibentif C, Hiscock TW, Jawaid W, Calero-Nieto FJ, Mulas C, Ibarra-Soria X, Tyser RCV, Ho DLL, et al. A single-cell molecular map of mouse gastrulation and early organogenesis. Nature. 2019;566:490–5.
- 2. Qiu C, Martin BK, Welsh IC, Daza RM, Le T-M, Huang X, Nichols EK, Taylor ML, Fulton O, O’Day DR, et al. A single-cell time-lapse of mouse prenatal development from gastrula to birth. Nature. 2024;626:1084–93.
- 3. Xiao Z, Cui L, Yuan Y, He N, Xie X, Lin S, Yang X, Zhang X, Shi P, Wei Z, et al. 3D reconstruction of a gastrulating human embryo. Cell. 2024;187:2855-2874.e2819.
- 4. Peng G, Suo S, Cui G, Yu F, Wang R, Chen J, Chen S, Liu Z, Chen G, Qian Y, et al. Molecular architecture of lineage allocation and tissue organization in early mouse embryo. Nature. 2019;572:528–32.
- 5. Wang R, Yang X, Chen J, Zhang L, Griffiths JA, Cui G, Chen Y, Qian Y, Peng G, Li J, et al. Time space and single-cell resolved tissue lineage trajectories and laterality of body plan at gastrulation. Nat Commun. 2023;14:5675.
- 6. Cable DM, Murray E, Zou LS, Goeva A, Macosko EZ, Chen F, Irizarry RA. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat Biotechnol. 2022;40:517–26.
- 7. Dong R, Yuan GC. SpatialDWLS: accurate deconvolution of spatial transcriptomic data. Genome Biol. 2021;22:145.
- 8. Elosua-Bayes M, Nieto P, Mereu E, Gut I, Heyn H. SPOTlight: seeded NMF regression to deconvolute spatial transcriptomics spots with single-cell transcriptomes. Nucleic Acids Res. 2021;49:e50.
- 9. Kleshchevnikov V, Shmatko A, Dann E, Aivazidis A, King HW, Li T, Elmentaite R, Lomakin A, Kedlian V, Gayoso A, et al. Cell2location maps fine-grained cell types in spatial transcriptomics. Nat Biotechnol. 2022;40:661–71.
- 10. Ma Y, Zhou X. Spatially informed cell-type deconvolution for spatial transcriptomics. Nat Biotechnol. 2022;40:1349–59.
- 11. Sun D, Liu Z, Li T, Wu Q, Wang C. STRIDE: accurately decomposing and integrating spatial transcriptomics using single-cell RNA sequencing. Nucleic Acids Res. 2022;50:e42.
- 12. Qiu C, Cao J, Martin BK, Li T, Welsh IC, Srivatsan S, Huang X, Calderon D, Noble WS, Disteche CM, et al. Systematic reconstruction of cellular trajectories across mouse embryogenesis. Nat Genet. 2022;54:328–41.
- 13. Vahid MR, Brown EL, Steen CB, Zhang W, Jeon HS, Kang M, Gentles AJ, Newman AM. High-resolution alignment of single-cell and spatial transcriptomes with CytoSPACE. Nat Biotechnol. 2023;41:1543–8.
- 14. Wan X, Xiao J, Tam SST, Cai M, Sugimura R, Wang Y, Wan X, Lin Z, Wu AR, Yang C. Integrating spatial and single-cell transcriptomics data using deep generative models with SpatialScope. Nat Commun. 2023;14:7848.
- 15. Biancalani T, Scalia G, Buffoni L, Avasthi R, Lu Z, Sanger A, Tokcan N, Vanderburg CR, Segerstolpe Å, Zhang M, et al. Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nat Methods. 2021;18:1352–62.
- 16. Korsunsky I, Millard N, Fan J, Slowikowski K, Zhang F, Wei K, Baglaenko Y, Brenner M, Loh PR, Raychaudhuri S. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods. 2019;16:1289–96.
- 17. Welch JD, Kozareva V, Ferreira A, Vanderburg C, Martin C, Macosko EZ. Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity. Cell. 2019;177:1873-1887.e1817.
- 18. Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM 3rd, Hao Y, Stoeckius M, Smibert P, Satija R. Comprehensive Integration of Single-Cell Data. Cell. 2019;177:1888-1902.e1821.
- 19. Abdelaal T, Mourragui S, Mahfouz A, Reinders MJT. SpaGE: Spatial Gene Enhancement using scRNA-seq. Nucleic Acids Res. 2020;48:e107.
- 20. Moriel N, Senel E, Friedman N, Rajewsky N, Karaiskos N, Nitzan M. NovoSpaRc: flexible spatial reconstruction of single-cell gene expression with optimal transport. Nat Protoc. 2021;16:4177–200.
- 21. Pan SJ, Tsang IW, Kwok JT, Yang Q. Domain adaptation via transfer component analysis. IEEE Trans Neural Netw. 2011;22:199–210.
- 22. Farah EN, Hu RK, Kern C, Zhang Q, Lu T-Y, Ma Q, Tran S, Zhang B, Carlin D, Monell A, et al. Spatially organized cellular communities form the developing human heart. Nature. 2024;627:854–64.
- 23. Steen CB, Liu CL, Alizadeh AA, Newman AM. Profiling Cell Type Abundance and Expression in Bulk Tissues with CIBERSORTx. Methods Mol Biol. 2020;2117:135–57.
- 24. Ståhl PL, Salmén F, Vickovic S, Lundmark A, Navarro JF, Magnusson J, Giacomello S, Asp M, Westholm JO, Huss M, et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science. 2016;353:78–82.
- 25. Tepe B, Hill MC, Pekarek BT, Hunt PJ, Martin TJ, Martin JF, Arenkiel BR. Single-Cell RNA-Seq of Mouse Olfactory Bulb Reveals Cellular Heterogeneity and Activity-Dependent Molecular Census of Adult-Born Neurons. Cell Rep. 2018;25:2689-2703.e2683.
- 26. Moncada R, Barkley D, Wagner F, Chiodin M, Devlin JC, Baron M, Hajdu CH, Simeone DM, Yanai I. Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Nat Biotechnol. 2020;38:333–42.
- 27. Bardot ES, Hadjantonakis A-K. Mouse gastrulation: Coordination of tissue patterning, specification and diversification of cell fate. Mech Dev. 2020;163:103617.
- 28. Despin-Guitard E, Quenec’Hdu R, Nahaboo W, Schwarz N, Leube RE, Chazaud C, Migeotte I. Regionally specific levels and patterns of keratin 8 expression in the mouse embryo visceral endoderm emerge upon anterior-posterior axis determination. Front Cell Dev Biol. 2022;10:1037041.
- 29. Nahaboo W, Eski SE, Despin-Guitard E, Vermeersch M, Versaevel M, Saykali B, Monteyne D, Gabriele S, Magin TM, Schwarz N, et al. Keratin filaments mediate the expansion of extra-embryonic membranes in the post-gastrulation mouse embryo. EMBO J. 2022;41:e108747.
- 30. Kurokawa D, Takasaki N, Kiyonari H, Nakayama R, Kimura-Yoshida C, Matsuo I, Aizawa S. Regulation of Otx2 expression and its functions in mouse epiblast and anterior neuroectoderm. Development. 2004;131:3307–17.
- 31. Engert S, Burtscher I, Liao WP, Dulev S, Schotta G, Lickert H. Wnt/β-catenin signalling regulates Sox17 expression and is essential for organizer and endoderm formation in the mouse. Development. 2013;140:3128–38.
- 32. Fossat N, Le Greneur C, Béby F, Vincent S, Godement P, Chatelain G, Lamonerie T. A new GFP-tagged line reveals unexpected Otx2 protein localization in retinal photoreceptors. BMC Dev Biol. 2007;7:122.
- 33. van Nes J, de Graaff W, Lebrin F, Gerhard M, Beck F, Deschamps J. The Cdx4 mutation affects axial development and reveals an essential role of Cdx genes in the ontogenesis of the placental labyrinth in mice. Development. 2006;133:419–28.
- 34. Davidson AJ, Ernst P, Wang Y, Dekens MPS, Kingsley PD, Palis J, Korsmeyer SJ, Daley GQ, Zon LI. cdx4 mutants fail to specify blood progenitors and can be rescued by multiple hox genes. Nature. 2003;425:300–6.
- 35. Ferjentsik Z, Hayashi S, Dale JK, Bessho Y, Herreman A, De Strooper B, del Monte G, de la Pompa JL, Maroto M. Notch is a critical component of the mouse somitogenesis oscillator and is essential for the formation of the somites. PLoS Genet. 2009;5:e1000662.
- 36. Bessho Y, Sakata R, Komatsu S, Shiota K, Yamada S, Kageyama R. Dynamic expression and essential functions of Hes7 in somite segmentation. Genes Dev. 2001;15:2642–7.
- 37. Tzahor E, Evans SM. Pharyngeal mesoderm development during embryogenesis: implications for both heart and head myogenesis. Cardiovasc Res. 2011;91:196–202.
- 38. Harel I, Maezawa Y, Avraham R, Rinon A, Ma HY, Cross JW, Leviatan N, Hegesh J, Roy A, Jacob-Hirsch J, et al. Pharyngeal mesoderm regulatory network controls cardiac and head muscle morphogenesis. Proc Natl Acad Sci U S A. 2012;109:18839–44.
- 39. Shih HP, Gross MK, Kioussi C. Cranial muscle defects of Pitx2 mutants result from specification defects in the first branchial arch. Proc Natl Acad Sci U S A. 2007;104:5907–12.
- 40. Dong F, Sun X, Liu W, Ai D, Klysik E, Lu MF, Hadley J, Antoni L, Chen L, Baldini A, et al. Pitx2 promotes development of splanchnic mesoderm-derived branchiomeric muscle. Development. 2006;133:4891–9.
- 41. Tam PPL, Behringer RR. Mouse gastrulation: the formation of a mammalian body plan. Mech Dev. 1997;68:3–25.
- 42. Dominguez MH, Krup AL, Muncie JM, Bruneau BG. Graded mesoderm assembly governs cell fate and morphogenesis of the early mammalian heart. Cell. 2023;186:479-496.e423.
- 43. Costello I, Nowotschin S, Sun X, Mould AW, Hadjantonakis AK, Bikoff EK, Robertson EJ. Lhx1 functions together with Otx2, Foxa2, and Ldb1 to govern anterior mesendoderm, node, and midline development. Genes Dev. 2015;29:2108–22.
- 44. Guibentif C, Griffiths JA, Imaz-Rosshandler I, Ghazanfar S, Nichols J, Wilson V, Göttgens B, Marioni JC. Diverse Routes toward Early Somites in the Mouse Embryo. Dev Cell. 2021;56:141-153.e146.
- 45. de Soysa TY, Ranade SS, Okawa S, Ravichandran S, Huang Y, Salunga HT, Schricker A, Del Sol A, Gifford CA, Srivastava D. Single-cell analysis of cardiogenesis reveals basis for organ-level developmental defects. Nature. 2019;572:120–4.
- 46. Van de Sande B, Flerin C, Davie K, De Waegeneer M, Hulselmans G, Aibar S, Seurinck R, Saelens W, Cannoodt R, Rouchon Q, et al. A scalable SCENIC workflow for single-cell gene regulatory network analysis. Nat Protoc. 2020;15:2247–76.
- 47. Perea-Gómez A, Shawlot W, Sasaki H, Behringer RR, Ang S. HNF3beta and Lim1 interact in the visceral endoderm to regulate primitive streak formation and anterior-posterior polarity in the mouse embryo. Development. 1999;126:4499–511.
- 48. Pham TXA, Panda A, Kagawa H, To SK, Ertekin C, Georgolopoulos G, van Knippenberg SSFA, Allsop RN, Bruneau A, Chui JS-H, et al. Modeling human extraembryonic mesoderm cells using naive pluripotent stem cells. Cell Stem Cell. 2022;29:1346-1365.e1310.
- 49. Deschamps J, van Nes J. Developmental regulation of the Hox genes during axial morphogenesis in the mouse. Development. 2005;132:2931–42.
- 50. Hannenhalli S, Kaestner KH. The evolution of Fox genes and their role in development and disease. Nat Rev Genet. 2009;10:233–40.
- 51. Bardot E, Calderon D, Santoriello F, Han S, Cheung K, Jadhav B, Burtscher I, Artap S, Jain R, Epstein J, Lickert H, Gouon-Evans V, Sharp AJ, Dubois NC. Foxa2 identifies a cardiac progenitor population with ventricular differentiation potential. Nat Commun. 2017;8:14428.
- 52. Treichel D, Becker MB, Gruss P. The novel transcription factor gene Sp5 exhibits a dynamic and highly restricted expression pattern during mouse embryogenesis. Mech Dev. 2001;101:175–9.
- 53. Harrison SM, Houzelstein D, Dunwoodie SL, Beddington RS. Sp5, a new member of the Sp1 family, is dynamically expressed during development and genetically interacts with Brachyury. Dev Biol. 2000;227:358–72.
- 54. Concepcion D, Washkowitz AJ, DeSantis A, Ogea P, Yang JI, Douglas NC, Papaioannou VE. Cell lineage of timed cohorts of Tbx6-expressing cells in wild-type and Tbx6 mutant embryos. Biol Open. 2017;6:1065–73.
- 55. White PH, Farkas DR, McFadden EE, Chapman DL. Defective somite patterning in mouse embryos with reduced levels of Tbx6. Development. 2003;130:1681–90.
- 56. Kelly RG. The heart field transcriptional landscape at single-cell resolution. Dev Cell. 2023;58:257–66.
- 57. Lescroart F, Wang X, Lin X, Swedlund B, Gargouri S, Sànchez-Dànes A, Moignard V, Dubois C, Paulissen C, Kinston S, et al. Defining the earliest step of cardiovascular lineage segregation by single-cell RNA-seq. Science. 2018;359:1177–81.
- 58. Schiebinger G, Shu J, Tabaka M, Cleary B, Subramanian V, Solomon A, Gould J, Liu S, Lin S, Berube P, et al. Optimal-Transport Analysis of Single-Cell Gene Expression Identifies Developmental Trajectories in Reprogramming. Cell. 2019;176:928-943.e922.
- 59. Xie P, Jiang X, He J, Pan Q, Yang X, Zheng Y, Fan W, Wu C, Zheng W, Fang K, et al. Epigenetic delineation of the earliest cardiac lineage segregation by single-cell multi-omics. eLife. 2024;13:RP98293. 10.7554/eLife.98293.1.
- 60. Tyser RCV, Ibarra-Soria X, McDole K, Arcot Jayaram S, Godwin J, van den Brand TAH, Miranda AMA, Scialdone A, Keller PJ, Marioni JC, Srinivas S. Characterization of a common progenitor pool of the epicardium and myocardium. Science. 2021;371(6533):eabb2986.
- 61. Cai CL, Liang X, Shi Y, Chu PH, Pfaff SL, Chen J, Evans S. Isl1 identifies a cardiac progenitor population that proliferates prior to differentiation and contributes a majority of cells to the heart. Dev Cell. 2003;5:877–89.
- 62. Zhang Q, Carlin D, Zhu F, Cattaneo P, Ideker T, Evans SM, Bloomekatz J, Chi NC. Unveiling Complexity and Multipotentiality of Early Heart Fields. Circ Res. 2021;129:474–87.
- 63. Zhou L, Liu J, Olson P, Zhang K, Wynne J, Xie L. Tbx5 and Osr1 interact to regulate posterior second heart field cell cycle progression for cardiac septation. J Mol Cell Cardiol. 2015;85:1–12.
- 64. Devalla HD, Schwach V, Ford JW, Milnes JT, El‐Haou S, Jackson C, Gkatzis K, Elliott DA, Chuva de Sousa Lopes SM, Mummery CL, et al. Atrial‐like cardiomyocytes from human pluripotent stem cells are a robust preclinical model for assessing atrial‐selective pharmacology. EMBO Molecular Medicine. 2015;7:394–410.
- 65. Xu H, Morishima M, Wylie JN, Schwartz RJ, Bruneau BG, Lindsay EA, Baldini A. Tbx1 has a dual role in the morphogenesis of the cardiac outflow tract. Development. 2004;131:3217–27.
- 66. Ilagan R, Abu-Issa R, Brown D, Yang YP, Jiao K, Schwartz RJ, Klingensmith J, Meyers EN. Fgf8 is required for anterior heart field development. Development. 2006;133:2435–45.
- 67. Meier AB, Zawada D, De Angelis MT, Martens LD, Santamaria G, Zengerle S, Nowak-Imialek M, Kornherr J, Zhang F, Tian Q, et al. Epicardioid single-cell genomics uncovers principles of human epicardium biology in heart development and disease. Nat Biotechnol. 2023;41:1787–800.
- 68. Zawada D, Kornherr J, Meier AB, Santamaria G, Dorn T, Nowak-Imialek M, Ortmann D, Zhang F, Lachmann M, Dreßen M, et al. Retinoic acid signaling modulation guides in vitro specification of human heart field-specific progenitor pools. Nat Commun. 2023;14:1722.
- 69. Stefanovic S, Laforest B, Desvignes JP, Lescroart F, Argiro L, Maurel-Zaffran C, Salgado D, Plaindoux E, De Bono C, Pazur K, Théveniau-Ruissy M, Béroud C, Puceat M, Gavalas A, Kelly RG, Zaffran S. Hox-dependent coordination of mouse cardiac progenitor cell patterning and differentiation. eLife. 2020;9:e55124.
- 70. Bertrand N, Roux M, Ryckebüsch L, Niederreither K, Dollé P, Moon A, Capecchi M, Zaffran S. Hox genes define distinct progenitor sub-domains within the second heart field. Dev Biol. 2011;353:266–74.
- 71. Seo S, Kume T. Forkhead transcription factors, Foxc1 and Foxc2, are required for the morphogenesis of the cardiac outflow tract. Dev Biol. 2006;296:421–36.
- 72. Yamagishi H. Clinical Developmental Cardiology for Understanding Etiology of Congenital Heart Disease. J Clin Med. 2022;11(9):2381.
- 73. Bruneau BG. The developmental genetics of congenital heart disease. Nature. 2008;451:943–8.
- 74. Christoffels VM, Keijser AG, Houweling AC, Clout DE, Moorman AF. Patterning the embryonic heart: identification of five mouse Iroquois homeobox genes in the developing heart. Dev Biol. 2000;224:263–74.
- 75. Houweling AC, Dildrop R, Peters T, Mummenhoff J, Moorman AF, Rüther U, Christoffels VM. Gene and cluster-specific expression of the Iroquois family members during mouse development. Mech Dev. 2001;107:169–74.
- 76. Kim KH, Rosen A, Bruneau BG, Hui CC, Backx PH. Iroquois homeodomain transcription factors in heart development and function. Circ Res. 2012;110:1513–24.
- 77. Lohoff T, Ghazanfar S, Missarova A, Koulena N, Pierson N, Griffiths JA, Bardot ES, Eng CHL, Tyser RCV, Argelaguet R, et al. Integration of spatial and single-cell transcriptomic data elucidates mouse organogenesis. Nat Biotechnol. 2022;40:74–85.
- 78. Yelon D, Horne SA, Stainier DYR. Restricted Expression of Cardiac Myosin Genes Reveals Regulated Aspects of Heart Tube Assembly in Zebrafish. Dev Biol. 1999;214:23–37.
- 79. Gaborit N, Sakuma R, Wylie JN, Kim KH, Zhang SS, Hui CC, Bruneau BG. Cooperative and antagonistic roles for Irx3 and Irx5 in cardiac morphogenesis and postnatal physiology. Development. 2012;139:4007–19.
- 80. Yu W, Li X, Eliason S, Romero-Bustillos M, Ries RJ, Cao H, Amendt BA. Irx1 regulates dental outer enamel epithelial and lung alveolar type II epithelial differentiation. Dev Biol. 2017;429:44–55.
- 81. Sheybani-Deloui S, Xu L, Hu L, Yuan Q, Son JE, Kim K-H, Liu W, Mo R, Zhang X, Chi L, et al. Irx1 and Irx2 play dose-dependent cooperative functions in mammalian development. 2022.
- 82. Chen KH, Boettiger AN, Moffitt JR, Wang S, Zhuang X. RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science. 2015;348:aaa6090.
- 83. Chen A, Liao S, Cheng M, Ma K, Wu L, Lai Y, Qiu X, Yang J, Xu J, Hao S, et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell. 2022;185:1777-1792.e1721.
- 84. Pijuan-Sala B, Griffiths J. Timecourse single-cell RNAseq of whole mouse embryos harvested between days 6.5 and 8.5 of development. ArrayExpress; 2018. https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-6967.
- 85. Peng G, Suo S, Cui G. Molecular architecture of lineage allocation and tissue organization in early mouse embryo. Gene Expression Omnibus; 2019. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE120963.
- 86. Farah E. Spatially organized cellular communities form the developing human heart. Dryad Digital Repository; 2023. 10.5061/dryad.w0vt4b8vp.
- 87. Salmén F, Fernandez J. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Spatial Transcriptomics Research; 2016. http://www.spatialtranscriptomicsresearch.org.
- 88. Moncada R, Barkley D, Wagner F. Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Gene Expression Omnibus; 2020. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE111672.
- 89. He J. Integration of single-cell and spatial transcriptomics by SEU-TCA reveals the spatial origin of early cardiac progenitors. GitHub; 2025. https://github.com/LinluoLab/SEU-TCA.
- 90. He J. Integration of single-cell and spatial transcriptomics by SEU-TCA reveals the spatial origin of early cardiac progenitors. Zenodo; 2025. 10.5281/zenodo.14616529.
- 91. He J. Immunofluorescence and hematoxylin and eosin (H&E) staining images for SEU-TCA. Zenodo; 2025. 10.5281/zenodo.15497630.