Abstract
Background
Cell type-specific transcriptional heterogeneity in embryonic mouse skin is well-documented, but few studies have investigated the regulatory mechanisms. Here, we present high-throughput single-cell chromatin accessibility and transcriptome sequencing (HT-scCAT-seq), a method that simultaneously profiles transcriptome and chromatin accessibility. We utilized HT-scCAT-seq to dissect the gene regulatory mechanism governing epidermal stratification, periderm terminal differentiation, and fibroblast specification.
Results
By linking chromatin accessibility to gene expression, we identify candidate cis-regulatory elements (cCREs) and their target genes which are crucial for dermal and epidermal development. We describe cells with similar gene expression profiles that exhibit distinct chromatin accessibility statuses during periderm terminal differentiation. Finally, we characterize the underlying lineage-determining transcription factors and demonstrate, through in silico perturbation analysis and CUT&Tag experiments, that ALX4 and RUNX2 are candidate transcription factor regulators of dermal papilla lineage development.
Conclusions
Overall, HT-scCAT-seq represents a powerful tool for unraveling the spatiotemporal dynamics of gene regulation in single cells. Our results advance the understanding of embryonic skin development while providing a scalable framework for investigating regulatory mechanisms across diverse biological systems and disease contexts.
Supplementary Information
The online version contains supplementary material available at 10.1186/s13059-025-03652-0.
Keywords: HT-scCAT-seq, Cis-regulatory elements, Embryonic skin development
Background
Embryonic mouse skin serves as an effective barrier, protecting the developing epidermis and dermis from environmental insults and amniotic fluid. Skin development starts post-gastrulation and undergoes significant morphological changes before birth [1]. By embryonic day 9.5 (E9.5), a single layer of epidermal basal cells forms. Epidermal stratification and dermal maturation are largely completed by E18.5, giving rise to multiple cell lineages [2, 3]. Current research mainly focuses on the epidermis of mouse skin [4, 5]. While the maturation of both dermis and epidermis involves diverse cell fates, the intricate connections between cell lineage differentiation and gene regulatory networks (GRNs) in the epidermis and dermis remain elusive.
The transcriptome state indicates gene activation or repression, while epigenomic landscapes reveal potential cell type-specific CREs [6]. Advances in single-cell technologies have facilitated the exploration of factors determining cell identity within a tissue or organ. Single-cell RNA sequencing (scRNA-seq) has been used to unveil the transcriptional dynamics during development of mouse skin and formation of the dermis [7–10]. The epidermis originates from surface ectoderm around E9.5, with a temporary cellular layer called the periderm on top of it. The epidermis then proceeds to stratify, and basal epidermal cells undergo differentiation. The periderm exists only for a brief period and is sloughed off around E18.5 [11]. Additionally, around E14.5, the hair follicle placodes in the epidermal appendages emit signals to the dermal fibroblasts, leading to the formation of dermal condensates (DC), which are the precursor cells of the dermal papillae (DP). WNT and BMP signals within the dermal papillae trigger differentiation and maturation of hair follicles [12, 13]. While previous studies used scRNA-seq to reveal cellular heterogeneity in periderm and DP, little is known about the epigenetic mechanisms driving the specification of the dermis [14, 15].
Lineage-priming of chromatin accessibility can indicate gene expression and predict lineage selection before commitment [16]. Parallel profiling of the transcriptome and chromatin accessibility within the same single cell can offer a more global view of biological state than single-omics alone. However, current single-cell multiomics technologies are limited in throughput, sensitivity, and cost efficiency [17–19]. Here, we present HT-scCAT-seq (High-Throughput single-cell Chromatin Accessibility and Transcriptome sequencing), which enables simultaneous detection of chromatin accessibility and gene expression in thousands of single cells, facilitated by a microfluidic device.
To systematically investigate cell type-specific gene regulatory programs in developing embryonic skin, especially in fibroblasts within the dermis, we applied HT-scCAT-seq to mouse embryonic skin. We profiled transcriptome and chromatin accessibility simultaneously in cells isolated from dorsal skin at E13.5, E14.5, E15.5, E16.5, and E18.5. Our study reveals the dynamic heterogeneity within periderm and fibroblast populations during skin development. We identified crucial gene regulatory networks and characterized the underlying lineage-determining TFs in the fibroblast lineage. Taken together, our work provides a comprehensive analysis of periderm and fibroblast populations and offers insights into epidermal and dermal development during embryogenesis.
Results
HT-scCAT-seq enables accurate joint-profiling of chromatin accessibility and gene expression in single cells
In this study, we introduce HT-scCAT-seq, an enhancement of a previously published multiomics approach, scCAT-seq [17]. HT-scCAT-seq enables robust detection of ATAC and RNA in fixed cells using the microfluidics-based DNBelab C4 platform [20]. The HT-scCAT-seq process involves five key steps: (I) nuclei are extracted and permeabilized using a mild lysis buffer and fixed with a low concentration of formaldehyde; (II) open chromatin regions are tagged by transposase Tn5; (III) mRNA transcripts are captured by a primer carrying poly (dT) and unique molecular identifiers (UMIs) during reverse transcription; (IV) nuclei, beads, and nuclei lysis buffer are encapsulated into emulsion droplets using the DNBelab C4 platform, and the transposed chromatin fragments and RT products of the same single cell are barcoded by bead-linked single-strand oligonucleotides and pre-amplified in droplets simultaneously; (V) the resulting product (RNA and ATAC partition) is separated by streptavidin beads followed by amplification with distinct primers to produce parallel libraries (Fig. 1a, Additional file 1: Fig. S1a and Additional file 2: Table S1). The advantages of this approach include increased efficiency and specificity in capturing target molecules, as the biotin-streptavidin interaction enhances the recovery of RNA while minimizing non-specific binding. These steps improve the overall sensitivity of our assays [16, 21]. For data processing, FASTQ libraries are de-multiplexed and then aligned to the reference genomes. Bead calling and merging are performed separately on the ATAC partition libraries [22, 23]. Valid barcodes are then transferred to the RNA partition for bead merging, and gene matrices are generated from the RNA partition using PISA [24]. Finally, the peak matrix is generated by peak calling, and the peak and gene matrices are combined into an integrated matrix containing chromatin accessibility and gene expression information (Fig. 1a).
Fig. 1.
High-throughput single-cell chromatin accessibility and transcriptome sequencing based on the DNBelab C4 platform. a Scheme of the HT-scCAT-seq workflow and integrated data analysis. b Scatter plot of the mixed-species experiment showing cells with both ATAC and RNA profiles obtained. Dots are colored based on species. c, d Track view displaying ATAC (top) and RNA (bottom) signals at a representative locus. c chr13:91,340,000–91,370,000 for HEK293T cells. d chr11:69,090,000–69,130,000 for NIH3T3 cells. The middle panel shows accessible chromatin fragments across the top 300 selected single cells. Box plots showing the distribution of detected gene number (e), UMIs (f), and FRiP (unique fragments in peaks) (g), across different methods performed with HEK293T and NIH3T3 cells. h Heatmap showing Spearman correlation between transcriptome profiles generated by different approaches
To evaluate its technical performance and data quality, we applied HT-scCAT-seq to an equal mixture of human HEK293T and mouse NIH3T3 cell lines. In this species-mixing experiment, human and mouse reads were well-separated, with an observed collision rate of 2% and an overall collision rate of 4% in both the ATAC and RNA partitions [25]; the overall rate includes both cross-species and within-species (human-human or mouse-mouse) doublets (Fig. 1b). After filtering the doublets, 1952 HEK293T cells and 2069 NIH3T3 cells remained for analysis. We obtained a median of 3679 RNA genes and 11,556 unique ATAC fragments for each cell (Additional file 2: Table S2). In HEK293T cells, reads from the ATAC and RNA library were highly enriched in the regions of Mir17hg gene loci. In NIH3T3 cells, reads from the ATAC and RNA library were highly enriched in the regions of Vamp2, Per1, and Hes7 gene loci (Fig. 1c and d). To evaluate the ability of HT-scCAT-seq to capture chromatin accessibility, we compared the insert size distribution of unique ATAC fragments and mapped reads around transcription start sites (TSSs) from two biological replicates (Additional file 1: Fig. S1b and c). As expected, reads from the ATAC partitions exhibited the periodic nucleosome pattern and a high enrichment around TSSs. For the ATAC and RNA partitions, the ensemble signals were highly reproducible (Pearson correlation coefficient > 0.99) between two biological replicates (Additional file 1: Fig. S1d–f).
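The relationship between the observed (cross-species) and overall collision rates can be illustrated with a short calculation. The following is a minimal sketch, assuming that doublets form by random pairing of cells, so that only the cross-species fraction 2p(1 − p) of all doublets is visible in a mixture with human fraction p. The function names and the 0.9 purity cutoff are illustrative assumptions, not parameters taken from the HT-scCAT-seq pipeline.

```python
def overall_collision_rate(observed_cross_species_rate, human_fraction=0.5):
    """Estimate the total doublet rate from the visible cross-species rate.

    Assumes random pairing: a fraction 2*p*(1-p) of all doublets mixes
    the two species and is therefore detectable.
    """
    p = human_fraction
    visible_fraction = 2.0 * p * (1.0 - p)
    if visible_fraction <= 0.0:
        raise ValueError("both species must be present in the mixture")
    return observed_cross_species_rate / visible_fraction


def classify_barcode(human_count, mouse_count, purity=0.9):
    """Label one barcode by species purity (the 0.9 cutoff is hypothetical)."""
    total = human_count + mouse_count
    if total == 0:
        return "empty"
    if human_count / total >= purity:
        return "human"
    if mouse_count / total >= purity:
        return "mouse"
    return "collision"


if __name__ == "__main__":
    # An equal mixture with 2% visible collisions implies about 4% overall.
    print(overall_collision_rate(0.02))
    print(classify_barcode(950, 20), classify_barcode(400, 500))
```

Under this assumption, an equal mixture makes half of all doublets visible, which is consistent with the reported 2% observed and 4% overall rates.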
When examining key metrics of ATAC libraries (unique fragments) and RNA libraries (UMIs and detected genes), we found that HT-scCAT-seq performed well in both modalities. HT-scCAT-seq yielded 11,396–25,487 unique fragments (ATAC), 6133–8430 UMIs, and 2831–3816 detected expressed genes (RNA) per cell (Additional file 2: Table S2). We also benchmarked HT-scCAT-seq alongside other single-cell multiomics approaches [16, 21, 26–28]. Following uniform preprocessing and standardization of all datasets (see Methods), we performed comprehensive overlap analyses of peaks and genes across platforms. Our results show strong concordance between HT-scCAT-seq data (HEK293T and NIH3T3 cell lines) and both 10x Multiome and ISSAAC-seq platforms (Additional file 1: Fig. S1f, g, and Additional file 2: Table S3). The quality of data generated by HT-scCAT-seq is comparable to that of commercial 10x Multiome and ISSAAC-seq, as measured by the number of detected UMIs, expressed genes, and unique fragments (Fig. 1e–g). Global analysis reveals a high correlation between gene expression profiles from HT-scCAT-seq, 10x Multiome, and ISSAAC-seq (Fig. 1h). We also observed a high correlation between HT-scCAT-seq replicates (Fig. 1h and Additional file 1: Fig. S1i), indicating the stability of this approach. Taken together, these observations suggest that HT-scCAT-seq produces high-quality profiles of gene expression and chromatin accessibility in a reproducible and robust manner.
Identification of mouse brain cell types from chromatin accessibility and gene expression profiles
To showcase HT-scCAT-seq’s effectiveness in identifying distinct cell types within complex tissues, we applied HT-scCAT-seq to adult mouse brain samples (Additional file 1: Fig. S2a and b). In total, we obtained 20,523 nuclei (Additional file 2: Table S4). After stringent quality filtering, we generated single-cell multimodal profiles of 14,828 high-quality mouse brain cells with a median of 6991 UMIs and 5471 unique fragments per cell (Fig. 2a and Additional file 1: Fig. S2c). For the ATAC data, the nuclei that passed quality control exhibited a median fragment count of 5471, with 60.6% of fragments in peaks (FRiP), and a TSS score of 6.64 (Additional file 1: Fig. S2d). To address batch effects between HT-scCAT-seq and 10x Multiome datasets, we evaluated four integration approaches: Harmony (Additional file 1: Fig. S3a), Mutual Nearest Neighbors (MNN, Additional file 1: Fig. S3b), ComBat (Additional file 1: Fig. S3c), and WNN (Additional file 1: Fig. S3d). Performance was quantified using the Local Inverse Simpson’s Index (LISI) and the Average Silhouette Width (ASW) (Additional file 2: Table S5a). While all methods resulted in comparable cell type clustering and distribution (Additional file 1: Fig. S3a–c), Harmony showed superior batch correction efficacy, as indicated by higher LISI scores (1.7717 vs. 1.6959 for MNN in the RNA dataset; 1.6227 vs. 1.5065 for MNN and 1.4134 for ComBat in the ATAC dataset; p < 0.05, paired t-test).
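To make the LISI metric concrete, the idea behind it can be sketched as follows. This is a simplified, unweighted k-nearest-neighbour variant written for illustration only: the published LISI uses perplexity-based Gaussian weighting of neighbours, which is omitted here, and the function names and the default k are assumptions. For each cell, the score is the inverse Simpson's index of the batch labels among its neighbours, so a value near 1 indicates a neighbourhood dominated by one batch and a value near the number of batches indicates good mixing.

```python
import math
from collections import Counter


def inverse_simpson(labels):
    """Inverse Simpson's index: 1 / sum of squared label proportions."""
    counts = Counter(labels)
    n = len(labels)
    return 1.0 / sum((c / n) ** 2 for c in counts.values())


def simple_lisi(embedding, batches, k=3):
    """Per-cell mixing score from the k nearest neighbours (unweighted).

    embedding: list of coordinate tuples (e.g. UMAP or PCA coordinates)
    batches:   list of batch labels, one per cell
    """
    scores = []
    for i, point in enumerate(embedding):
        # Brute-force neighbour search; fine for a toy example only.
        dists = sorted(
            (math.dist(point, other), j)
            for j, other in enumerate(embedding)
            if j != i
        )
        neighbours = [batches[j] for _, j in dists[:k]]
        scores.append(inverse_simpson(neighbours))
    return scores


if __name__ == "__main__":
    coords = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels = ["a", "a", "a", "b", "b", "b"]
    # Two separated batches: every neighbourhood contains a single batch.
    print(simple_lisi(coords, labels, k=2))
```

With two batches, scores range from 1 (no mixing) to 2 (perfect mixing), which is the scale on which the values reported above should be read.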
Fig. 2.
Evaluating the efficacy of HT-scCAT-seq in simultaneously capturing ATAC and RNA profiles in the mouse brain. a UMAP visualization of RNA (left) and ATAC (right) profiles from 23,307 mouse brain cells, comprising both our data (n = 14,828) and 10x Multiome (n = 8479). The cells are colored according to cell clusters (left) and approaches (top-right). Ex subtypes were identified by the neocortex areas (L2/3, L5, L5/L6, and L6). ET, extra-telencephalic; CT, corticothalamic; IT, intra-telencephalic. b Bubble plot showing expression levels of neuronal/non-neuronal marker genes. c Aggregated scATAC-seq tracks displaying signals in cell type-specific peaks. d Box plots displaying the Spearman correlation analysis between gene expression level and candidate CRE accessibility. DORCs: the Spearman correlation between the DORC matrix and gene expression values in the RNA dataset. Gene activity: the Spearman correlation between gene activity scores from the ATAC dataset and gene expression values from the RNA dataset. SCARlink: the Spearman correlation between single-cell gene expression predicted by SCARlink, based on chromatin accessibility, and the actual gene expression values in the RNA dataset. Differences in Spearman correlation between 10x Multiome and HT-scCAT-seq were assessed by two-tailed Mann–Whitney U tests. e UMAP colored by normalized gene expression, DORC score, SCARlink score, and gene activity score of the Ex_L6 CT marker Tle4 (top: 10x Multiome, bottom: HT-scCAT-seq)
We assessed the performance of HT-scCAT-seq and 10x Multiome in capturing RNA and ATAC molecules by measuring the number of detected UMIs, genes, and unique fragments, as well as the FRiP and TSS enrichment score per cell in the datasets. For 10x Multiome, the average number of genes per cell was 2674, with an average of 4279 unique fragments, a TSS score of 4.88, and 36.9% FRiP. For HT-scCAT-seq, the average number of genes per cell was 2441, with an average of 5471 unique fragments, a TSS score of 6.65, and 60.7% FRiP (Additional file 1: Fig. S2c–e). We subsequently examined the distributions of reads across various categories: exonic, intronic, intergenic, overlapping different genes (ambiguous), multi-mapped, and unmapped (Additional file 2: Table S4).
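As an illustration of how a FRiP value of this kind is obtained, the sketch below computes the fraction of fragments overlapping at least one peak. It is a minimal example under stated assumptions: coordinates are treated as half-open intervals, any overlap of one base or more counts, and the function names are hypothetical. The actual pipeline may define overlap differently.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def frip(fragments, peaks):
    """Fraction of fragments in peaks.

    fragments, peaks: lists of (chrom, start, end) tuples.
    """
    if not fragments:
        return 0.0
    # Group peaks by chromosome so each fragment is only compared
    # against peaks on its own chromosome.
    peaks_by_chrom = {}
    for chrom, start, end in peaks:
        peaks_by_chrom.setdefault(chrom, []).append((start, end))
    in_peaks = 0
    for chrom, start, end in fragments:
        candidates = peaks_by_chrom.get(chrom, [])
        if any(overlaps(start, end, ps, pe) for ps, pe in candidates):
            in_peaks += 1
    return in_peaks / len(fragments)


if __name__ == "__main__":
    frags = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
    peak_set = [("chr1", 150, 300)]
    print(frip(frags, peak_set))  # one of three fragments lies in a peak
```

The linear scan over peaks is adequate for a toy example; genome-scale data would normally use an interval tree or sorted-interval search.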
We further compared this dataset with a published mouse brain dataset generated by 10x Multiome [29] by integrating these two datasets using Harmony-based batch correction (Additional file 2: Table S5b). We projected cells based on RNA or ATAC profiles separately on two-dimensional UMAPs [30] and performed unsupervised clustering using RNA partition data. Then, we transferred cell type labels, which were defined by expression of cell type-specific marker genes [31], from RNA-based clusters to the corresponding ATAC profiles, and found that ATAC partition profiles generated by our approach or 10x Multiome recapitulated those clusters with minor differences (Fig. 2a and Additional file 1: Fig. S4a). We identified 9 excitatory neuron subtypes (Ex, Nefh, Slc17a7), 6 inhibitory neuron subtypes (In, Nefh, Gad1), and 5 non-neuronal subtypes including astrocytes (Slc1a2, Slc1a3), oligodendrocytes (Plp1), oligodendrocyte precursor cells (OPC, Pdgfra), microglia (Hexb), and endothelial cells (Pecam1) (Fig. 2b, Additional file 1: Fig. S4b and c). For the most highly variable genes, aggregate ATAC signals within each subtype display specific open chromatin peaks around the marker gene loci (Fig. 2c and Additional file 1: Fig. S4c). The proportions of most cell types are similar between 10x Multiome and HT-scCAT-seq datasets, except for one 10x Multiome library that was processed with sorting. Additionally, some differences in cell type proportions may have arisen due to variations introduced during sample collection (Additional file 1: Fig. S3d and Additional file 2: Table S5c). Spearman correlation between each subtype demonstrated a high level of congruence between the ATAC and RNA modalities (Additional file 1: Fig. S4e). Cell type annotations were supported by gene expression and TF motif scores (Additional file 1: Fig. S4f and g). We performed comprehensive overlap analyses of genes and peaks across both platforms. 
The majority of cell types exhibited substantial overlap in both genes and peaks (Additional file 2: Tables S5d, S5e, S6, and S7). However, HT-scCAT-seq identified a larger number of unique genes and peaks, likely attributed to its higher cell count (HT-scCAT-seq: n = 14,828; 10x Multiome: n = 8479).
We applied several computational algorithms to this mouse brain dataset to further compare the performance of the HT-scCAT-seq and 10x Multiome platforms. Three algorithms were employed to link gene expression to active cis-regulatory elements (see Methods): (I) a domains of regulatory chromatin (DORC) score matrix was obtained with FigR [32]; (II) gene activity was calculated with default parameters using the GeneActivity function in Signac [33], for genes with more than four associated peaks; (III) single-cell ATAC + RNA linking (SCARlink) [34] was employed to predict single-cell gene expression using regularized Poisson regression based on single-cell chromatin accessibility data (see Methods). Comparable prediction scores were obtained for the HT-scCAT-seq and 10x Multiome datasets with any of these three algorithms (Fig. 2d). In our HT-scCAT-seq data, predictions generated using the gene activity score exhibited a slightly higher correlation with gene expression than those calculated from 10x Multiome data (median corr = 0.0648 on 10x Multiome, median corr = 0.0741 on HT-scCAT-seq). Furthermore, the correlation between DORC and SCARlink model predictions and gene expression from our data showed no significant difference compared to the 10x Multiome data (Fig. 2d), indicating good agreement between these two methods. In conclusion, HT-scCAT-seq is comparable to 10x Multiome in its ability to predict gene expression from multiomic single-cell ATAC and RNA data. Next, we examined the genes predicted by the FigR and SCARlink models and identified 114 genes shared by both models. We observed that the Ex_L6 CT marker Tle4 was identified more specifically in our data (Fig. 2e). As glutamatergic pyramidal neurons, Ex_L6 CT can elicit action potentials in layer 5a neurons while suppressing L4 neurons upon activation [35, 36].
Single-cell multiomics reveals regulatory dynamics during mouse embryo skin development
To comprehensively understand the regulatory mechanisms driving the specification of multiple cell types during skin formation, we applied HT-scCAT-seq to embryonic mouse dorsal skin samples from E13.5 to E18.5 (Fig. 3a). To assess reproducibility between technical replicates, we calculated Pearson correlation coefficients based on gene activity (Additional file 1: Fig. S5a and d), gene expression (Additional file 1: Fig. S5b and e), and DORC gene expression (Additional file 1: Fig. S5c and f). In total, we obtained 95,349 nuclei. After stringent quality control, 68.2% of cells passed QC, and cells from all five stages were analyzed together (Additional file 2: Table S8). Joint analysis of both data modalities yielded a total of 64,408 cells with 28,250 expressed genes and 175,103 accessible peaks (Additional file 1: Fig. S6a and b).
Fig. 3.
Single-cell multiomics assays uncovered dynamic features and heterogeneity of developing mouse skin. a Scheme of sample preparation. b UMAP visualization of WNN (left), RNA (middle), and ATAC (right) partition profiles, colored by cluster. The dashed circle shows the same UMAP colored by developmental stage. c Bubble plot showing gene expression of selected marker genes for each RNA cluster. Color bar: relative expression levels across all clusters, bubble size: percentage of cells within each cluster that express the gene. d Bubble plot showing gene activity scores for the markers in c. e Heatmap showing gene expression (left) and chromatin accessibility (right) for 15,016 significantly linked CRE-gene pairs. Each row represents a linked pair of gene and CRE. The bar on the top represents the cell types involved in skin development. f Track view of aggregated ATAC signal around the Pdgfra, Krt1, and Krt17 loci. Peaks and peak-to-gene linkages are shown below the tracks. Right, violin plots show the integrated expression levels of Pdgfra, Krt1, and Krt17 for each cell type. Red vertical bars highlight selected peaks linked to Pdgfra, Krt1, and Krt17 expression. g Feature plots showing TF activity score (top), gene activity score (middle), and gene expression level (bottom) for Twist2. h Schematic of the conceptual workflow illustrating one state in an RNA cluster corresponding to three states in an ATAC cluster. i UMAP visualization of RNA (top) and ATAC (bottom) data from 591 periderm cells. Middle, nuclei are colored by the cluster (left) and developmental stage. Right, the distribution of periderm cells from E15.5 to E18.5. j Left, pseudotime trajectory of the A0, A1, A3, and A4 profiles. Right, UMAP visualization of A0, A1, A3, and A4 profiles. Nuclei were derived from E15.5 to E18.5. k Heatmap showing normalized ATAC signals of the top 500 cluster-specific peaks across four statuses, plotted along the pseudotime. 
Right, representative enriched GO terms within each status
We performed dimensionality reduction of the resulting RNA and ATAC profiles using Seurat [37] and Signac [33], respectively. Biological replicates from the same developmental stage showed strong overlap in the UMAP embedding (Additional file 1: Fig. S6c). We performed cell type annotation using RNA partition datasets and transferred cell type labels to the corresponding ATAC clusters. Fifteen major cell types were identified across 19 distinct clusters (Fig. 3b and Additional file 1: Fig. S6d). All expected embryonic skin cell types [7, 38] were verified by expression of canonical marker genes, including basal cell (Krt5 and Krt15), spinous cell (Krt1 and Krt10), periderm (Grhl3 and Myh14), fibroblast (Col1a1 and Twist2), dermal papilla (Hhip and Bmp7), vascular endothelium (VE, Cdh5 and Vwf), lymphatic endothelium (LE, Flt4 and Acta2), melanocytes (Sox10), neural crest (Pcbp3), Schwann cell (Itgb8), muscle cells (Rgs5), mast cell (Kit and Il1rl1), and macrophages (Cd163) (Fig. 3c and d). Cells from different stages were well integrated, and their relative contributions to each cell type varied in agreement with the temporal development of mouse skin [7, 15] (Additional file 1: Figs. S5g and S6e). We calculated gene activity scores [33] by summing the number of unique chromatin fragments intersecting gene body and promoter regions (Fig. 3d, see Methods). The above-mentioned RNA markers also showed similar patterns of chromatin accessibility in the corresponding ATAC clusters, indicating strong congruence between these two modalities (Fig. 3c and d).
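The gene activity calculation described above can be sketched in a few lines. This is an illustrative simplification, not the Signac implementation: it assumes the promoter is a fixed window upstream of the gene start (2 kb is used here as an assumed default), extends the gene body in a strand-aware manner, and counts per-cell fragments that overlap the extended region. All function names are hypothetical.

```python
from collections import Counter


def gene_region(start, end, strand, upstream=2000):
    """Gene body plus an upstream promoter window, strand-aware."""
    if strand == "+":
        return max(0, start - upstream), end
    # On the minus strand the promoter lies beyond the gene's end coordinate.
    return start, end + upstream


def gene_activity(fragments, gene, upstream=2000):
    """Count fragments per cell overlapping gene body + promoter.

    fragments: list of (cell, chrom, start, end) tuples
    gene:      (chrom, start, end, strand)
    """
    chrom, start, end, strand = gene
    region_start, region_end = gene_region(start, end, strand, upstream)
    counts = Counter()
    for cell, f_chrom, f_start, f_end in fragments:
        same_chrom = f_chrom == chrom
        if same_chrom and f_start < region_end and region_start < f_end:
            counts[cell] += 1
    return dict(counts)


if __name__ == "__main__":
    frags = [
        ("c1", "chr1", 450, 520),    # overlaps the promoter window
        ("c1", "chr1", 2100, 2200),  # downstream of the gene, not counted
        ("c2", "chr1", 1500, 1600),  # inside the gene body
        ("c2", "chr2", 1500, 1600),  # wrong chromosome
    ]
    print(gene_activity(frags, ("chr1", 1000, 2000, "+"), upstream=500))
```

Raw counts of this kind would normally be normalized for sequencing depth per cell before being compared with expression values.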
Next, we aimed to use multiomics data to identify cell type-specific CREs and their target genes by comparing gene expression with chromatin accessibility across all cells in the dataset. We first identified genes specifically expressed in these cell types. Differentially expressed gene (DEG) analysis between subtypes revealed 5299 DEGs (adjusted p value < 0.05 and log2(fold change) > 0.1). Feature linkages are characterized by a significant correlation between the accessibility of ATAC peaks and gene expression [39]. To identify regulatory elements correlated with each DEG, we performed a peak-gene linkage analysis by calculating the Pearson correlation coefficient (PCC) between gene expression and chromatin accessibility of peaks within 500 kb of each TSS [16, 40]. Positively correlated peak-gene pairs were identified as potential enhancer-gene interactions [41]. This analysis yielded a total of 15,016 peak-gene links, including 12,687 regulatory elements significantly linked to 3166 DEGs (Fig. 3e and Additional file 2: Table S9, correlation > 0, adjusted p value < 0.05). Each DEG was linked to a median of three peaks (min = 1, max = 39, mean = 4.743).
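The core of the peak-gene linkage step can be sketched as follows. This is a minimal illustration under stated assumptions: expression and accessibility are given as matched vectors over the same cells (or cell aggregates), all peaks lie on the same chromosome as the gene, and distance is measured from the peak center to the TSS. Significance testing and multiple-testing correction, which the analysis above applies, are deliberately omitted, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; None if either vector is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def link_peaks_to_gene(tss, expression, peaks, window=500_000):
    """Return positively correlated peaks within `window` bp of the TSS.

    peaks: dict mapping peak name -> (center position, accessibility vector)
    """
    links = {}
    for name, (center, accessibility) in peaks.items():
        if abs(center - tss) > window:
            continue  # outside the distance window around the TSS
        r = pearson(expression, accessibility)
        if r is not None and r > 0:
            links[name] = r
    return links


if __name__ == "__main__":
    expr = [1.0, 2.0, 3.0, 4.0]
    candidate_peaks = {
        "near_positive": (100_000, [2.0, 4.0, 6.0, 8.0]),
        "near_negative": (250_000, [4.0, 3.0, 2.0, 1.0]),
        "too_far": (900_000, [2.0, 4.0, 6.0, 8.0]),
    }
    print(link_peaks_to_gene(200_000, expr, candidate_peaks))
```

In this toy input only the nearby, positively correlated peak is retained; the distant peak and the anticorrelated peak are discarded, mirroring the distance and sign filters described in the text.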
Then, we investigated whether candidate CREs could mediate the expression of differentially expressed genes. Our analysis revealed that some linkages with differentially expressed genes were uniquely identified in embryonic skin lineage cell types. For instance, the locus at chr5: 74,804,745–74,805,603 which mapped to the promoter region of Pdgfra showed the most significant accessibility in dermal-derived cells (Fig. 3f). Compared to other cell types, dermal-derived cells (fibroblasts and DP) exhibited highest expression level of Pdgfra. Besides, we observed that a peak (E2, chr5:75,174,536–75,175,336) which linked with Pdgfra expression showed strong accessibility signal only in DP (Fig. 3f). Together, we identify the locus at chr5: 74,804,745–74,805,603 as a candidate CRE for dermal-derived cells (fibroblasts and DP), capable of upregulating the expression of Pdgfra (Additional file 1: Fig. S6f). Additionally, peak-gene links at Krt17 (chr11:100,261,838–100,262,902) and Krt1 (chr15:101,855,353–101,856,097) showed significant enrichment within periderm and spinous cells (Fig. 3f, Additional file 1: Fig. S6g and h). Krt17 is strongly expressed in periderm, whereas Krt1 is expressed in spinous cells. Krt1 has been identified as a target of Notch signaling pathway in the epidermis and serves as a driver gene for the differentiation of basal cells into spinous cells [42–44].
To investigate TFs that potentially drive the regulatory programs in each embryonic skin cell type, we analyzed the enriched TF binding motifs that were present within these linked peaks using chromVAR (see Methods) [45]. The criteria for defining these peak gene-TF pairs included a significant correlation between peaks and genes, high TF expression level, and enriched TF binding motif in accessible ATAC peaks [46]. This analysis resulted in the identification of 75 putative markers across 15 major cell types (adjusted RNA p value < 0.05 and adjusted motif p value < 0.05, Additional file 2: Table S10). For instance, motif enrichment analysis indicated a strong enrichment of TWIST2 binding motif in dermal-derived cells. We further demonstrated concordance by examining the RNA expression, gene activity score, and TF activity of Twist2 (Fig. 3g). Twist2 serves as a marker of embryonic upper dermal fibroblasts [47]. Twist2 knockout in postnatal mice results in pronounced skin abnormalities including skin atrophy and fat deficiency, ultimately leading to death from cachexia within the first 2 weeks of life [48, 49].
To systematically investigate the gene regulatory program of lineage commitment during periderm development, we focused on the correlation between chromatin accessibility and gene expression in periderm cells. In our datasets, we examined periderm cells from five time points and classified them into early (E13.5 to E14.5) and late (E15.5 to E18.5) states. The early periderm subcluster expressed Tgfb2, Cldn23, and Myh14, while the late periderm subcluster expressed markers of terminal differentiation such as Bcl11b (Additional file 1: Fig. S6i). From E13.5 to E18.5, cells from the periderm cluster were consistently present in the RNA UMAP (Fig. 3b). We conducted a focused analysis by extracting 245 periderm cells from late states (E15.5 to E18.5) and re-clustering them. Two subclusters were identified based on RNA profiles (RNA clusters R0, R1), while five subtypes were identified based on ATAC profiles (ATAC clusters A0, A1, A2, A3, A4) (Fig. 3h and i). This heterogeneity was therefore visible only when clustering at the chromatin level, which revealed five main chromatin states. To further investigate the epigenetic differences within the ATAC clusters, Monocle3 analysis [50] was applied to construct the developmental trajectory of cells from A0, A1, A3, and A4 (Fig. 3j). The developmental trajectories were highly consistent with developmental time (cells ordered from E15.5 to E18.5). Along the pseudotime trajectory, we identified four distinct chromatin accessibility statuses: S1, S2, S3, and S4 (Fig. 3h), while RNA profiles exhibited similar gene expression patterns (Additional file 1: Fig. S6j). The top 500 cluster-specific peaks, sorted by fold enrichment, were taken for further analysis. Biological functions of the peak-associated genes of each status were annotated using the Genomic Regions Enrichment of Annotations Tool (GREAT) [51]. We noted that S1 was enriched with functions such as mitotic G1/S transition checkpoint. 
S2 was enriched with functions related to the regulation of apoptotic signaling pathway [11]. S3 exhibited associations with immune-related terms and reactions to environmental stimulation (Additional file 2: Table S11) [52, 53]. S4 was enriched with functions such as hair cycle process and hair follicle development (Fig. 3j and Additional file 2: Table S11). These results demonstrated that cells with similar gene expression profiles exhibit distinct chromatin accessibility status.
Fibroblast heterogeneity and construction of GRNs governing fibroblast development
Dorsal dermis is a connective tissue derived from somatic mesoderm [54]. Dermis is embedded in an extracellular matrix (ECM) composed of collagen and elastic fibers. Within the developing dermis, fibroblasts are the major cell type and are considered an important cell lineage for skin development [55, 56]. While prior single-cell studies have demonstrated the dynamics of gene expression in fibroblasts, they provided limited insights into changes in chromatin state. We sought to use a single-cell multiomics approach to identify fibroblast-specific CREs and their target genes.
We first performed gene ontology (GO) analysis on DEGs of fibroblast-derived cells from E13.5 to E18.5, categorizing the fibroblasts into early (E13.5 to E14.5) and late (E15.5 to E18.5) stages (Additional file 2: Tables S12 and S13). GO analysis reveals that early-stage fibroblast-derived cells were involved in regulation of the WNT signaling pathway, embryonic organ development, and connective tissue development [57]. These pluripotent functionalities may reflect early commitments towards future fibroblast fates (Fig. 4a, Additional file 1: Fig. S7a, Additional file 2: Tables S12 and S13). In the late-stage fibroblast-derived cells, the enriched GO terms included cellular response to lipid, collagen fibril organization, and laminin interactions. These more specialized functionalities may indicate events related to fibroblast specialization [58] (Fig. 4a and Additional file 1: Fig. S7a).
Fig. 4.
Reconstruction of fibroblast trajectories and identification of specific TF regulatory network. a Heatmap showing expression of DEGs of each developmental stage (adjusted p value < 0.05 and an average log2(fold change) > 0.1), alongside a list of some well-studied markers. Histogram on the right shows enriched GO terms (molecular function) for each developmental stage. p values obtained from the hypergeometric test are shown; color scale indicates gene ratio. b UMAP visualizations of fibroblast subclusters using RNA profiles. Cells are colored by cell types (top) or developmental stage (bottom). c RNA velocity of fibroblast subtypes, projected onto the UMAP using scVelo. Colors indicate different cell types. d UMAP displaying cell fate bifurcation of fibroblast subtypes using CellRank. Four trajectories are shown: Fib.DC, Fib.Deep, Fib.Inter, and Fib.Lower. Cells are colored according to fate probabilities by CellRank. e Heatmap showing gene expression (right) and chromatin accessibility (left) of 495 significantly linked gene-CRE pairs along the Fib.DC trajectory. Each row represents a pair of gene and a linked CRE. Columns are sorted by diffusion pseudotime. f Scatter plots showing the gene expression values of driver genes (y-axis) plotted against the pseudotime of the Fib.DC trajectory (x-axis). The fitted smooth line indicates a 95% confidence interval. g Enriched GO terms for driver gene sets upregulated at stage 1 and stage 2 of the Fib.DC trajectory. Dot size denotes enrichment ratio and color reflects significance. h Heatmap showing TF-DORC regulation scores derived from Fib.DC trajectory data using FigR (n = 125 TFs, n = 395 DORCs), with DORCs represented by rows and candidate TF regulators by columns. i Scatter plot showing knockout (KO) simulation results of TFs in the Fib.Origin (x-axis) and Fib.DC lineage (y-axis). 
j UMAP colored by normalized DORC score, TF activity score, and gene expression level of underlying TFs in Fib.DC lineage (left: RUNX2, right: ALX4). The line plot (bottom) shows smoothed gene expression trends in Fib.DC trajectory. k Heatmap showing RUNX2 CUT&Tag signals at the 2.5 kb flanking regions upstream and downstream of peak centers. The rows were sorted by the descending signals in 2839 bulk peaks. The upper panel illustrates average binding intensity. l Integrative Genomics Viewer (IGV) displaying RUNX2 signal at predicted target Nrp2 locus at both E13.5 and E18.5 stages, as seen in CUT&Tag and ATAC datasets
To elucidate the heterogeneity within fibroblast cells, we extracted fibroblast profiles and re-clustered them into eight subtypes, defined as previously reported [7] (Fig. 4b). We observed a continuous progression from E13.5 to E18.5 when fibroblasts were projected into low-dimensional subspaces (Fig. 4b). Cells from E13.5 were located near the center of the low-dimensional manifold, whereas cells from later stages were positioned towards the periphery, likely indicating more differentiated cell states (Fig. 4b). We observed distinct gene dynamics across the eight fibroblast subclusters within our dataset (Fig. 4b–d and Additional file 1: Fig. S7b). As the chondrocytes originating from the somatic mesoderm are not considered fibroblasts, we excluded them from downstream analysis [59]. Fib.Origin first appears at E13.5 and progressively diminishes by E15.5, whereas the other subclusters appear at E14.5 and subsequently expand. This suggests that Fib.Origin may have the potential to differentiate into various fibroblast subtypes.
To estimate the cellular differentiation dynamics among fibroblast subclusters, we employed RNA velocity, which predicts the future states of individual cells by analyzing the ratios of spliced and unspliced mRNAs [59] (Fig. 4c). The RNA velocities indicated a directional flow from Fib.Origin at the center of the embedding towards the later time points located at the periphery of the embedding (Fig. 4c). To further refine our velocity predictions, we applied CellRank, which estimates the probabilities of initial and terminal states for each cell based on RNA velocity [60, 61]. Consistent with the RNA velocity predictions, CellRank identified higher initial-state probabilities in Fib.Origin and higher terminal-state probabilities in the other subclusters (Fig. 4d).
Next, we investigated gene expression and the corresponding regulatory states along each trajectory. We calculated the Pearson correlation between gene expression levels and fate probabilities to identify genes whose expression was biased towards the Fib.DC fate. Genes with significant positive correlations were defined as candidate driver genes [60]. We identified 527 driver genes in the Fib.DC trajectory, including early markers such as Fst and Sema6a, as well as late markers such as Dll1 and Bmp3 (correlation > 0.05 and adjusted p value < 0.05) (Fig. 4e–g). To further investigate when and how these driver genes were regulated during fibroblast differentiation, we extracted 10,547 cells along the Fib.DC trajectory (fate probability > 75% quantile) and sorted them by pseudotime. We then performed peak-gene linkage analysis as described above and identified 495 peak-gene pairs (Fig. 4e). Using k-means clustering, the 527 driver genes were classified into two distinct groups (stage 1 and stage 2, Fig. 4e). Genes upregulated at stage 1 of the trajectory were enriched in pathways such as skin development, epithelial cell proliferation, and WNT signaling, which may reflect maintenance of Fib.Origin homeostasis. In contrast, genes upregulated at stage 2 of the Fib.DC trajectory were associated with molting cycle, regulation of animal organ morphogenesis, and appendage development (Fig. 4g). We then applied an analogous approach to the Fib.Inter trajectory and identified 1196 differential genes, such as Mfap5 and Klf4 (Additional file 1: Fig. S7c). Pathway enrichment analysis of these genes revealed involvement of mesenchymal cell proliferation in stage 1, and of cell aggregation and fibroblast proliferation in stages 2 and 3 of the trajectory (Additional file 1: Fig. S7d and e).
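The cell-selection step described above (fate probability above the 75% quantile, then ordering by pseudotime) can be illustrated with a minimal Python sketch. The function name, the toy values, and the choice of the "inclusive" quantile definition are assumptions made for illustration; this is not the analysis code used in the study.

```python
from statistics import quantiles

def select_lineage_cells(fate_prob, pseudotime, q_percent=75):
    """Return indices of cells whose fate probability exceeds the given
    percentile, ordered by increasing pseudotime (hypothetical helper)."""
    # quantiles(..., n=100) returns the 99 percentile cut points.
    cuts = quantiles(fate_prob, n=100, method="inclusive")
    threshold = cuts[q_percent - 1]
    keep = [i for i, p in enumerate(fate_prob) if p > threshold]
    return sorted(keep, key=lambda i: pseudotime[i])

# Toy example with eight cells (values are made up).
fate_prob = [0.1, 0.9, 0.5, 0.95, 0.2, 0.8, 0.3, 0.7]
pseudotime = [0.0, 0.9, 0.4, 0.6, 0.1, 0.7, 0.2, 0.5]
ordered_cells = select_lineage_cells(fate_prob, pseudotime)
```

In this toy input only the two cells with the highest fate probabilities pass the threshold, and they are returned in pseudotime order.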
To examine cis-regulatory landscape of fibroblasts, we defined the underlying GRNs using FigR [32]. We calculated TF motif enrichment, considering both expression level and chromatin accessibility for all DORCs, to generate the regulation score representing intersection of motif-enriched and RNA-correlated TFs. We distinguished five unique modules of DORCs that are regulated by TFs (Additional file 1: Fig. S7f). Key TFs for Fib.Origin are enriched in module 2 subclusters, such as WNT/β-catenin associate factor LEF1 [62, 63] (Additional file 1: Fig. S7g).
To further identify potential regulatory gene targets of TFs driving Fib.DC cell identity, we extracted cells from the Fib.DC trajectory and examined gene regulatory networks using FigR. We calculated the Spearman correlation between gene expression and accessibility of peaks within a 100-kb window around the TSS (p value < 0.05). We then identified 395 DORC regions (each with at least four significant peak-gene associations) and 125 TF motifs (regulation score ≥ 1) and determined two distinct gene modules regulated by different TFs (Fig. 4h and Additional file 1: Fig. S7h). We proposed several potential regulatory factors that could serve as lineage determinants for Fib.DC and DP, including RUNX2, SOX2, ALX4, and PRRX2 (Fig. 4h and Additional file 2: Table S14). The absence of RUNX2 delays hair follicle maturation, leading to significantly reduced thickness of both the overall skin and epidermis in Runx2-deficient embryos [64]. ALX4 plays a crucial role in hair follicle development, as Alx4-null mice exhibit dorsal alopecia [65]. SOX2 is a key TF that regulates hair growth by modulating WNT signaling [66]. Next, we applied CellOracle to simulate changes in Fib.DC identity upon TF perturbation [67]. This in silico strategy employs the GRN to simulate the state of each cell following perturbation of candidate TFs. We then calculated perturbation scores for the TFs detected by FigR (Fig. 4i and Additional file 2: Table S15). A high perturbation score indicates that in silico knockout of the TF markedly shifted the simulated cell identity, suggesting that the TF is an essential regulator of the Fib.DC trajectory. Interestingly, while many TFs showed correlated perturbation scores according to CellOracle, RUNX2 and ALX4 exhibited relatively high specificity for the Fib.DC trajectory (Fig. 4j).
To investigate the functional role of predicted transcription factors (TFs) within the Fib.DC lineage, we performed Cleavage Under Targets and Tagmentation (CUT&Tag) experiments (see Methods). RUNX2 CUT&Tag data were generated from fibroblasts derived from embryonic skin at E13.5 and E18.5. To evaluate the signal surrounding RUNX2 binding sites, we first extracted the RUNX2 motif dataset from the JASPAR database and identified regions that overlapped with ATAC peaks in the E13.5 and E18.5 HT-scCAT-seq skin data. We then overlapped these regions with RUNX2 binding sites identified in the E13.5 and E18.5 CUT&Tag datasets. Through this overlap analysis, we found 2839 peaks that appeared in both the ATAC peaks and the RUNX2 CUT&Tag binding sites (Fig. 4k). The overall background noise of the RUNX2 CUT&Tag data was relatively low, and strong enrichment signals were observed near the RUNX2 motif binding sites. Compared to E13.5, the E18.5 RUNX2 CUT&Tag data displayed higher binding signals (Fig. 4k), suggesting that RUNX2 exerts a more prominent transcriptional regulatory role in fibroblast differentiation during later stages of development.
Next, in the TF-DORC regulatory network of the Fib.DC lineage, we calculated regulation scores between transcription factors and DORCs, identifying 31 target genes associated with RUNX2 (regulation score > 0.8, Additional file 1: Fig. S7i and Additional file 2: Table S16). Among these, we observed strong RUNX2 CUT&Tag binding peaks within the predicted RUNX2 target gene Nrp2 (Fig. 4l), which also emerged as a stage 2 driver gene in the Fib.DC trajectory. Furthermore, Nrp2, a cysteine receptor, has been reported to play a key role in promoting signal transduction within the skin microenvironment and to be essential for hair follicle regeneration [10]. Similarly, strong CUT&Tag signals were observed for the RUNX2 target genes Vgll4 and Pgs1 (Additional file 1: Fig. S7j and k). Together, these findings suggest that RUNX2 acts as a critical regulatory factor driving the development of the Fib.DC lineage.
Discussion
Here, we present a single-cell multimodal profiling approach named HT-scCAT-seq, which enables joint detection of transcriptome and chromatin accessibility for ten thousand single cells at once. We benchmarked HT-scCAT-seq data against other single-cell joint-profiling approaches such as 10x Genomics and ISSAAC-seq and found that HT-scCAT-seq produced data of equal or better quality compared with similar approaches. Finally, we applied HT-scCAT-seq to mouse embryonic skin samples to dissect the underlying transcriptional regulatory program for each cell lineage. By TF-peak-gene linkage analysis, we demonstrated how lineage-specific chromatin regulators facilitate downstream gene expression and reconstructed a lineage-specific chromatin regulatory network for the Fib.Origin subtype.
HT-scCAT-seq is a stable and easy-to-handle single-cell method for ordinary biological and medical laboratories. Although the experiments in this work were performed with the DNBelab C4 platform, HT-scCAT-seq can readily be adapted to other microfluidic-based devices such as 10x Chromium, Bio-Rad ddSEQ, or manual ones. The throughput of HT-scCAT-seq typically ranges from one thousand to ten thousand cells per reaction and can be scaled up to a hundred thousand by employing a combinatorial indexing strategy that introduces an extra cellular barcode during the transposition and reverse transcription steps. In addition, these extra barcodes can serve as sample barcodes, which may save time and cost and potentially avoid batch effects.
While our study provides a systematic comparison of HT-scCAT-seq with existing single-cell multiomics platforms, we acknowledge that biological variability—such as subtle differences in cell states, sample preparation protocols, or tissue dissociation conditions—could confound direct performance evaluations. Specifically, dataset heterogeneity in benchmarking (e.g., biological variability across cell batches, technical variability from platform-specific library preparation) may lead to overestimation or underestimation of method performance when comparing across studies. To mitigate this, we have rigorously applied batch correction methods (e.g., Harmony integration) and normalized feature distributions, ensuring technical comparability across datasets and reducing confounding effects. To definitively disentangle biological from methodological variability, future work will profile identical cell populations using HT-scCAT-seq, 10x Multiome, and ISSAAC-seq under standardized experimental conditions. Overall, after data normalization, our analyses—including QC metrics, peak/gene overlap, cell type proportions, and WNN integration—demonstrated robust consistency across platforms, supporting the validity of our comparisons.
Recent studies have shed light on the idea that chromatin structure is a key determinant of gene expression [68–71]. Factors such as the binding of transcription factors, chromatin accessibility, nucleosome occupancy, DNA methylation, and histone modifications all play roles in building the chromatin landscape. Linking two populations defined by different omics data is imprecise, whereas multiomics data provide a direct linkage between two or more layers of the regulome [72]. Yet current experimental methods, including HT-scCAT-seq, are still limited to profiling two or three layers at once, which remains a major challenge for researchers. However, in silico assembly of multiple datasets containing different layers may help us reconstruct the panorama of the epigenetic state, which may revolutionize our understanding of cell types and cell states.
Conclusions
In this study, we introduce an enhanced single-cell multimodal profiling technique, HT-scCAT-seq, which simultaneously detects chromatin accessibility and gene expression in individual nuclei in a high-throughput manner. We highlight the efficacy and cost-efficiency of HT-scCAT-seq as a high-throughput method for single-cell multiomics. We provide extensive evidence of cell type-specific candidate cis-regulatory elements (cCREs) and transcription factors that are essential for decoding the transcriptional regulatory program of mouse embryonic skin.
Methods
Cell culture
HEK293T (ATCC) and NIH3T3 (ATCC) cells were cultured in DMEM, high glucose (Thermo, 11965126) supplemented with 10% FBS (HyClone™, SH30071.03) and 1% Penicillin-Streptomycin (Thermo, 10378016) at 37°C with 5% CO2. The cell lines were tested for mycoplasma contamination (all results were negative), but were not authenticated.
Animal study
Wild-type C57BL/6 mice (Guangdong Medical Laboratory Animal Center) were interbred and pregnant females were sacrificed at E13.5, E14.5, E15.5, E16.5, and E18.5. Embryonic dorsolateral skin was micro-dissected, pooled (5 embryos per time point) and pre-chilled in cold PBS, followed by cell dissociation and nuclei extraction steps. A total of 25 mouse embryos were processed and analyzed in the following experiments. Mouse brain tissue was collected from wild-type C57BL/6 mice aged 8 weeks.
Cell dissociation from embryonic dorsolateral skin
To generate a single-cell suspension, embryonic dorsolateral skin samples were first digested with Enzyme I (1% Penicillin-Streptomycin (Thermo, 10378016) and 0.125% trypsin (Thermo, 25200056)) at 37°C for 10 min on a thermal shaker. The mixture of tissue and cells was filtered using a 70-µm cell strainer (Falcon, 352350). The obtained mixture was incubated with Enzyme II (2.5 mg/mL Collagenase Type IV (Thermo, 17104019), 1 mg/mL Collagenase Type I (Thermo, 17100017), 1 mg/mL DispaseII (Thermo, 17105041), 1% Penicillin-Streptomycin (Thermo, 10378016)) in DMEM/F-12 (Thermo, 11320033) for 15 min at 37°C on a thermal shaker. Dissociated cells were filtered through a 40-µm cell strainer (Falcon, 352340). The cell suspension was centrifuged for 5 min at 500 g and cells were washed once or twice with 0.04% BSA/PBS.
Nuclei preparation and fixation
For the species mixing and embryonic dorsolateral skin experiments, single-nucleus preparations were derived from the Omni-ATAC protocols as previously described [73], with some adjustments. In brief, 5 × 10^5 cells were collected and resuspended in 100 µL of chilled cell lysis buffer (10 mM Tris–HCl pH 7.5, 10 mM NaCl, 3 mM MgCl2, 0.1% Tween-20 (Sigma, P9416), 0.1% NP40 (Roche, 11332473001), 0.01% digitonin (Sigma, D141), 1% BSA/PBS, and 0.8 U/μL RNase inhibitor (Neoprimaries, LS-EZ-E-00006P)) and incubated on ice for 5 min. Subsequently, 1 mL of chilled resuspension buffer (10 mM Tris–HCl pH 7.5, 10 mM NaCl, 3 mM MgCl2, 0.1% Tween-20, 1% BSA/PBS, and 0.8 U/μL RNase inhibitor) was added into the lysed cell suspension, and nuclei were spun down at 500 g, 4°C for 5 min. Nuclei were resuspended in 50 µL of PBSI (1 U/μL RNase inhibitor and 0.5 U/μL SUPERase inhibitor (Thermo, AM2696)) and counted using DAPI staining.
For the adult mouse brain experiments, single-nucleus preparations were derived from the Omni-ATAC protocols as previously described [73]. Mouse brain tissue was placed into a pre-chilled 2-mL Dounce homogenizer with 2 mL of homogenization buffer (HB; 1 × basic buffer (20 mM Tris pH 7.8, 25 mM KCl, and 5 mM MgCl2), 250 mM sucrose, 1 mM DTT, 1 × Protease Inhibitor Cocktail, 0.8 U/μL RNase inhibitor, and 1% BSA). Tissue was homogenized with 15 strokes using pestle A and the homogenate was filtered through a 70-μm cell strainer. The homogenate was transferred to a new Dounce homogenizer and homogenized with 10 strokes using pestle B. The homogenate was filtered through a 40-μm cell strainer and centrifuged at 500 g, 4°C for 5 min. Then, 2.4 mL of nuclei wash buffer (20% iodixanol, 1 × basic buffer in HB) was added into the nuclei suspension, and nuclei were spun down at 800 g, 4°C for 10 min. Nuclei were resuspended in 50 µL of PBSI and counted by DAPI staining.
Nuclei were fixed in 0.1% formaldehyde (Sigma, 1209228) in PBSI for 5 min at room temperature and quenched with 0.125 M glycine (Sigma, 635782). Fixed nuclei were spun down at 500 g, 4°C for 5 min, followed by washing twice in PBSI and resuspended in 50 µL of PBSI.
HT-scCAT-seq library preparation and sequencing
For in situ transposition, 200,000 fixed nuclei were resuspended in tagmentation mix (0.08 U/μL Tn5 transposase and 1 × TAG buffer in PBSI) and incubated at 37°C for 30 min with shaking (500 rpm). The nuclei were then centrifuged at 500 g, 4°C for 5 min and resuspended in 8 µL PBSI. Transposed nuclei were mixed with reverse transcription buffer (0.625 mg/mL TransFlex III Reverse Transcriptase (Neoprimaries, LS-EZ-E-00027Q), 1 × RT buffer, 1 mM dNTP mix, 2.5% PEG 6000, 1 mM dCTP, 3.75 μM Oligo dT, 2.5 μM TSO, and 1 U/μL RNase inhibitor), and RT was performed (10°C for 30 s, 20°C for 30 s, 30°C for 30 s, 40°C for 30 s, 50°C for 5 min, 4°C for 3 min, 10°C for 45 s, 20°C for 45 s, 30°C for 30 s, 42°C for 2 min, 50°C for 10 min). After in situ reverse transcription, nuclei were centrifuged at 500 g, 4°C for 5 min, washed twice, and resuspended in 50 µL of 1% BSA in PBS. Libraries were generated using DNBelab C Series Single-Cell ATAC Library Prep Set (MGI, 940-000793-00) following the user protocol with the following modifications: in encapsulation and pre-amplification step, the RNA PCR primer (biotin-modified) was added; after emulsion breakage, MyOne C1 Dynabeads were added to an equal volume mixture of RNA and ATAC product with 1 × B&W buffer (50 mM Tris pH 7.5, 0.5 mM EDTA, 1 M NaCl); after emulsion breakage, RNA and ATAC product were purified and constructed separately. All libraries were sequenced using DIPSEQ T1 platform at China National GeneBank (CNGB).
Single-cell data preprocessing
In the initial preparation of single-cell ATAC-seq (scATAC-seq) data, we first aligned Read 1 and Read 2 FastQ files to either the mm10 or hg38 genome. We then used Chromap (v0.2.1) [22] to generate fragment files. To process the barcodes, we utilized d2c (v1.5.3) to calculate and merge beads from the same barcode. MACS2 (v2.28) [74] was employed to identify peaks and produce a peak matrix. For the scRNA-seq data, Read 1 FastQ (comprising Barcode 1, Barcode 2, and UMI) and Read 2 FastQ were aligned and annotated using scStar (v1.0.3) and Anno (v1.4). Beads were subsequently merged into barcodes based on the results from the d2c step in the ATAC-seq data preprocessing, and the gene expression matrix was derived using PISA (v1.10.2) [24].
For the species mixing experiment, we combined HEK293T and NIH3T3 cells and processed them through the HT-scCAT-seq workflow utilizing a combined reference genome of hg38 and mm10. Cell barcodes with more than 80% of reads mapped exclusively to a single genome were classified as singlets; others were considered as doublets.
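The singlet/doublet rule above can be written as a short Python sketch. The function name, the label strings, and the handling of barcodes with zero reads are assumptions for illustration; only the 80% threshold comes from the text.

```python
def classify_barcode(human_reads, mouse_reads, purity=0.8):
    """Label a cell barcode from its read counts on the two genomes.

    A barcode is a singlet when more than `purity` of its reads map to a
    single genome; otherwise it is called a doublet.
    """
    total = human_reads + mouse_reads
    if total == 0:
        return "empty"  # assumption: barcodes without reads are set aside
    if human_reads / total > purity:
        return "human"
    if mouse_reads / total > purity:
        return "mouse"
    return "doublet"

# Toy read counts (hg38, mm10) for three barcodes.
labels = [classify_barcode(h, m) for h, m in [(950, 50), (30, 970), (600, 400)]]
```

The fraction of barcodes labelled "doublet" in such a table gives the observed collision rate of the species-mixing experiment.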
Single-cell metric comparison to other methods
We downloaded FastQ files of cell line data from other multiomics technologies. To ensure uniformity, we downsampled all data sets to 50,000 read pairs per cell for each modality. We then applied the ISSAAC-seq preprocessing workflow to these data consistently. For scATAC-seq data, we employed Chromap (v0.2.1) [22] for read mapping. Meanwhile, scRNA-seq reads were analyzed using STARsolo (v2.7.10a) [75]. From these processed data, we derived count matrices which were subsequently utilized by ArchR (v1.0.2) [76] to calculate quality metrics. Gene and peak overlap analysis was performed based on the count matrices. Features (genes or peaks) that were detected in fewer than 0.1% of the total cells or in fewer than 10 cells were excluded from further analysis. The retained genes and peaks were then subjected to cross-platform overlap analysis to identify shared features across different platforms.
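The feature filter and cross-platform comparison described above can be sketched as follows. The data layout (a dictionary of detection counts per feature), the function names, and the toy numbers are assumptions; matching features by name is appropriate for genes, whereas peaks would need to be matched by genomic overlap.

```python
def retained_features(cells_detected, n_cells, min_frac=0.001, min_cells=10):
    """Keep features detected in at least 0.1% of cells and at least 10 cells.

    cells_detected: dict mapping feature name -> number of cells with a
    nonzero count for that feature.
    """
    cutoff = max(min_frac * n_cells, min_cells)
    return {f for f, n in cells_detected.items() if n >= cutoff}

def shared_features(per_platform):
    """Intersect the retained feature sets of all platforms."""
    sets = list(per_platform.values())
    return set.intersection(*sets) if sets else set()

# Toy example with two hypothetical platforms.
platform_a = retained_features({"Actb": 15000, "Gapdh": 25, "Rare1": 12}, n_cells=20000)
platform_b = retained_features({"Actb": 4000, "Gapdh": 9, "Rare1": 12}, n_cells=5000)
common = shared_features({"A": platform_a, "B": platform_b})
```

Because the 0.1% criterion scales with the number of cells, the same feature can pass on a small dataset and fail on a large one, as "Rare1" does in this toy input.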
Mouse brain data analysis
The gene expression matrix of mouse brain was processed using Seurat (v4.3.0.1) [37] to create a Seurat object. Cells were filtered based on the following metrics: nCount_RNA between 200 and 100,000, nFeature_RNA less than 7500, and percent.mt less than 20. Each data modality was then adjusted for batch effects using Harmony. Data normalization was performed using the NormalizeData function. The top 3000 variable genes were identified using the FindVariableFeatures function. The first 50 principal components (PCs) were determined by running RunPCA on these variable genes, and cell clusters were identified using FindNeighbors. The FindAllMarkers function was used to identify marker genes in each cluster. For mouse brain scATAC-seq data, the peak matrix was processed using the Signac package (v1.12.0). Cells were filtered based on the following metrics: nCount_ATAC between 2000 and 20,000, FRiP greater than 0.2, and TSS.enrichment between 2 and 20. The matrix was normalized, and clusters were identified with dimensions ranging from 2 to 40, using default settings according to the Signac documentation. Differences in chromatin status between clusters were computed by peak calling using MACS2.
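For readers who want the quality-control thresholds above in executable form, a minimal Python sketch follows. It does not use Seurat or Signac; the per-cell dictionaries and function names are hypothetical, and treating the "between" bounds as inclusive is an assumption.

```python
def passes_rna_qc(cell):
    """RNA thresholds from the text: counts, detected genes, mitochondrial %."""
    return (200 <= cell["nCount_RNA"] <= 100_000
            and cell["nFeature_RNA"] < 7500
            and cell["percent.mt"] < 20)

def passes_atac_qc(cell):
    """ATAC thresholds from the text: fragments, FRiP, TSS enrichment."""
    return (2000 <= cell["nCount_ATAC"] <= 20_000
            and cell["FRiP"] > 0.2
            and 2 <= cell["TSS.enrichment"] <= 20)

# Toy cells (values are made up).
good = {"nCount_RNA": 5000, "nFeature_RNA": 2500, "percent.mt": 3.0,
        "nCount_ATAC": 8000, "FRiP": 0.45, "TSS.enrichment": 7.5}
bad = {"nCount_RNA": 150, "nFeature_RNA": 90, "percent.mt": 35.0,
       "nCount_ATAC": 900, "FRiP": 0.1, "TSS.enrichment": 1.2}
kept = [c for c in (good, bad) if passes_rna_qc(c) and passes_atac_qc(c)]
```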
Prediction of RNA expression by DORC, SCARlink, and Signac
Paired scATAC-seq and scRNA-seq data from mouse brain cells were used in these assays. High-density DORCs were calculated within a distance of 50 kb from each gene’s TSS, and the Spearman correlation coefficient was calculated for each gene-peak pair [32]. ChromVAR (v1.16.0) [45] was used to generate the overall accessibility and GC content, which were then used to perform background peak correction. A one-tailed Z-test was used to determine the significance of each gene-peak association. Significant associations were those with a permutation p value less than or equal to 0.05, and DORCs were defined as genes with more than four such associated peaks. To obtain a single-cell DORC score matrix, the scATAC-seq data were first normalized by peak counts. Each gene’s DORC score was then calculated as the sum of counts over the peaks significantly correlated with that gene. SCARlink [34] was applied to the region spanning 250 kb upstream and downstream of the gene body. Regularized Poisson regression was performed to predict gene expression. The top 3000 variable genes were selected using Seurat [37] and used as input to SCARlink. The predicted gene expression matrix was generated using SCARlink with default parameters.
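The DORC definition and score can be illustrated with a small Python sketch. It assumes that gene-peak p values are already available, and it reads "normalized by peak counts" as dividing by each cell's total peak count, which is an interpretation rather than a documented detail; the function names and toy data are hypothetical, and this is not FigR's implementation.

```python
def call_dorcs(links, p_cutoff=0.05, min_peaks=5):
    """links: iterable of (gene, peak, p_value).

    Returns gene -> list of significant peaks for genes with more than
    four (i.e. at least `min_peaks`) significant peak associations.
    """
    by_gene = {}
    for gene, peak, p in links:
        if p <= p_cutoff:
            by_gene.setdefault(gene, []).append(peak)
    return {g: peaks for g, peaks in by_gene.items() if len(peaks) >= min_peaks}

def dorc_scores(cell_counts, dorcs):
    """Sum a cell's depth-normalized counts over each gene's DORC peaks."""
    total = sum(cell_counts.values())
    if total == 0:
        return {g: 0.0 for g in dorcs}
    return {g: sum(cell_counts.get(p, 0) for p in peaks) / total
            for g, peaks in dorcs.items()}

# Toy links: GeneA has five significant peaks, GeneB only four.
links = [("GeneA", f"a{i}", 0.01) for i in range(5)]
links += [("GeneB", f"b{i}", 0.01) for i in range(4)] + [("GeneB", "b4", 0.2)]
dorcs = call_dorcs(links)
scores = dorc_scores({"a0": 1, "a1": 2, "a2": 0, "a3": 1, "a4": 1, "b0": 5}, dorcs)
```

In this toy input only GeneA qualifies as a DORC, and its score is the share of the cell's counts that fall in its five linked peaks.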
Mouse embryonic skin data analysis
Single-cell RNA and ATAC data of mouse embryonic skin were processed using Seurat [37] and Signac [33]. Quality control was performed separately for scRNA-seq and scATAC-seq data. Cells with high mitochondrial content (percent.mt > 20) or low RNA counts (nCount_RNA < 200) were removed from the scRNA-seq dataset. For scATAC-seq, cells with TSS enrichment score less than 4 and fewer than 2000 captured fragments were excluded.
In both scRNA-seq and scATAC-seq, we used scDblFinder (v1.16.0) [77] and Harmony to remove doublets and batch effects, respectively. For scRNA data, we also removed ambient RNA using SoupX (v1.6.2) [78] with default parameters. After background filtration, normalization and scaling of RNA gene expression levels were performed using the NormalizeData and SCTransform functions. Then, PCA was performed, and the first 50 principal components were used to group cells into clusters using the FindNeighbors and FindClusters functions in Seurat. For scATAC-seq data, iterative LSI dimensionality reduction was performed using components 2:40, taking the top 50% variable peaks and a resolution of 0.5. Cells were then projected in a 2D space using the RunUMAP function. The scATAC-seq data were annotated using the corresponding annotation of paired scRNA-seq cells. The gene activity matrix was created with the GeneActivity function in Signac. The motif activity matrix was calculated using chromVAR based on the JASPAR 2020 database, and the FindMotifs function was used to identify differentially enriched motifs; TF motifs of interest were visualized and used for further analysis [79].
Peak-to-gene linkage identification
Peak-to-gene links were identified as described in “Prediction of RNA expression by DORC, SCARlink, and Signac.” Briefly, the associations between peaks and genes were calculated using the LinkPeaks function in Signac. The Pearson correlation coefficient was calculated for each peak within a distance of 50 kb from the gene’s TSS. The p value was then adjusted by the Benjamini-Hochberg method. Peak-gene links with coefficients < 0 or adjusted p value > 0.05 were removed [32]. This ensures that only statistically significant and positively correlated peak-gene pairs are considered for downstream analyses.
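The filtering logic of this step (Pearson correlation, Benjamini-Hochberg adjustment, and removal of negative or non-significant links) can be sketched in plain Python. This is an illustration only and does not reproduce Signac's LinkPeaks; the raw p values are taken as given, and all names and toy values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def benjamini_hochberg(pvals):
    """Return BH-adjusted p values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def filter_links(links, alpha=0.05):
    """links: list of (gene, peak, r, p). Drop r < 0 or adjusted p > alpha."""
    adj = benjamini_hochberg([p for _, _, _, p in links])
    return [(g, pk, r, q) for (g, pk, r, _), q in zip(links, adj)
            if not (r < 0 or q > alpha)]

# Toy links for one hypothetical gene.
links = [("GeneA", "peak1", 0.4, 0.01), ("GeneA", "peak2", 0.3, 0.04),
         ("GeneA", "peak3", -0.2, 0.03), ("GeneA", "peak4", 0.1, 0.5)]
kept = filter_links(links)
```

Note that "peak2" has a raw p value below 0.05 but is removed after multiple-testing adjustment, which is the purpose of the Benjamini-Hochberg step.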
Identification of fibroblast subtypes
We extracted all fibroblast cluster cells from the mouse embryonic skin data and used the R packages Seurat and Signac to conduct a general upstream analysis. Briefly, 2000 variable genes and a maximum dimension of 40 were selected. The SLM algorithm and a resolution of 0.5 were used in the FindClusters function to produce cell-type clusters, which we then annotated based on marker gene expression. To identify DEGs and differential peaks, Seurat FindAllMarkers was run with default settings. We used clusterProfiler (v3.11.0) to identify enriched gene ontology (GO) terms for differentially expressed genes [80]. Additionally, the top 500 stage-specific peaks were visualized by heatmap in periderm cells and were imported into the GREAT analysis website (http://great.stanford.edu/public/html/) for GO enrichment analysis [51].
RNA velocity and trajectory inference
Data from the two modalities processed by Seurat and Signac were loaded as an “AnnData” object with the Chod cell type removed. Spliced and unspliced read information was extracted from the possorted BAM file using velocyto (v0.17.15) and added to the object [81]. The data were then filtered and normalized using the scVelo.pp.filter_and_normalize function with parameters (min_shared_counts = 20, n_top_genes = 3000), and 40 principal components were used to find the cell neighbors using scVelo.pp.neighbors. Velocity analysis was performed using “scVelo.tl.velocity” with default parameters. The previous UMAP coordinates were used for velocity visualization.
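As background on what the velocity estimate represents, the simplest steady-state formulation of RNA velocity can be written in a few lines. This sketch is a deliberate simplification: it fits the degradation ratio by least squares over all cells of one gene, whereas scVelo's models are more elaborate (for example, the default stochastic mode also uses second-order moments). The function name and toy values are assumptions.

```python
def steady_state_velocity(spliced, unspliced):
    """Per-gene steady-state RNA velocity.

    Fits unspliced ~ gamma * spliced through the origin and returns
    (gamma, velocities), where velocity_i = u_i - gamma * s_i.
    Positive values suggest induction, negative values repression.
    """
    denom = sum(s * s for s in spliced)
    if denom > 0:
        gamma = sum(u * s for u, s in zip(unspliced, spliced)) / denom
    else:
        gamma = 0.0
    velocities = [u - gamma * s for u, s in zip(unspliced, spliced)]
    return gamma, velocities

# Toy counts for one gene in three cells.
gamma, v = steady_state_velocity([1.0, 2.0, 3.0], [0.5, 1.0, 2.5])
```

Here the third cell carries more unspliced RNA than the fitted ratio predicts, so its velocity is positive, while the first two cells have negative velocities.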
We used scanpy.tl.diffmap to designate a random cell of Fib.Origin as the root cell and calculated pseudotime using scanpy.external.palantir. Subsequently, we computed a directed cell-cell transition matrix using the PseudotimeKernel from CellRank (v2.0.4) [82]. We then calculated and visualized fate probabilities towards terminal states based on the GPCCA module established by PseudotimeKernel. We used g.compute_lineage_drivers to compute driver genes and retained driver genes with correlation greater than 0.05 and adjusted p value less than 0.05. Peaks associated with these driver genes were identified with LinkPeaks in Signac, and the results were visualized by heatmap. For the Fib.DC lineage, we set the first 95% of genes as start/middle and the last 5% of genes as end for GO enrichment analysis.
We used Monocle3 (v1.3.1) for periderm trajectory analysis. The metadata and variable peaks from the Seurat object of periderm cells were loaded into Monocle3 [50], and the learn_graph function was then used to perform trajectory analysis with default parameters.
TF-DORCs regulatory network
DORCs were determined as described above (prediction of RNA expression by DORC); the DORC score matrix was obtained and smoothed using cisTopic (v0.3.1) with the LSI algorithm. TF-DORC associations were calculated using runFigRGRN in FigR (v1.0.1), and those with regulation score ≥ 2 were retained [32]. The final networks were constructed using igraph, with node and edge attributes assigned according to gene expression and chromatin accessibility.
CellOracle analysis
To validate the identified TFs, we performed in silico perturbation analysis using CellOracle (v0.18.0) [67]. First, we utilized Cicero (v1.20.0) to identify distal cis-regulatory elements [83]. Subsequently, we annotated the co-accessible peaks and filtered out active promoter/enhancer elements. Next, we constructed a cell type-specific GRN for the TF scan, retaining network edges with p value below 0.001 and the top 2000 edges. Finally, we set the TF expression to 0 to compute the perturbation scores in different clusters.
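The idea behind an in silico knockout can be illustrated with a simplified linear signal-propagation sketch: the TF is clamped to zero and the resulting change is passed through the GRN edge weights for a few steps. This is a conceptual illustration under stated assumptions (a linear GRN stored as nested dictionaries, three propagation steps, non-negative expression) and is not CellOracle's implementation; all names and values are hypothetical.

```python
def simulate_knockout(expr, coef, tf, n_steps=3):
    """Propagate the effect of setting one TF to zero through a linear GRN.

    expr: dict gene -> expression in one cell.
    coef: dict regulator -> {target: weight}; all genes must be keys of expr.
    Returns the simulated expression after the knockout.
    """
    delta = {g: 0.0 for g in expr}
    delta[tf] = -expr[tf]  # the knocked-out TF drops to zero
    for _ in range(n_steps):
        new_delta = {g: 0.0 for g in expr}
        for regulator, targets in coef.items():
            for target, weight in targets.items():
                new_delta[target] += delta[regulator] * weight
        new_delta[tf] = -expr[tf]  # keep the knockout clamped at each step
        delta = new_delta
    # Expression cannot become negative.
    return {g: max(expr[g] + delta[g], 0.0) for g in expr}

# Toy chain: TF1 activates GeneB, which activates GeneC.
expr = {"TF1": 2.0, "GeneB": 4.0, "GeneC": 6.0}
coef = {"TF1": {"GeneB": 0.5}, "GeneB": {"GeneC": 1.0}}
simulated = simulate_knockout(expr, coef, "TF1")
```

Comparing the simulated and original expression vectors across cells is what allows a per-cluster perturbation score to be defined; the scoring itself is not shown here.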
CUT&Tag analysis
For the CUT&Tag experiments, dorsal skin was harvested from E13.5 and E18.5 C57BL/6J mouse embryos. Cells from littermates at each developmental stage were pooled to generate homogenized biological samples, which were then processed as two technical replicates (Rep1, Rep2). The skin tissue was digested with 0.25% trypsin (Gibco, 25200072) at 37°C for 20 min, and the resulting cell suspension was filtered to remove debris. Cells were collected by centrifugation (300 g, 5 min), resuspended in skin fibroblast growth medium (DMEM + 10% FBS + 1x NEAA + 1x Glutamine), and cultured at 37°C with 5% CO2. Cells were passaged to P2 and subsequently subjected to CUT&Tag using Anti-RUNX2 antibody (Abcam, EPR22858). CUT&Tag assays were performed according to the scCPA-Tag protocol [84].
Two independent CUT&Tag experiments were conducted for library generation for each group. Raw FastQ data were processed using Trimmomatic (v0.39) [85] to remove adapter sequences, and the reads were then mapped to the mouse (mm10) genome using Bowtie2 (v2.5.4) [86] with default parameters. Peaks were called from the CUT&Tag data using MACS2 (v2.2.9.1) [87] with a p value threshold of 0.01. Bigwig files were generated and visualized using Integrative Genomics Viewer (IGV) [88]. The “reduce” function of the “GenomicRanges” package was used to merge peaks, and overlapping regions of scATAC-seq peaks, CUT&Tag peaks, and RUNX2 motifs were identified. Finally, regulatory events were visualized by overlaying CUT&Tag tracks, scATAC-seq peaks, and TF-target peaks within the RUNX2 target DORCs gene list.
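The merge-and-overlap logic of this step can be sketched in plain Python. The sketch uses half-open (chromosome, start, end) tuples and merges intervals that overlap or touch, which approximates but does not reproduce the behavior of GenomicRanges "reduce"; the function names and coordinates are hypothetical.

```python
def reduce_intervals(intervals):
    """Merge overlapping or touching (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged

def overlaps_any(query, targets):
    """True if the query interval overlaps at least one target interval."""
    chrom, start, end = query
    return any(c == chrom and s < end and start < e for c, s, e in targets)

def three_way_overlap(atac, cuttag, motifs):
    """Merged CUT&Tag peaks that overlap both an ATAC peak and a motif site."""
    return [p for p in reduce_intervals(cuttag)
            if overlaps_any(p, atac) and overlaps_any(p, motifs)]

# Toy coordinates.
cuttag = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 80)]
atac = [("chr1", 250, 400), ("chr2", 10, 40)]
motifs = [("chr1", 120, 130), ("chr2", 60, 70)]
supported = three_way_overlap(atac, cuttag, motifs)
```

For genome-scale peak sets, a sorted sweep or an interval tree would replace the linear scan in `overlaps_any`.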
Supplementary Information
Acknowledgements
We thank all our team members and the China National GeneBank (CNGB) for their support.
Peer review information
Zhana Duren and Tim Sands were the primary editors of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team. The peer-review history is available in the online version of this article.
Authors’ contributions
C.L. and L.L. conceived the idea; C.L. and Y.Y. supervised the study; Y.Y., Q.D. and Y.L. designed the experiment; Q.D. and Y.Y. performed the majority of the experiments with the help of Z.H., J.X., M.C., X.L., R.Z., S.D., J.C., C.W., R.L., X.S., L.W. and Chang L.; P.C. and Z.Z. analyzed the data with the help of W.M., X.C., S.H. and W.M; P.G., J.L., X.Y., X.W. and Jun X. provided project support; Q.D., Y.L. and P.C. wrote the manuscript; C.L. and Y.Y. participated in the manuscript editing and discussion.
Funding
This research was supported by the China Postdoctoral Science Foundation (No. 2023M732365 to P.C., No. 2023M732369 to Y.Y.), National Science and Technology Innovation 2030 Major Program (2021ZD0200100 to L.L.), National Natural Science Foundation of China (32400550 to P.C.), Guangdong Basic and Applied Basic Research Foundation (2024B1515230003 to C.L.), Hangzhou Science and Technology Department (TD2023003 and 2024SZD0128 to Chang L. and C.L.), Fujian Province Medical and Health Senior Talent Team Introduction Project, and Shenzhen Key Laboratory of Single-Cell Omics (ZDSYS20190902093613831).
Data availability
All raw sequencing data generated in this study have been deposited in the CNCB-GSA (Genome Sequence Archive) database, an INSDC-approved repository as required by Genome Biology, under accession number CRA025996 (https://ngdc.cncb.ac.cn/gsa/browse/CRA025996) [89]. Additionally, all datasets are available in the CNGB Nucleotide Sequence Archive (CNSA) [90] of China National GeneBank DataBase (CNGBdb) [91], with accession number CNP0005787 (https://db.cngb.org/search/project/CNP0005787/) [92].
The external cell line datasets used in this study were obtained from multiple publicly available sources. The 10x Multiome and ISSAAC-seq datasets were acquired from ArrayExpress (accession number E-MTAB-11264) [93]. Additional datasets were obtained from the Gene Expression Omnibus (GEO): sci-CAR-seq (GSE117089) [94], SNARE-seq2 (GSE157660) [95], Paired-seq (GSE130399) [96], and SHARE-seq (GSE140203) [97]. The mouse brain dataset generated by 10x Multiome was downloaded from the 10x Genomics website (dataset "Mouse Brain Nuclei Isolated with Chromium Nuclei Isolation Kit, SaltyEZ Protocol, and 10x Complex Tissue DP (CT Sorted and CT Unsorted)") [29]. All data were analyzed using standard programs and packages, as detailed above. Source code and analysis scripts supporting the findings of this study are available in the GitHub repository (https://github.com/caipf/HT-scCAT-seq) [98] and have also been archived on Zenodo (https://zenodo.org/records/15348528) [99].
Declarations
Ethics approval and consent to participate
All mouse experiments were approved by the Institutional Review Board on the Ethics Committee of BGI.
Consent for publication
Not applicable.
Competing interests
Q.D., P.C., Y.L., Z.Z., W.M., Z.H., X.C., S.H., W.M., J.X., C.W., M.C., X.L., R.Z., S.D., J.C., R.L., X.S., Chang L., L.W., L.L., Y.Y. and C.L. are employees of BGI. The other authors declare that they have no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Qiuting Deng, Pengfei Cai, Yingjie Luo and Zhongjin Zhang contributed equally to this work.
Contributor Information
Longqi Liu, Email: liulongqi@genomics.cn.
Yue Yuan, Email: yuanyue@genomics.cn.
Chuanyu Liu, Email: liuchuanyu@genomics.cn.
References
- 1. Fuchs E. Scratching the surface of skin development. Nature. 2007;445(7130):834–42.
- 2. Wu J, Xu J, Liu B, Yao G, Wang P, Lin Z, et al. Chromatin analysis in human early development reveals epigenetic transition during ZGA. Nature. 2018;557(7704):256–60.
- 3. Forni MF, Trombetta-Lima M, Sogayar MC. Stem cells in embryonic skin development. Biol Res. 2012;45(3):215–22.
- 4. Sotiropoulou PA, Blanpain C. Development and homeostasis of the skin epidermis. Cold Spring Harb Perspect Biol. 2012;4(7):a008383.
- 5. Liu S, Zhang H, Duan E. Epidermal development in mammals: key regulators, signals from beneath, and stem cells. Int J Mol Sci. 2013;14(6):10869–95.
- 6. Preissl S, Gaulton KJ, Ren B. Characterizing cis-regulatory elements using single-cell epigenomics. Nat Rev Genet. 2023;24(1):21–43.
- 7. Jacob T, Annusver K, Czarnewski P, Dalessandri T, Kalk C, Levron CL, et al. Molecular and spatial landmarks of early mouse skin development. Dev Cell. 2023;58(20):2140–62.e5.
- 8. Gupta K, Levinsohn J, Linderman G, Chen D, Sun TY, Dong D, et al. Single-cell analysis reveals a hair follicle dermal niche molecular differentiation trajectory that begins prior to morphogenesis. Dev Cell. 2019;48(1):17–31.e6.
- 9. Fan X, Wang D, Burgmaier JE, Teng Y, Romano R-A, Sinha S, et al. Single cell and open chromatin analysis reveals molecular origin of epidermal cells of the skin. Dev Cell. 2018;47(1):21–37.e5.
- 10. Ge W, Tan S-J, Wang S-H, Li L, Sun X-F, Shen W, et al. Single-cell transcriptome profiling reveals dermal and epithelial cell fate decisions during embryonic hair follicle development. Theranostics. 2020;10(17):7581.
- 11. Hammond NL, Dixon J, Dixon MJ. Periderm: life-cycle and function during orofacial and epidermal development. Semin Cell Dev Biol. 2019;91:75–83.
- 12. Jin S, Guerrero-Juarez CF, Zhang L, Chang I, Ramos R, Kuan CH, et al. Inference and analysis of cell-cell communication using CellChat. Nat Commun. 2021;12(1):1088.
- 13. Wu J, Huang B, Chen H, Yin Q, Liu Y, Xiang Y, et al. The landscape of accessible chromatin in mammalian preimplantation embryos. Nature. 2016;534(7609):652–7.
- 14. Sulic A-M, Roy RD, Papagno V, Lan Q, Saikkonen R, Jernvall J, et al. Transcriptomic landscape of early hair follicle and epidermal development. Cell Rep. 2023;42(6):112643.
- 15. Lee H, Kim SY, Kwon N-J, Jo SJ, Kwon O, Kim J-I. Single-cell and spatial transcriptome analysis of dermal fibroblast development in perinatal mouse skin: dynamic lineage differentiation and key driver genes. J Invest Dermatol. 2024;144(6):1238–50.e11.
- 16. Ma S, Zhang B, LaFave LM, Earl AS, Chiang Z, Hu Y, et al. Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell. 2020;183(4):1103–16.e20.
- 17. Liu L, Liu C, Quintero A, Wu L, Yuan Y, Wang M, et al. Deconvolution of single-cell multi-omics layers reveals regulatory heterogeneity. Nat Commun. 2019;10(1):470.
- 18. Xing QR, El Farran CA, Zeng YY, Yi Y, Warrier T, Gautam P, et al. Parallel bimodal single-cell sequencing of transcriptome and chromatin accessibility. Genome Res. 2020;30(7):1027–39.
- 19. Wu H, Li X, Jian F, Yisimayi A, Zheng Y, Tan L, et al. Highly sensitive single-cell chromatin accessibility assay and transcriptome coassay with METATAC. Proc Natl Acad Sci. 2022;119(40):e2206450119.
- 20. Liu C, Wu T, Fan F, Liu Y, Wu L, Junkin M, et al. A portable and cost-effective microfluidic system for massively parallel single-cell transcriptome profiling. bioRxiv. 2019. Preprint at https://www.biorxiv.org/content/10.1101/818450v1
- 21. Plongthongkum N, Diep D, Chen S, Lake BB, Zhang K. Scalable dual-omics profiling with single-nucleus chromatin accessibility and mRNA expression sequencing 2 (SNARE-seq2). Nat Protoc. 2021;16(11):4992–5029.
- 22. Zhang H, Song L, Wang X, Cheng H, Wang C, Meyer CA, et al. Fast alignment and preprocessing of chromatin profiles with Chromap. Nat Commun. 2021;12(1):6566.
- 23. Lareau CA, Ma S, Duarte FM, Buenrostro JD. Inference and effects of barcode multiplets in droplet-based single-cell assays. Nat Commun. 2020;11(1):866.
- 24. Shi Q, Liu S, Kristiansen K, Liu L. The FASTQ+ format and PISA. Bioinformatics. 2022;38(19):4639–42.
- 25. Vitak SA, Torkenczy KA, Rosenkrantz JL, Fields AJ, Christiansen L, Wong MH, et al. Sequencing thousands of single-cell genomes with combinatorial indexing. Nat Methods. 2017;14(3):302–8.
- 26. Xu W, Yang W, Zhang Y, Chen Y, Hong N, Zhang Q, et al. ISSAAC-seq enables sensitive and flexible multimodal profiling of chromatin accessibility and gene expression in single cells. Nat Methods. 2022;19(10):1243–9.
- 27. Zhu C, Yu M, Huang H, Juric I, Abnousi A, Hu R, et al. An ultra high-throughput method for single-cell joint analysis of open chromatin and transcriptome. Nat Struct Mol Biol. 2019;26(11):1063–70.
- 28. Cao J, Cusanovich DA, Ramani V, Aghamirzaie D, Pliner HA, Hill AJ, et al. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science. 2018;361(6409):1380–5.
- 29. Mouse brain nuclei isolated with Chromium Nuclei Isolation Kit, SaltyEZ Protocol, and 10x Complex Tissue DP (CT sorted and CT unsorted). 2023. 10x Genomics. https://www.10xgenomics.com/datasets/mouse-brain-nuclei-isolated-with-chromium-nuclei-isolation-kit-saltyez-protocol-and-10x-complex-tissue-dp-ct-sorted-and-ct-unsorted-1-standard.
- 30. Becht E, McInnes L, Healy J, Dutertre C-A, Kwok IW, Ng LG, et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol. 2019;37(1):38–44.
- 31. Tasic B, Yao Z, Graybuck LT, Smith KA, Nguyen TN, Bertagnolli D, et al. Shared and distinct transcriptomic cell types across neocortical areas. Nature. 2018;563(7729):72–8.
- 32. Kartha VK, Duarte FM, Hu Y, Ma S, Chew JG, Lareau CA, et al. Functional inference of gene regulation using single-cell multi-omics. Cell Genomics. 2022;2(9):100166.
- 33. Stuart T, Srivastava A, Madad S, Lareau CA, Satija R. Single-cell chromatin state analysis with Signac. Nat Methods. 2021;18(11):1333–41.
- 34. Mitra S, Malik R, Wong W, Rahman A, Hartemink AJ, Pritykin Y, et al. Single-cell multi-ome regression models identify functional and disease-associated enhancers and enable chromatin potential analysis. Nat Genet. 2024;56(4):627–36.
- 35. Martinetti LE, Autio DM, Crandall SR. Motor control of distinct layer 6 corticothalamic feedback circuits. bioRxiv. 2024. Preprint at https://www.biorxiv.org/content/10.1101/2024.02.28.582650v1
- 36. Guo W, Clause AR, Barth-Maron A, Polley DB. A corticothalamic circuit for dynamic switching between feature detection and discrimination. Neuron. 2017;95(1):180–94.e5.
- 37. Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM, et al. Comprehensive integration of single-cell data. Cell. 2019;177(7):1888–902.e21.
- 38. Mok K-W, Saxena N, Heitman N, Grisanti L, Srivastava D, Muraro MJ, et al. Dermal condensate niche fate specification occurs prior to formation and is placode progenitor dependent. Dev Cell. 2019;48(1):32–48.e5.
- 39. DeTomaso D, Yosef N. Hotspot identifies informative gene modules across modalities of single-cell genomics. Cell Syst. 2021;12(5):446–56.e9.
- 40. Fulco CP, Nasser J, Jones TR, Munson G, Bergman DT, Subramanian V, et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat Genet. 2019;51(12):1664–9.
- 41. Osterwalder M, Barozzi I, Tissières V, Fukuda-Yuzawa Y, Mannion BJ, Afzal SY, et al. Enhancer redundancy provides phenotypic robustness in mammalian development. Nature. 2018;554(7691):239–43.
- 42. Rangarajan A, Talora C, Okuyama R, Nicolas M, Mammucari C, Oh H, et al. Notch signaling is a direct determinant of keratinocyte growth arrest and entry into differentiation. EMBO J. 2001;20(13):3427–36.
- 43. Uyttendaele H, Panteleyev AA, De Berker D, Tobin DT, Christiano AM. Activation of Notch1 in the hair follicle leads to cell-fate switch and Mohawk alopecia. Differentiation. 2004;72(8):396–409.
- 44. McGowan KM, Coulombe PA. Onset of keratin 17 expression coincides with the definition of major epithelial lineages during skin development. J Cell Biol. 1998;143(2):469–86.
- 45. Schep AN, Wu B, Buenrostro JD, Greenleaf WJ. chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat Methods. 2017;14(10):975–8.
- 46. Jiang Y, Harigaya Y, Zhang Z, Zhang H, Zang C, Zhang NR. Nonparametric single-cell multiomic characterization of trio relationships between transcription factors, target genes, and cis-regulatory regions. Cell Syst. 2022;13(9):737–51.e4.
- 47. Guerrero-Juarez CF, Dedhia PH, Jin S, Ruiz-Vega R, Ma D, Liu Y, et al. Single-cell analysis reveals fibroblast heterogeneity and myeloid-derived adipocyte progenitors in murine skin wounds. Nat Commun. 2019;10(1):650.
- 48. Kim JY, Park M, Ohn J, Seong RH, Chung JH, Kim KH, et al. Twist2-driven chromatin remodeling governs the postnatal maturation of dermal fibroblasts. Cell Rep. 2022;39(7):110821.
- 49. Šošić D, Richardson JA, Yu K, Ornitz DM, Olson EN. Twist regulates cytokine gene expression through a negative feedback loop that represses NF-κB activity. Cell. 2003;112(2):169–80.
- 50. Cao J, Spielmann M, Qiu X, Huang X, Ibrahim DM, Hill AJ, et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature. 2019;566(7745):496–502.
- 51. McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, et al. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol. 2010;28(5):495–501.
- 52. Richardson RJ, Hammond NL, Coulombe PA, Saloranta C, Nousiainen HO, Salonen R, et al. Periderm prevents pathological epithelial adhesions during embryogenesis. J Clin Invest. 2014;124(9):3891–900.
- 53. Cui CY, Kunisada M, Esibizione D, Grivennikov SI, Piao Y, Nedospasov SA, et al. Lymphotoxin-beta regulates periderm differentiation during embryonic skin development. Hum Mol Genet. 2007;16(21):2583–90.
- 54. Usansky I, Jaworska P, Asti L, Kenny FN, Hobbs C, Sofra V, et al. A developmental basis for the anatomical diversity of dermis in homeostasis and wound repair. J Pathol. 2021;253(3):315–25.
- 55. Shaw TJ, Rognoni E. Dissecting fibroblast heterogeneity in health and fibrotic disease. Curr Rheumatol Rep. 2020;22:1–10.
- 56. Rinkevich Y, Walmsley GG, Hu MS, Maan ZN, Newman AM, Drukker M, et al. Identification and isolation of a dermal lineage with intrinsic fibrogenic potential. Science. 2015;348(6232):aaa2151.
- 57. Kalluri R, Zeisberg M. Fibroblasts in cancer. Nat Rev Cancer. 2006;6(5):392–401.
- 58. Dulauroy S, Di Carlo SE, Langa F, Eberl G, Peduto L. Lineage tracing and genetic ablation of ADAM12+ perivascular cells identify a major source of profibrotic cells during acute tissue injury. Nat Med. 2012;18(8):1262–70.
- 59. La Manno G, Soldatov R, Zeisel A, Braun E, Hochgerner H, Petukhov V, et al. RNA velocity of single cells. Nature. 2018;560(7719):494–8.
- 60. Lange M, Bergen V, Klein M, Setty M, Reuter B, Bakhti M, et al. CellRank for directed single-cell fate mapping. Nat Methods. 2022;19(2):159–70.
- 61. Setty M, Kiseliovas V, Levine J, Gayoso A, Mazutis L, Pe’er D. Characterization of cell fate probabilities in single-cell data with Palantir. Nat Biotechnol. 2019;37(4):451–60.
- 62. Tsai S-Y, Sennett R, Rezza A, Clavel C, Grisanti L, Zemla R, et al. Wnt/β-catenin signaling in dermal condensates is required for hair follicle formation. Dev Biol. 2014;385(2):179–88.
- 63. DasGupta R, Fuchs E. Multiple roles for activated LEF/TCF transcription complexes during hair follicle development and differentiation. Development. 1999;126(20):4557–68.
- 64. Glotzer DJ, Zelzer E, Olsen BR. Impaired skin and hair follicle development in Runx2 deficient mice. Dev Biol. 2008;315(2):459–73.
- 65. Lan Y, Wu Z, Liu H, Jiang R. Lineage-specific requirements of Alx4 function in craniofacial and hair development. Dev Dyn. 2024;253(10):940–8.
- 66. Clavel C, Grisanti L, Zemla R, Rezza A, Barros R, Sennett R, et al. Sox2 in the dermal papilla niche controls hair growth by fine-tuning BMP signaling in differentiating hair shaft progenitors. Dev Cell. 2012;23(5):981–94.
- 67. Kamimoto K, Stringa B, Hoffmann CM, Jindal K, Solnica-Krezel L, Morris SA. Dissecting cell identity via network inference and in silico gene perturbation. Nature. 2023;614(7949):742–51.
- 68. Buenrostro JD, Wu B, Litzenburger UM, Ruff D, Gonzales ML, Snyder MP, et al. Single-cell chromatin accessibility reveals principles of regulatory variation. Nature. 2015;523(7561):486–90.
- 69. Tan L, Xing D, Chang C-H, Li H, Xie XS. Three-dimensional genome structures of single diploid human cells. Science. 2018;361(6405):924–8.
- 70. Shema E, Bernstein BE, Buenrostro JD. Single-cell and single-molecule epigenomics to uncover genome regulation at unprecedented resolution. Nat Genet. 2019;51(1):19–25.
- 71. Kelsey G, Stegle O, Reik W. Single-cell epigenomics: recording the past and predicting the future. Science. 2017;358(6359):69–75.
- 72. Chen X, Litzenburger UM, Wei Y, Schep AN, LaGory EL, Choudhry H, et al. Joint single-cell DNA accessibility and protein epitope profiling reveals environmental regulation of epigenomic heterogeneity. Nat Commun. 2018;9(1):4590.
- 73. Corces MR, Trevino AE, Hamilton EG, Greenside PG, Sinnott-Armstrong NA, Vesuna S, et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat Methods. 2017;14(10):959–62.
- 74. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9:1–9.
- 75. Kaminow B, Yunusov D, Dobin A. STARsolo: accurate, fast and versatile mapping/quantification of single-cell and single-nucleus RNA-seq data. bioRxiv. 2021:2021.05.05.442755.
- 76. Granja JM, Corces MR, Pierce SE, Bagdatli ST, Choudhry H, Chang HY, et al. ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis. Nat Genet. 2021;53(3):403–11.
- 77. Germain P-L, Lun A, Meixide CG, Macnair W, Robinson MD. Doublet identification in single-cell sequencing data using scDblFinder. F1000Research. 2021;10.
- 78. Young MD, Behjati S. SoupX removes ambient RNA contamination from droplet-based single-cell RNA sequencing data. Gigascience. 2020;9(12):giaa151.
- 79. Fornes O, Castro-Mondragon JA, Khan A, Van der Lee R, Zhang X, Richmond PA, et al. JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2020;48(D1):D87–92.
- 80. Wu T, Hu E, Xu S, Chen M, Guo P, Dai Z, et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. The Innovation. 2021;2(3):100141.
- 81. Svensson V, Pachter L. RNA velocity: molecular kinetics from single-cell RNA-Seq. Mol Cell. 2018;72(1):7–9.
- 82. Bergen V, Lange M, Peidli S, Wolf FA, Theis FJ. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat Biotechnol. 2020;38(12):1408–14.
- 83. Pliner HA, Packer JS, McFaline-Figueroa JL, Cusanovich DA, Daza RM, Aghamirzaie D, et al. Cicero predicts cis-regulatory DNA interactions from single-cell chromatin accessibility data. Mol Cell. 2018;71(5):858–71.e8.
- 84. Wang C, Huang W, Zhong Y, Zou X, Liu S, Li J, et al. Single-cell multi-modal chromatin profiles revealing epigenetic regulations of cells in hepatocellular carcinoma. Clin Transl Med. 2024;14(9):e70000.
- 85. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20.
- 86. Langmead B, Wilks C, Antonescu V, Charles R. Scaling read aligners to hundreds of threads on general-purpose processors. Bioinformatics. 2019;35(3):421–32.
- 87. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):1–9.
- 88. Thorvaldsdóttir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 2013;14(2):178–92.
- 89. Qiuting Deng. Scalable single-cell multi-omics reveals gene regulatory programs for mouse skin development. 2025. Genome Sequence Archive. https://ngdc.cncb.ac.cn/gsa/s/SjCdJApG.
- 90. Guo X, Chen F, Gao F, Li L, Liu K, You L, et al. CNSA: a data repository for archiving omics data. Database. 2020;2020:baaa055.
- 91. Chen FZ, You LJ, Yang F, Wang LN, Guo XQ, Gao F, et al. CNGBdb: China National GeneBank DataBase. Yi Chuan = Hereditas. 2020;42(8):799–809.
- 92. Qiuting Deng. Scalable single-cell multi-omics reveals gene regulatory programs for mouse skin development. 2024. China National GeneBank DataBase. https://db.cngb.org/search/project/CNP0005787/.
- 93. Xi Chen. Single cell multi-omics profiling using ISSAAC-seq. 2022. BioStudies, E-MTAB-11264. https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-11264.
- 94. Junyue Cao. Joint profiling of chromatin accessibility and gene expression in thousands of single cells. 2018. Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE117089.
- 95. Kun Zhang. Scalable dual-omic profiling with single-nucleus chromatin accessibility and mRNA expression sequencing 2 (SNARE-seq2). 2020. Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE157660.
- 96. Zhu Chenxu. An ultra high-throughput method for single-cell joint analysis of open chromatin and transcriptome. 2019. Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE130399.
- 97. Sai Ma. Integrative single-cell chromatin and transcriptome profiling uncovers cell-type specific regulatory interactions. 2020. Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE140203.
- 98. Pengfei Cai, Zhongjin Zhang. High throughput single-cell chromatin accessibility and transcriptome sequencing (HT-scCAT-seq). 2025. GitHub. https://github.com/caipf/HT-scCAT-seq.
- 99. Zhang Z. High throughput single-cell chromatin accessibility and transcriptome sequencing (HT-scCAT-seq). 2025. Zenodo. 10.5281/zenodo.15348528.