Abstract
Enhancers play an essential role in developmental processes by regulating the spatiotemporal expression of genes. However, characterizing their spatiotemporal activity remains an important challenge. Here we introduce a novel in vivo/in silico method for spatial single-cell enhancer-reporter assays (spatial-scERA) designed to reconstruct the spatial activity of candidate enhancer regions in parallel in multicellular organisms. Spatial-scERA integrates parallel reporter assays with single-cell RNA sequencing and spatial reconstruction using optimal transport, to map cell-type-specific enhancer activity at the single-cell level on a 3D virtual sample. We evaluated spatial-scERA in Drosophila embryos using 25 candidate enhancers, and validated the robustness of our reconstructions by comparing them to in situ hybridization. Remarkably, spatial-scERA faithfully reconstructed the spatial activity of these enhancers, even when the reporter construct was expressed in as few as 10 cells. Our results demonstrate the importance of integrating transcriptomic and spatial data for accurately predicting enhancer activity patterns in complex multicellular samples and linking enhancers to their potential target genes. Overall, spatial-scERA provides a scalable approach to map the spatial activity of enhancers at single-cell resolution without the need for imaging or a priori knowledge of embryology and can be applied to any multicellular organism amenable to transgenesis.
Graphical Abstract
Introduction
In metazoans, development is a highly regulated process that transforms a single cell into a fully formed adult organism. This transformation occurs through differentiation processes largely driven by the regulation of gene expression. A key mechanism in this regulation involves noncoding DNA elements known as cis-regulatory modules (CRMs) such as enhancers, which confer on genes their specific spatial and temporal patterns of expression [1–4]. A major challenge in the field remains the comprehensive identification of enhancer sequences and the detailed characterization of their spatiotemporal activity. Although a growing number of enhancers have been functionally characterized in model organisms [5–9], typically through enhancer-reporter assays [3, 10, 11], the sheer number of putative developmental enhancers far exceeds the number of genes expressed during development. As a result, we are still far from having a complete understanding of the mechanisms by which enhancers regulate gene expression.
Thanks to substantial efforts involving the mapping of DNase I hypersensitive sites, histone modifications, and transcription factor binding sites, putative enhancers are being discovered at a rapid pace [12–15]. The advent of massively parallel reporter assays (MPRA) has further accelerated this process, enabling the testing of thousands of such enhancer candidates in cell culture systems [16–18]. However, these approaches are limited in their ability to provide information about the actual spatiotemporal activity of enhancers in multicellular organisms. Traditional in vivo reporter assays, while capable of systematically testing enhancer activity, have low throughput, as each candidate must be tested individually. Although recent attempts to scale up these assays are promising, they are often limited to a few cell types [19], rely on episomal vectors that lack genomic context [20], or require specialized equipment and extensive prior knowledge of the species’ embryology [5, 21].
The advent of single-cell sequencing technologies has paved the way for high-throughput enhancer identification at the cellular level in multicellular organisms. For example, recent profiling of single-cell chromatin accessibility during Drosophila embryogenesis has revealed over 100 000 open chromatin regions with potential enhancer function at specific developmental stages [22]. However, chromatin accessibility remains only a predictive tool for enhancer function and does not always correlate with spatiotemporal activity [23]. By combining the efficiency of classical reporter assays with the cellular resolution of single-cell RNA sequencing (scRNA-seq), it is now possible to enhance these predictions, addressing both the throughput limitations of traditional reporter assays and the lack of in vivo insights provided by MPRAs. Nevertheless, single-cell reporter assays have so far been limited to cell culture systems [24, 25], hence still lacking spatiotemporal resolution. Ultimately, none of these methods offers an unbiased, scalable means of characterizing spatiotemporal enhancer activity at single-cell resolution.
Here, we addressed this limitation by developing spatial-scERA, an in vivo single-cell reporter assay that predicts the tissue-specific activity of candidate enhancer regions in a multicellular organism. Our key innovation is the combination of scRNA-seq with spatialization based on optimal transport to reconstruct the activity of enhancers in a virtual embryo without the need for imaging [26, 27]. We applied this method to 25 candidate enhancers in stage 6 Drosophila embryos and demonstrated the robustness of our method by comparing our reconstructions with traditional enhancer-reporter assays coupled with imaging. Our results highlight the importance of spatial data integration to reconstruct the activity of enhancers in complex tissues and predict the enhancer’s target genes.
Materials and methods
Candidate enhancer selection
The list of candidate enhancers was established by selecting noncoding DNA regions located in the vicinity (10 kb upstream and downstream) of tissue-specific genes. The list of tissue-specific genes was generated using a scRNA-seq dataset from stage 6 Drosophila embryos [28]. The expression matrix, consisting of 1297 cells, was analyzed in R using the Seurat package (v4.4.0) [29]. Highly variable genes were selected using the mean.var.plot method. Scaling and Principal Component Analysis (PCA) were performed with default settings. The first 16 principal components were retained based on the knee observed with the ElbowPlot function. The RunUMAP function was used for dimensionality reduction and DimPlot for visualization. Clustering with FindClusters (resolution 0.5) resulted in six clusters. The 10 most differentially expressed genes for each cluster were identified using FindAllMarkers, and formed the list of tissue-specific genes. Cluster identities were determined by retrieving the cell-type annotation of these genes from FlyBase [30] and the Berkeley Drosophila Genome Project (BDGP) in situ database [31].
These genomic loci were then visually inspected in a genome browser for the presence of the H3K4me1 and H3K27ac histone modifications based on ChIP-seq datasets generated in whole embryos at 0–4 h after egg lay [32, 33] and for the presence of open chromatin based on DNase I hypersensitivity in stage 5 whole embryos [15]. We also verified that these regions were devoid of RNA-seq signal using a whole embryo dataset at 2–4 h after egg lay [34]. This resulted in 111 regions, which we further narrowed down to 19 candidates by selecting the ones displaying higher levels of DNase I hypersensitivity and/or histone modification and by cross-referencing them to cis-regulatory information available in the REDfly database [35]. Six additional regions of interest to the team were added to this set, resulting in a total of 25 candidate enhancers. The exact coordinates of the regions were set to be centered around the DNase I hypersensitivity peak, with each region having an approximate length of 1000 bp (Supplementary Table 1).
All tracks were plotted using pyGenomeTracks (v3.8) [36].
Plasmid library preparation
All plasmids were constructed using standard cloning methods with reagents from New England Biolabs, including restriction enzymes, T4 DNA ligase or the NEBuilder HiFi DNA Assembly kit. All constructs were verified by Sanger sequencing.
The pBID-mphsp70-kozak-CD2-CmR/ccdB-25A-SV40polyA reporter plasmid (Supplementary Fig. S14) was generated from the pBID backbone vector (Addgene #35190 [37]) containing the mini-white gene as an integration marker. The cassette containing the CmR and ccdB genes was amplified by polymerase chain reaction (PCR) from the pSTARR-seq_fly-hsp70 plasmid (Addgene #71500 [18]). The other DNA fragments required for the final construct were synthesized by GeneArt (Life Technologies) and PCR amplified. All fragments were assembled into the pBID backbone. The resulting plasmid was propagated in ccdB survival bacteria (A10460, Life Technologies). A complete map of the final reporter vector is shown in Supplementary Fig. S14. The reporter includes the minimal hsp70 promoter, a Kozak sequence, the CD2 gene, and the CmR/ccdB cassette followed by the pGL3 SV40 late polyA signal. We also included a stretch of 25 adenosines upstream of the polyA signal to avoid internal priming during the scRNA-seq library preparation [38] and ensure that most reads arise from the 3′ end of the transcript.
To prepare the library of 25 reporter plasmids, the candidate enhancers were amplified from genomic DNA of the w[1118]; PBac{y[+mDint2]=vas-Cas9}VK00027 (BDSC_51324) fly line. A 19 bp common sequence and a 6 bp specific barcode were added by PCR at the 3′ end of each candidate enhancer. The primers used for PCR are listed in Supplementary Table 4.
The 25 PCR reactions were gel purified and the candidate enhancer fragments diluted to 0.045 pmol/μl in a final volume of 8 μl. The fragments were mixed in equimolar ratio and inserted in batch in the pBID-mphsp70-kozak-CD2-CmR/ccdB-25A-SV40polyA plasmid, instead of the CmR/ccdB cassette, using AgeI and NotI restriction sites and the NEBuilder HiFi DNA Assembly kit. The vectors were transformed using Omni-max bacteria (C854003, Life Technologies), and directly transferred to 100 ml of LB medium containing ampicillin. The plasmid library was extracted using the NucleoBond Xtra Midi kit (740410.50, Macherey-Nagel).
Injection and fly handling
The pool of 25 reporter vectors was injected in-house through PhiC31-mediated recombination [39]. The injections were performed using a white-eyed fly line expressing the PhiC31 integrase and displaying a unique attP landing site on chromosome 2L (nos-φC31/int.NLS; attP40 [40]). A total of 3250 Drosophila embryos were injected with the reporter plasmid library. The resulting flies were crossed with the nos-φC31/int.NLS; attP40 line. The 285 red-eyed transgenic progenitors were allowed to mate between themselves. The resulting flies are homozygous for the reporter construct but may contain a different candidate enhancer on each allele. These flies were amplified for three generations and used as a pool for single-cell experiments. We also extracted genomic DNA from a pool of transgenic progenitors to verify the correct genomic integration of all 25 candidate enhancers by PCR.
In parallel, we also derived homozygous lines carrying the same enhancer construct on both alleles for each candidate enhancer for further Reverse Transcription quantitative Polymerase Chain Reaction (RT-qPCR) and in situ hybridization experiments.
Single-cell RNA sequencing
Freshly hatched adults from the pool of transgenic flies containing our library of candidate enhancers were combined in five embryo collection vials with standard apple cap plates. After three 45-min pre-lays, Drosophila embryos were collected on apple juice plates for 1 h and incubated for another 2.5 h at 25°C. We verified that the collected embryos mostly spanned developmental stages 5–7. The embryos were dechorionated using 2.6% bleach for 2 min, washed with water and PBS + 0.1% Triton X-100, and finally resuspended in 1 ml of ice-cold PBS + 0.1% Triton X-100 and kept on ice.
Embryos were washed with ice-cold PBS, resuspended in 10 ml of dissociation buffer (PBS 1× + 0.1% bovine serum albumin) and dissociated on ice in a Dounce homogenizer with gentle strokes of the loose pestle. This process was repeated for all collected embryos, using a small number of embryos each time. The cell suspension was transferred to 15 ml Falcon tubes and centrifuged for 10 min at 40 × g at 4°C to pellet debris. Cells in the supernatant were transferred to new Falcon tubes and centrifuged again for 10 min at 800 × g at 4°C. The supernatant was discarded, and cell pellets were combined and resuspended in 0.5 ml of dissociation buffer. Two additional rounds of centrifugation, each for 5 min at 800 × g at 4°C were performed to remove as much debris as possible.
Cells were counted using a Malassez counting chamber and the concentration was adjusted to 900 cells/μl in the dissociation buffer. scRNA-seq libraries were prepared using the Chromium Single Cell 3′ v3.1 protocol (10× Genomics), aiming for 10 000 cells per sample. Single cells were encapsulated into droplets in the Chromium Controller instrument for cell lysis and barcoded reverse transcription of mRNA. 40 μl of complementary DNA (cDNA) were recovered, and 10 μl (25%) were used for amplification, fragmentation, and Illumina library construction. The libraries were multiplexed and sequenced on a NovaSeq sequencer (Illumina) using 150-bp paired-end reads, yielding 500 million reads per library.
Targeted PCR sequencing
From the remaining 30 μl of cDNA generated during the scRNA-seq library preparation, 300 ng were used for targeted PCR amplification. The region containing the specific enhancer barcode and the 10× Genomics barcodes (cellular barcode and UMI) was amplified with the Q5 Hot Start High Fidelity polymerase (New England Biolabs). The forward primer binds to the 19 nt sequence common to all constructs (GACGTCATCGTCCTGCAGG) and the reverse primer to the TruSeqRead1 sequence added during the scRNA-seq library preparation (CTACACGACGCTCTTCCGATC). The cycling conditions were as follows: Initial denaturation at 98°C for 10 s; then 25 cycles at 98°C for 10 s, 68°C for 30 s, 72°C for 20 s; then an elongation step at 72°C for 5 min.
The PCR product was purified using SPRIselect beads (Beckman Coulter) and 100 ng used to generate the final libraries using the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs). The libraries were indexed for multiplexing using NEBNext multiplex oligos kit for Illumina (New England Biolabs) and sequenced on a NovaSeq sequencer (Illumina) using 150-bp paired-end reads, yielding 10 million reads per sample.
RT-qPCR
Freshly hatched adult flies of the appropriate genotype were placed in embryo collection cages with apple juice agar plates supplemented with yeast paste. After three pre-lay periods of 45 min for stage-specific collections, Drosophila embryos were collected for 1 h and incubated for another 2.5 h at 25°C until they reached stages 5–7.
The embryos used for RT-qPCR were directly transferred to the RA1 buffer supplemented with 2-mercaptoethanol provided with the NucleoSpin RNA kit (Macherey-Nagel) and stored at −80°C for further RNA extraction.
RNA extraction was performed by grinding the embryos with a pestle in the RA1 buffer, followed by RNA purification using the Nucleospin RNA kit (Macherey-Nagel). Reverse transcription of 1 μg of RNA was performed using the RevertAid First Strand cDNA Synthesis Kit (Life Technologies) with random primers. Quantitative Polymerase Chain Reaction (qPCR) was performed with PowerUp™ SYBR™ Green Master Mix (Life Technologies) for three independent biological replicates using the following primers:
RpL32: ATGCTAAGCTGTCGCACAAATG and GTTCGATCCGTAACCGATGT
CD2: CCTGAGAGCACCGTTTAAGT and AGATAGAGGGGCAGACCTTT
Relative quantification was done using the 2^(−ΔΔCt) formula.
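The relative quantification above can be illustrated with a minimal sketch. It assumes that CD2 is the target and RpL32 the reference gene, and that one sample serves as the calibrator; neither choice is stated explicitly in the text. The function name and the Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change of a target gene by the 2^(-ddCt) method."""
    # dCt: target Ct normalized to the reference gene, per condition
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    # ddCt: sample dCt relative to the calibrator dCt
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values (assumed: CD2 = target, RpL32 = reference)
fold_change = relative_expression(24.0, 18.0, 27.0, 18.0)
```

With these made-up values the target transcript is eight times more abundant in the sample than in the calibrator.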
Hybridization Chain Reaction (HCR) RNA fluorescent in situ hybridization in Drosophila embryos
Freshly hatched adult flies of the appropriate genotype were placed in embryo collection cages with apple juice agar plates supplemented with yeast. After three pre-lay periods of 45 min for stage-specific collections, Drosophila embryos were collected for 1 h, and incubated for another 2.5 h at 25°C until they reached stages 5–7.
HCR RNA fluorescent in situ hybridization was performed according to the manufacturer instructions (Molecular Instruments) for whole-mount fruit fly embryos. All HCR probes, amplifiers, and buffers were purchased from Molecular Instruments. Two different probe sets were designed; one against the CD2 mRNA used with amplifier B3-488 to image spatial enhancer activity and one against twist mRNA used with amplifier B1-546 to help in embryo staging.
Embryos were mounted in ProLong Gold antifade reagent with DAPI (P36935, Life Technologies) and imaged on a Leica SP8 confocal microscope using a 20 × glycerol objective. For each genotype, around 10 embryos were imaged and several Z-stacks were acquired (section thickness of 1.5 μm).
Generation of a custom reference genome
Manual modifications were made to the Drosophila melanogaster genome from NCBI (GCF_000001215.4 RefSeq assembly) to integrate the reporter construct and enhancer sequences. We added extra chromosomes that contain the plasmid sequence and the sequence of each of the 25 enhancers (including the barcode). The plasmid was designated as chr5, and enhancer sequences were labeled chr6–chr30, each with its own chromosome. A 100-bp flanking sequence was added to each enhancer on either side of the insertion site within the plasmid vector to improve the ability to map reads to the candidate enhancers.
The annotation file was manually edited to include an mt tag for genes on the mitochondrial chromosome. The gene_id and gene_name columns were swapped to prioritize gene_name during Seurat analysis. All features from the plasmid and enhancer chromosomes were categorized as exon in the gtf file to allow CellRanger to map reads at these loci.
The modified fasta genome and gtf annotation files were used as inputs for the cellranger mkref function to generate an index for mapping the fastq files.
scRNA-seq analysis
Read quality was first assessed with FastQC [41] before mapping. Reads were aligned to the custom genome index using cellranger count (10× Genomics Cell Ranger v7.2.0) with 14 processing cores. The 10× Genomics cell and UMI barcodes were isolated by selecting the first 28 nucleotides in R1 with the parameter r1-length set to 28. After mapping, 24 886, 23 125, and 23 437 cells were obtained for replicates 1–3, respectively.
Each gene expression matrix went through stringent quality filtering. Nonviable cells, debris, or empty droplets were excluded if they contained fewer than 500 expressed genes and/or fewer than 2000 UMIs. To exclude likely doublets, a maximum threshold of 6000 genes and/or 50 000 UMIs was set. Cells in which mitochondrial genes accounted for >15% or ribosomal genes for >36% of the signal were excluded. After quality control, 1934, 1963, and 2691 cells were retained for replicates 1–3, respectively.
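The filtering thresholds above can be summarized as a boolean mask over cells. The following is only a sketch in Python with NumPy, not the Seurat code used in the study; the function name and the toy per-cell metrics are hypothetical.

```python
import numpy as np


def qc_mask(n_genes, n_umis, pct_mito, pct_ribo):
    """Return True for cells that pass all thresholds quoted in the text."""
    n_genes = np.asarray(n_genes)
    n_umis = np.asarray(n_umis)
    pct_mito = np.asarray(pct_mito)
    pct_ribo = np.asarray(pct_ribo)
    return (
        (n_genes >= 500) & (n_umis >= 2000)       # drop debris / empty droplets
        & (n_genes <= 6000) & (n_umis <= 50000)   # drop likely doublets
        & (pct_mito <= 15) & (pct_ribo <= 36)     # drop stressed or dying cells
    )


# Toy metrics for five hypothetical cells
keep = qc_mask(
    n_genes=[1500, 300, 7000, 2000, 2500],
    n_umis=[8000, 900, 60000, 9000, 12000],
    pct_mito=[5, 3, 4, 22, 6],
    pct_ribo=[20, 10, 15, 18, 40],
)
```

Only the first toy cell passes; each of the other four fails a different criterion.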
To merge all three replicates, the Seurat V4 [29] integration vignette was followed. Datasets were normalized separately, and highly variable genes were identified using the FindVariableFeatures function with the mvp method. Both FindIntegrationAnchors and IntegrateData functions were used to generate a single dataset of 6588 cells.
The standard Seurat pipeline was followed for analysis, including scaling, PCA, neighbor graph generation, clustering, and Uniform Manifold Approximation and Projection (UMAP) visualization. The first 30 components were selected for the PCA, and a clustering resolution of 0.5 yielded 16 clusters. Marker genes for each cluster were identified using FindAllMarkers with default parameters. To determine cell types, the top 10 genes with the greatest differential expression were analyzed using FlyBase [30] and the BDGP in situ database [31]. The most dominant annotation for the 10 genes was selected as the cluster annotation. If two clusters ended up with the same annotation, they were merged in one single cluster.
The reduced clustering was made by removing cells from the “unknown”, “yolk”, and “hemocytes” clusters using the subset function. The remaining 5475 cells underwent the same analysis pipeline, resulting in 16 clusters, with 9 tissues visualized in two dimensions using UMAP.
DimPlots, FeaturePlots, and Clustered DotPlots were generated with the scCustomize R library (v2.1.2) to enhance Seurat visualization functions [42].
Targeted PCR analysis
Cutadapt (v4.5) [43] was used to trim PCR-specific reads with a three-step script. Each step applied two trimming parameters: a minimal overlap of 10 nucleotides (-O 10) and removal of both reads of a pair if either read failed the filtering criteria (--pair-filter=any). Cutadapt allows a default error rate of 1 in 10 nucleotides.
Step one involved removing the 19 bp common sequence (GACGTCATCGTCCTGCAGG) from forward reads and the 10× adaptor sequence (CTACACGACGCTCTTCCGATCT) from reverse reads. For the second step, both sequences were also checked on the opposite strand. All reads passing any of these two steps were merged into one forward (R1) and one reverse (R2) file. The final trimming removed the second common sequence (GCTGCCGCTTCGAGCAGACATGCATATG) and everything downstream, ensuring R1 reads contained only the 6-nucleotide enhancer barcode, while R2 reads retained only the first 28 nucleotides with the 10× cell + UMI barcodes.
In the specific case where the 10× adaptor sequence had been removed by Illumina’s trimming software, we used seqkit (v2.6.1) [44] to scan for 20 thymines within the 29th–60th nucleotides, which is the region where the polyT sequence can be found if the adaptor was absent. For every reverse read in this case, if the forward read had the common sequence trimmed beforehand (GACGTCATCGTCCTGCAGG), the pair of reads was kept and sent back to the last trimming step.
We wrote a Python script to identify which enhancer was present within each cell. We observed that some of our reads contained enhancer barcodes that were different from the list of 25 barcodes used in the experiment, suggesting potential PCR amplification or sequencing errors. To correct for this, only reads containing one of the 25 expected enhancer barcodes and a cell barcode identical to one of the cells analyzed in the scRNA-seq analysis were kept.
For nearly every cell, more than two different enhancer barcodes were found, again suggesting PCR and sequencing errors leading to the misidentification of our barcodes. To reduce the noise generated by such errors, a threshold strategy was used to detect the “true” enhancer in each cell. First, the frequency of all enhancer barcodes found in each cell was computed by counting how many UMIs were found for each enhancer barcode. If the most frequent enhancer barcode was found in at least 10 times as many UMIs as the second most frequent, it was considered the real enhancer for that cell. If a cell had only one enhancer barcode, it needed to be associated with at least five different UMIs to be considered real.
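The threshold strategy can be sketched as follows. This is an illustrative reimplementation, not the script used in the study: the function name, the record layout (cell barcode, UMI, enhancer barcode) and the toy barcodes are assumptions, and the 10-fold rule is interpreted here as "at least 10 times as many UMIs".

```python
from collections import defaultdict


def assign_enhancers(records, expected_barcodes, valid_cells,
                     ratio=10, min_umis_single=5):
    """Call one enhancer per cell from (cell, umi, enhancer_barcode) records."""
    umis = defaultdict(lambda: defaultdict(set))
    for cell, umi, enhancer in records:
        # Keep only expected enhancer barcodes in cells retained by scRNA-seq
        if cell in valid_cells and enhancer in expected_barcodes:
            umis[cell][enhancer].add(umi)

    calls = {}
    for cell, per_enhancer in umis.items():
        # Rank enhancer barcodes by their number of distinct UMIs
        counts = sorted(((len(u), e) for e, u in per_enhancer.items()),
                        reverse=True)
        top_count, top_enhancer = counts[0]
        if len(counts) == 1:
            if top_count >= min_umis_single:
                calls[cell] = top_enhancer
        elif top_count >= ratio * counts[1][0]:
            calls[cell] = top_enhancer
    return calls


# Toy input: c1 is unambiguous, c2 has too few UMIs, c3 is ambiguous,
# c4 is not among the cells retained after scRNA-seq quality control.
records = (
    [("c1", f"u{i}", "AAAAAA") for i in range(20)]
    + [("c1", f"u{i}", "CCCCCC") for i in range(2)]
    + [("c2", f"u{i}", "CCCCCC") for i in range(3)]
    + [("c2", "u9", "GGGGGG")]
    + [("c3", f"u{i}", "AAAAAA") for i in range(6)]
    + [("c3", f"u{i}", "CCCCCC") for i in range(3)]
    + [("c4", "u0", "AAAAAA")]
)
calls = assign_enhancers(records, {"AAAAAA", "CCCCCC"}, {"c1", "c2", "c3"})
```

Cells without a confident call are simply absent from the returned dictionary, so downstream steps treat them as enhancer-negative.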
Virtual embryo reconstruction using novoSpaRc
Principle of optimal transport in novoSpaRc:
The expression matrix obtained after the Seurat analysis was loaded and processed using the anndata Python package (v 0.10.2) [45]. Additionally, the atlas, representing one half of the surface of a stage 6 Drosophila embryo, was sourced from the novoSpaRc GitHub repository [28]. This atlas comprises 3039 positions and captures the expression pattern of 84 marker genes.
The dataset was prepared by normalizing and converting the expression values into logarithmic values. Two cost matrices were computed to set up the reconstruction with parameters num_neighbors_s and num_neighbors_t set to 3 and 5, respectively. The reconstruction itself was executed with default parameters, except for the alpha_linear parameter, which was set to 0.35.
NovoSpaRc generates two output matrices after performing the optimal transport. The first one is a probability matrix, depicting the probability of finding each scRNA-seq cell at each position in the virtual embryo. The sum of probabilities for an individual cell over every position is equal to one. This matrix is then multiplied by the scRNA-seq expression matrix, resulting in a product matrix presenting the expression of every gene at each position in the virtual embryo. We used the product matrix to project the expression of a query gene on the virtual embryo and the probability matrix to project the probability of finding a cell in the virtual embryo.
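The relationship between the two matrices can be illustrated with a toy example. The orientation used here (cells as rows of both input matrices, so that the product is taken with the transposed probability matrix) is an assumption made for the sketch, as are the variable names and values.

```python
import numpy as np

# Toy probability matrix: 3 cells x 4 positions, each row sums to one
prob = np.array([
    [0.7, 0.3, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.2, 0.8],
])

# Toy scRNA-seq expression matrix: 3 cells x 2 genes
expr = np.array([
    [10.0, 0.0],
    [0.0, 4.0],
    [2.0, 2.0],
])

# Product matrix: predicted expression of each gene at each position
# (positions x genes); each position pools the cells likely to sit there.
spatial_expr = prob.T @ expr
```

Each row of `spatial_expr` is a probability-weighted sum of single-cell profiles, which is why a gene detected in only a few cells can still yield a smooth spatial pattern.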
Evaluation of the reconstructions:
To evaluate the robustness of the reconstruction, a leave-one-out cross-validation was performed. For each gene in the atlas, we compared its original expression pattern with its reconstructed expression pattern after being omitted from the atlas. We implemented the Structural Similarity Index Measure (SSIM) equation (see below) to quantify the similarity between the reconstructions and the atlas. For each gene, we randomized the reconstruction 100 times, each with a different seed, keeping the same expression values but placing them randomly across the 3039 positions. We used a Kolmogorov–Smirnov test (KS-test) to compare the SSIM scores of the gene expression patterns reconstructed with the leave-one-out strategy to those of the random reconstructions.
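As an illustration of this evaluation, the sketch below computes a single-window SSIM between two expression vectors and builds a null distribution by shuffling. It uses the standard SSIM definition with the conventional constants K1 = 0.01 and K2 = 0.03, which is an assumption and may differ from the exact form used in the study; the function names and toy vectors are hypothetical.

```python
import numpy as np


def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two vectors of values over positions."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


def shuffled_ssim(atlas, reconstruction, n_shuffles=100):
    """SSIM scores after randomly permuting the reconstruction (one seed each)."""
    scores = []
    for seed in range(n_shuffles):
        rng = np.random.default_rng(seed)
        scores.append(global_ssim(atlas, rng.permutation(reconstruction)))
    return np.array(scores)


# Toy example: a gradient and a distorted but spatially coherent copy of it
atlas_pattern = np.linspace(0.0, 1.0, 50)
reconstructed = atlas_pattern ** 2
observed = global_ssim(atlas_pattern, reconstructed)
null_scores = shuffled_ssim(atlas_pattern, reconstructed)
```

The observed scores for all genes and the pooled null scores are the two samples that a KS-test would then compare.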
Plotting the reconstructions in 3D:
To facilitate the generation of 3D interactive plots of the reconstruction, modifications were made to the core pl.embedding function to enable compatibility with Plotly (v 5.18.0) instead of Matplotlib. This involved adding a z-axis to the output matrix and implementing a Matplotlib normalization step within the loop that extracts values from the reconstruction. The reconstructions were visualized with Plotly by creating a 3D scatter of all locations and adding a second trace showing the expression level of a query gene on the reconstruction. The expression level corresponds to the normalized value of the predicted expression of the query gene over the virtual embryo, multiplied by the maximum intensity of this gene in the scRNA-seq dataset. Additionally, parameters such as a threshold parameter for displaying only positions with expression levels above a specified threshold and a screenshot parameter for automatically capturing dorsal, ventral, and lateral views were incorporated into the function.
Custom functions for enhancer activity reconstruction:
To visualize the spatial distribution of cells where a given enhancer is active, we implemented two custom functions. The first function extracts the indexes of selected cells, those expressing a particular enhancer for example, based on the probability matrix generated by novoSpaRc. This matrix encodes the likelihood of each scRNA-seq cell being located at each position in the virtual tissue. The second function sums the presence probabilities of these selected cells across all positions, producing a vector whose length equals the number of positions in the reconstruction. Each value in the vector represents the cumulative probability of enhancer-associated cells being present at that position. This vector is then integrated back into the novoSpaRc object and visualized using the modified plotting function, replacing gene expression with cell presence probability via the “probability” parameter. The values will be normalized between 0 and 1 when plotted, with 1 being the position(s) of highest probability.
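The two steps described above can be sketched with NumPy. The function names, the cell identifiers and the toy matrix are hypothetical, and the 0 to 1 scaling, which the text applies at plotting time, is folded into the second function for brevity.

```python
import numpy as np


def select_cell_indices(cell_ids, positive_cells):
    """Row indices of the cells in which a given enhancer was detected."""
    positive = set(positive_cells)
    return np.array([i for i, c in enumerate(cell_ids) if c in positive],
                    dtype=int)


def presence_probability(prob_matrix, indices):
    """Sum the selected rows over positions and scale the result to [0, 1]."""
    summed = prob_matrix[indices].sum(axis=0)
    peak = summed.max()
    return summed / peak if peak > 0 else summed


# Toy probability matrix: 3 cells x 4 positions
prob = np.array([
    [0.7, 0.3, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.2, 0.8],
])
indices = select_cell_indices(["c1", "c2", "c3"], ["c1", "c3"])
activity = presence_probability(prob, indices)
```

Working with presence probabilities rather than reporter counts means that every enhancer-positive cell contributes equally, regardless of how many reporter transcripts were captured.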
User-friendly browsing of the results:
To facilitate community access and exploration of our results, we developed a Shiny application written in Python and deployable with Docker. It is accessible at: https://bioshiny.ens-lyon.fr/public/app/spatial-scERA. The application offers several interactive tools. The main interface allows users to select any gene or enhancer and view its expression/activity in real time across the virtual embryo. A second panel enables dual-gene visualization, highlighting regions of co-expression. A third panel allows users to compute the SSIM between the spatial expression patterns of two genes.
Comparing spatial-scERA reconstructions to HCR acquisitions
The RNA in situ hybridizations performed in this study by HCR, previously published in situ hybridization experiments, and spatial reconstructions were processed using Fiji (v1.54j) and the skimage python library (v0.24.0) [46, 47]. Images were first converted to 32-bit, background set to NaN, and intensity scaled from black to white in Fiji. To compare intensity levels, all embryos were standardized to the same length scale (0 for the anterior end and 100 for the posterior end).
Using Python, we normalized the pixel intensities of each image to a maximum of 170 for in situ data (maximum intensity observed across every enhancer in situ) and 225 for reconstructions (maximum intensity observed across every enhancer reconstruction). We then applied a binning strategy to calculate the mean intensity of pixels between pairs of x-coordinates, with a step size of one. Background pixels, set to NaN, were excluded from the mean calculation, ensuring that only pixels depicting the embryo were considered. We used Plotly to visualize the intensity profile along the embryo. When available, both the reconstruction and in situ intensity curves for an enhancer were plotted together for comparison.
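A minimal sketch of such an intensity profile is given below. It assumes that normalization means dividing by the stated global maximum and that the x-axis is split into equal bins along the anterior-posterior axis; both are interpretations of the text, and the function name and toy image are hypothetical.

```python
import numpy as np


def intensity_profile(image, global_max, n_bins=100):
    """Mean normalized intensity per bin along the x-axis, ignoring NaN."""
    img = np.asarray(image, dtype=float) / global_max
    edges = np.linspace(0, img.shape[1], n_bins + 1)
    profile = np.full(n_bins, np.nan)
    for b in range(n_bins):
        lo = int(np.floor(edges[b]))
        hi = int(np.ceil(edges[b + 1]))
        columns = img[:, lo:hi]
        # Background pixels are NaN; bins with no embryo pixel stay NaN
        if np.isfinite(columns).any():
            profile[b] = np.nanmean(columns)
    return profile


# Toy 2 x 4 image with NaN background in the first and last columns
toy_image = np.array([
    [np.nan, 85.0, 170.0, np.nan],
    [np.nan, 85.0, 0.0, np.nan],
])
profile = intensity_profile(toy_image, global_max=170.0, n_bins=4)
```

Profiles computed this way for an in situ image and for the matching reconstruction share the same x-axis and can be overlaid directly.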
Identification of putative target genes of enhancers using SSIM scores
For each enhancer, we extracted all genes within a 600 kb genomic window using the pybedtools Python library (v 0.10.0) [48, 49]. A custom Python script was developed to calculate the SSIM score comparing the enhancer activity and the expression of each gene in this region. Additionally, the genomic distance between the enhancer midpoint and the transcription start site (TSS) of each gene was computed. The 10 genes with the highest SSIM scores are listed in Supplementary Table 3. SSIM scores for the enhancer-target/closest genes were transferred to R for visualization as bar plots using ggplot2 [50].
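The gene selection and distance calculation can be illustrated in plain Python as a stand-in for the pybedtools-based extraction. The sketch assumes that the 600 kb window is centered on the enhancer midpoint, which the text does not specify; the function name, gene names and coordinates are hypothetical.

```python
def genes_in_window(enhancer, genes, window=600_000):
    """List (gene, distance from enhancer midpoint to TSS) within the window."""
    chrom, start, end = enhancer
    midpoint = (start + end) // 2
    half = window // 2
    hits = []
    for name, g_chrom, g_start, g_end, strand in genes:
        if g_chrom != chrom:
            continue
        # Skip genes that do not overlap the window around the midpoint
        if g_end < midpoint - half or g_start > midpoint + half:
            continue
        # The TSS is the gene start on the + strand and the gene end on the - strand
        tss = g_start if strand == "+" else g_end
        hits.append((name, abs(tss - midpoint)))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical enhancer and gene coordinates
enhancer = ("chr2L", 1_000_000, 1_001_000)
genes = [
    ("geneA", "chr2L", 1_050_000, 1_060_000, "+"),
    ("geneB", "chr2L", 900_000, 950_000, "-"),
    ("geneC", "chr2L", 2_000_000, 2_010_000, "+"),
    ("geneD", "chr3R", 1_000_000, 1_010_000, "+"),
]
nearby = genes_in_window(enhancer, genes)
```

Each gene returned here would then be scored by SSIM against the enhancer activity pattern, so that spatial similarity and genomic distance can be examined side by side.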
To identify potential new target genes, we leveraged a micro-C dataset from nuclear cycle 14 Drosophila embryos [51] at 5 and 1 kb resolutions. We created virtual 4C bedgraph files to visualize interaction scores between the enhancer regions and their surrounding genome using the hicPlotViewpoint function from the HiCExplorer package (v 3.7.2) [52]. For Supplementary Fig. S28C, to highlight interactions between an enhancer and a candidate gene across the boundaries of a Topologically Associating Domain (TAD), we also generated a micro-C heatmap from the same micro-C data. The virtual 4C and micro-C heatmap were plotted using pyGenomeTracks (v 3.8) [36].
To overlap the enhancer and target/closest gene spatial patterns, a custom novoSpaRc function was implemented. This function visualizes the gene expression positions on the reconstruction, overlays the enhancer activity, and highlights their co-expressed positions as a final layer. A threshold for expression values is adjustable through the “threshold” parameter for improved clarity in the visualization.
Results
Spatial-scERA reconstructs in vivo single-cell enhancer activity in multicellular organisms
To gain a comprehensive view of spatial enhancer activity at single-cell resolution in multicellular organisms, we developed a method that combines parallel enhancer-reporter assays, single-cell transcriptomics, and spatial reconstruction using optimal transport (Fig. 1A–C). This integrated approach, termed spatial single-cell Enhancer-Reporter Assay (spatial-scERA), covers all the steps from selection of putative enhancers to spatial reconstruction of their activity using computational approaches.
Figure 1.
Overview of spatial-scERA to characterize spatial enhancer activity in Drosophila embryos. (A–C) Overall design of spatial-scERA. (A) A library of enhancer-reporter assay vectors is injected as a pool in Drosophila embryos. Genomic integration of the reporter construct is established by screening for red-eyed flies. Embryos are collected from transgenic flies, their cells dissociated and subjected to 3′ end scRNA-seq. (B) The cDNA generated during scRNA-seq library preparation is used to generate two different sequencing libraries: classical scRNA-seq and targeted PCR amplification. Arrows indicate the portion of the transcript sequenced in each strategy. The location of the enhancer-specific barcode is indicated by a star. (C) Bioinformatic analysis of the sequencing data is used to identify the cell types in which each enhancer is active, and to generate a spatial reconstruction of each enhancer’s activity in a virtual Drosophila embryo. (D) Genome browser plot for one candidate region. Top to bottom: DNase-seq signal at stage 5 [15]; ChIP-seq signal for H3K4me1 and H3K27ac histone modifications (0–4 h after egg lay) [32, 33]; RNA-seq signal (2–4 h after egg lay) [34]. The location of nearby genes and characterized enhancers from the REDfly database [35] are indicated. The selected region is represented by the vertical band.
Spatial-scERA is based on the construction of a reporter library, where each candidate enhancer drives the expression of a reporter gene. As in STARR-seq [18], each candidate enhancer is cloned downstream of the reporter and upstream of a polyadenylation signal, placing the candidate enhancer sequence within the 3′ UTR of the reporter transcript. However, unlike MPRAs, the reporter library is not transfected into cells in culture but directly injected into Drosophila melanogaster embryos and integrated in the genome at a single location. As a consequence, screening the expression of the reporter gene provides not only information on the ability of a candidate enhancer to drive expression, but also gives key insights regarding its tissue-specific activity within a complex multicellular organism. For the method to be scalable, cell-type-specific expression of the reporter is not established by microscopy as in traditional enhancer-reporter assays, but by detecting the sequence of each candidate enhancer within the reporter transcript in a 3′ scRNA-seq experiment. A key innovation of our method lies in the addition of a specific barcode at the 3′ end of the enhancer sequence. This barcode is used to identify the cells in which a candidate enhancer is active by targeted PCR, significantly increasing the number of positive cells detected in a single experiment. Finally, the in vivo spatial activity of each enhancer is predicted by reconstructing its activity pattern on a 3D virtual embryo using a custom version of novoSpaRc, a spatialization method based on optimal transport [27]. This customization allows us to combine information from scRNA-seq and targeted PCR amplification, and also to handle the sparse nature of the data by mapping the probability of presence of the enhancer at any position within the embryo.
Selecting candidate enhancers
We evaluated spatial-scERA in stage 6 Drosophila embryos using a set of candidate enhancers including both positive controls known to be active at stage 6 and uncharacterized regions. To enrich this pool for potential tissue-specific enhancers active at stage 6, we selected the candidate enhancers based on three parameters (see the ‘Materials and methods’ section): (i) their proximity to genes expressed in a tissue-specific manner in stage 6 Drosophila embryos [28], (ii) their overlap with DNase I hypersensitive regions, and (iii) the presence of histone modifications characteristic of enhancer regions (H3K27ac and H3K4me1) [32, 33] (Fig. 1D). Importantly, we excluded regions that appeared to be actively transcribed in a bulk RNA-seq dataset [34] to ensure that sequencing reads mapping to the enhancer sequence were specific to the reporter construct. These criteria yielded a total of 111 regions, from which we further selected 25 candidates (Supplementary Table 1) by cross-referencing the candidate regions with a database of characterized Drosophila enhancers [35] to include a majority of uncharacterized regions. Our final list included five well-characterized stage 6 enhancers such as the twi_ChIP-42 enhancer regulating the twist (twi) gene [53, 54] and the h_stripe1 enhancer regulating the hairy (h) gene [55], which were used as positive controls. One enhancer named ChIP-27 was selected as negative control with no activity established at stage 6 [53]. The remaining 19 regions were composed of six enhancers with established activity at other stages of development but with no information at stage 6, such as the prd01 enhancer active in the posterior half of each segment of the embryo at stage 10 [56]. There was no information available regarding the spatiotemporal activity of the remaining 13 regions, which were named CRM1–CRM13 (Supplementary Table 1).
Applying spatial-scERA to 25 candidate enhancers
For each of the 25 candidate enhancers, we narrowed down a minimal region of about 1 kb centered around the DNase I hypersensitivity signal (Supplementary Figs S1–S13). These regions were amplified by PCR, flanking their 3′ end by a 19 bp common sequence (for downstream targeted PCR amplification) and a unique 6 bp enhancer barcode. This cassette was then cloned in a reporter vector downstream of the CD2 reporter gene under the control of the minimal hsp70 promoter and upstream of the SV40 polyadenylation signal (Supplementary Fig. S14). The reporter vector also contains an attB site, enabling precise site-specific integration of the vector in the genome of attP-containing fly lines using the PhiC31 integrase system. Compared to random integration, targeted integration at the same genomic location prevents artifacts resulting from positional effects. Moreover, the integration process is irreversible and excludes the possibility of multiple integration events. Finally, the reporter vector contains the mini-white cassette, allowing for the convenient identification of transgenic flies by the presence of red eyes in the progeny of the injected flies.
We injected a reporter plasmid library containing the 25 candidate enhancers in about 3250 embryos and crossed the resulting flies in batch to a white-eyed fly line. In the next generation, we obtained 285 red-eyed transgenic animals, which were further crossed to each other to generate a pool of flies containing two copies of the reporter construct, one on each allele. All cloning, injection, and fly crossing steps were carried out in batch to allow for the rapid generation of spatial-scERA libraries from multiple candidate enhancers in parallel. We then collected embryos at 2.5–3.5 h after egg lay (corresponding to developmental stages 5–7, with a majority of stage 6) from our pool of transgenic flies, dissociated them to a single cell suspension, and profiled gene expression using 3′ end droplet-based scRNA-seq. To increase the number of analyzed cells, we collected and sequenced three different batches of embryos from the same pool of transgenic flies.
Following sequencing, we mapped the reads to a modified version of the Drosophila genome that included the sequence of our reporter constructs. This resulted in a total of 6588 high-quality cells from the three batches of sequencing, which were grouped into 16 distinct clusters by unsupervised clustering. We labeled the clusters based on the most frequent cell-type annotation in the BDGP database [31] of the top 10 differentially expressed genes for each cluster. This process resulted in the fusion of these clusters into 11 different cell types, which encompass both major cell-types such as the ectoderm and the mesoderm as well as rarer cell-types such as pole cells and the mesectoderm (Fig. 2A and B). For example, the yolk was characterized by the expression of the gene sisterless A (sisA) [57] while the Ptx1 and stumps genes were more highly expressed in the endoderm and mesoderm clusters, respectively [58, 59] (Fig. 2C and Supplementary Fig. S15).
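The labeling rule described above (most frequent annotation among the top differentially expressed genes of a cluster) amounts to a majority vote. The sketch below illustrates that vote under stated assumptions: the function name `label_cluster`, the gene names, and the annotation table are all hypothetical stand-ins, not BDGP content or the pipeline used in this study.

```python
from collections import Counter


def label_cluster(top_genes, gene_annotations):
    """Return the most frequent annotation term among a cluster's top genes."""
    counts = Counter()
    for gene in top_genes:
        # Genes without any annotation simply contribute no vote.
        counts.update(gene_annotations.get(gene, []))
    if not counts:
        return "unknown"
    return counts.most_common(1)[0][0]


# Hypothetical annotation table (placeholder genes and terms).
annotations = {
    "geneA": ["mesoderm"],
    "geneB": ["mesoderm", "ectoderm"],
    "geneC": ["ectoderm", "mesoderm"],
    "geneD": ["yolk"],
}
cluster_label = label_cluster(["geneA", "geneB", "geneC", "geneD"], annotations)
```

A vote of this kind only reflects the annotation terms attached to a handful of genes, so the resulting label remains a working hypothesis about the cluster's identity.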
Figure 2.
Spatial-scERA can detect tissue-specific enhancer activity. (A) UMAP of all cells (n = 6588 cells) present in the scRNA-seq dataset with the 11 identified cell-types. (B) Clustered Dot Plot presenting the most differentially expressed genes of each cluster. The size of the dot represents the percentage of cells expressing the gene in each cluster and the color the average differential expression of this gene in this cluster (log2 fold change). (C) Examples of highly variable genes selected from the Clustered Dot Plot. The color of the dot represents the expression of a given gene (log2 fold change). (D) UMAPs representing the activity of three enhancers. Cells with an active enhancer are highlighted in the UMAP (red).
Enhancing recovery through targeted PCR amplification
Having generated a high-quality scRNA-seq dataset, we proceeded to identify the cells expressing our candidate enhancers. In scRNA-seq experiments, detecting lowly expressed transcripts is particularly challenging due to the limited amount of RNA per cell but also because of technical issues such as dropout effects [60, 61]. As a result, capturing weakly active enhancers can be particularly difficult. To maximize the detection of reporter expression across more cells, we developed a complementary approach that involves targeted PCR amplification and sequencing of a unique enhancer barcode within the reporter construct. This barcode allowed us to confirm and identify new cell-enhancer pairs within the cells sequenced in the scRNA-seq dataset. The 6 bp-long enhancer barcode is located downstream of the enhancer sequence, just upstream of the polyadenylation signal, and is flanked on each side by two common sequences absent in the Drosophila genome. By using the first common sequence and the 10× Read1 sequence as PCR primers, we generated a 120-bp product comprising both the enhancer barcode and the 10× cell barcode (Fig. 1B). We performed PCR amplification and sequencing separately on the three batches of embryos. We then developed a custom analysis pipeline to extract both enhancer and cell barcodes from the sequencing reads, subsequently integrating the identified cell-enhancer pairs with the scRNA-seq dataset (see the ‘Materials and methods’ section).
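As an illustration of the barcode extraction step, the sketch below pulls a 6-nt enhancer barcode from between two flanking sequences and pairs it with the cell barcode of the mate read. It is not the custom pipeline referred to above: the flanking sequences are placeholders, and the paired-read layout, the 16-nt cell barcode length, and exact (error-free) matching are assumptions of this sketch.

```python
import re

# Placeholder flanks: the actual common sequences are not reproduced here.
LEFT_FLANK = "ACGTACGTAC"
RIGHT_FLANK = "TTGGCCAATT"
BARCODE_RE = re.compile(LEFT_FLANK + r"([ACGT]{6})" + RIGHT_FLANK)


def extract_pair(cell_read, enhancer_read, barcode_to_enhancer, cell_barcode_len=16):
    """Return (cell_barcode, enhancer_name), or None if the read is uninformative."""
    match = BARCODE_RE.search(enhancer_read)
    if match is None:
        return None  # flanking sequences not found
    enhancer = barcode_to_enhancer.get(match.group(1))
    if enhancer is None:
        return None  # 6-nt sequence is not a known enhancer barcode
    # Assumption: the cell barcode occupies the first bases of the mate read.
    return cell_read[:cell_barcode_len], enhancer


# Toy reads and a hypothetical barcode table.
barcodes = {"AACCGG": "enhancer_1"}
cell_read = "AAACCCAAGAAACACT" + "GGGTTTCCCAAA"
enhancer_read = "GGG" + LEFT_FLANK + "AACCGG" + RIGHT_FLANK + "AAAA"
pair = extract_pair(cell_read, enhancer_read, barcodes)
```

A production pipeline would typically also tolerate sequencing errors and restrict cell barcodes to those retained in the scRNA-seq dataset; both are omitted here for brevity.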
Out of the 6588 cells in our dataset, we identified 1098 cells carrying an enhancer from the scRNA-seq libraries, and 910 cells through targeted PCR amplification. Notably, 349 cells were common to both methods, and 561 cells were only identified by targeted PCR, highlighting the value of PCR sequencing in corroborating and enriching our dataset. Importantly, these cells displayed no bias towards any specific tissue. Moreover, the cells detected by targeted PCR amplification were predominantly found in the same cell types where the enhancer had already been identified by scRNA-seq alone (Supplementary Figs S16–S18), further validating the complementary nature of these two methodologies.
In conclusion, simply adding a short barcode to each candidate enhancer increased by 33% (561 cells identified only by targeted PCR out of 1656 cells in total) the number of cells carrying an active enhancer detected by spatial-scERA, significantly improving our ability to reconstruct each enhancer’s spatial activity.
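The way the two modalities complement each other can be expressed with simple set operations on cell barcodes. The sketch below uses toy data; the function name `merge_detections` and the dictionary layout (enhancer name mapped to a set of positive cells) are assumptions, not the data structures of the actual pipeline.

```python
def merge_detections(scrna_cells, pcr_cells):
    """Combine per-enhancer sets of positive cells from both modalities."""
    merged = {}
    for enhancer in sorted(set(scrna_cells) | set(pcr_cells)):
        rna = scrna_cells.get(enhancer, set())
        pcr = pcr_cells.get(enhancer, set())
        merged[enhancer] = {
            "both": rna & pcr,        # confirmed by the two methods
            "scrna_only": rna - pcr,
            "pcr_only": pcr - rna,    # cells gained through targeted PCR
            "all": rna | pcr,         # set used for downstream analysis
        }
    return merged


# Toy detections (cell identifiers are placeholders).
scrna = {"enhA": {"c1", "c2", "c3"}}
pcr = {"enhA": {"c3", "c4"}, "enhB": {"c5"}}
merged = merge_detections(scrna, pcr)
```

Keeping the "both", "scrna_only", and "pcr_only" sets separate makes it straightforward to check whether the PCR-only cells fall in the same cell types as the cells detected by scRNA-seq.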
Spatial-scERA successfully captures enhancer activity in single cells
With a comprehensive understanding of the cell types in our dataset and an optimized protocol to efficiently capture cells containing an active enhancer, we proceeded to generate a global overview of the activity of the enhancers present in our library. Overall, 23 out of the 25 candidate enhancers were successfully retrieved in our combined dataset. Two enhancers were missing from the dataset (CRM5 and CRM8), suggesting that they are probably not active at stage 6. The number of cells per enhancer varied significantly, ranging from a single cell for psc_E14 to 320 cells for shg_A, with an average of 60 cells per enhancer. We analyzed the distribution of each enhancer across the different clusters by calculating the percentage of cells in which the enhancer is present within each cluster (Supplementary Table 2). Eleven enhancers were clearly active in the embryo, either across multiple clusters (for example GMR77A12, present in 294 cells, and salm_blastoderm_early_enhancer, present in 58 cells), or in a tissue-specific manner [for example twi_ChIP-42, present in 128 cells, most of which are found in the mesoderm cluster (Fig. 2D and Supplementary Figs S1–S13)]. The remaining 12 enhancers were active in very few cells. While the GMR83E01 (15 cells) and CRM3 (19 cells) enhancers were enriched in the yolk cluster, no discernible patterns were immediately apparent for the other 10 enhancers (Fig. 2D, Supplementary Figs S1–S13, and Supplementary Table 2).
To verify whether the enhancers missing or weakly detected in the dataset were indeed not active at stage 6, we performed RT-qPCR in a subset of individually-generated reporter lines (Supplementary Fig. S19). We confirmed a complete absence of activity for CRM5 and CRM8, corroborating their absence in our spatial-scERA dataset. Similarly, CRM6 and CRM12, which were recovered in very few cells, exhibited a very low activity. Conversely, enhancers identified in many cells by spatial-scERA were also highly active in RT-qPCR. Overall, the results observed by RT-qPCR were consistent with the spatial-scERA data.
By integrating scRNA-seq and targeted PCR amplification, we identified potential tissue-specific activity for 11 out of 25 enhancers in our dataset. However, since scRNA-seq randomly samples a small proportion of the initial cell suspension, it will not capture all the active cells for each enhancer. With sufficient data (i.e. number of cells), we can infer the primary cell types in which an enhancer is active. Nevertheless, for some enhancers, we may not have captured enough cells to determine cell type specificity. In such cases, UMAP representation alone is insufficient to predict cell type specificity. To address this limitation, we used novoSpaRc, a computational tool that uses optimal transport to predict the probability of mapping each cell of a scRNA-seq dataset to each position of a virtual representation of any tissue or organism, in our case a Drosophila embryo.
In silico reconstruction of stage 6 Drosophila embryos from spatial-scERA data
To reconstruct the spatial activity of the genes expressed in our scRNA-seq dataset in silico, we projected our sequenced cells onto a virtual stage 6 embryo composed of 3039 positions using the novoSpaRc computational framework [26, 27]. NovoSpaRc uses optimal transport to infer a mapping from the gene expression space of a scRNA-seq dataset to the tissue space of the sample of interest, which is then used to predict the spatial expression of any given gene. For this purpose, novoSpaRc first produces a probability matrix, predicting the location of each scRNA-seq cell in the reconstruction, and multiplies this matrix by the scRNA-seq expression matrix to compute the expression of every gene on the virtual tissue. NovoSpaRc spatial reconstruction accuracy is improved when used in conjunction with an atlas providing prior information on the spatial expression of informative marker genes. The reconstructions are thus a mathematical combination of (i) the probability of projecting each scRNA-seq cell on every position in the virtual tissue, (ii) the scRNA-seq expression matrix, and (iii) the prior information contained in the atlas. In this case, we took advantage of a pre-existing atlas consisting of 84 marker genes expressed at stage 6 [28], which recapitulates most expression patterns distinguishable during early Drosophila embryogenesis. We used this atlas to uncover the spatial position of the cells in our scRNA-seq dataset and plot their expression level. However, as the atlas does not account for internal cells, we first excluded yolk, hemocyte, and unknown clusters from our scRNA-seq dataset. The remaining 5475 cells were re-analyzed from the raw matrix and resulted in nine cell-types, including a novel cell-type labeled as head mesoderm (Fig. 3A and B).
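The matrix product at the core of this step can be written out explicitly. The sketch below is a plain-Python illustration with toy numbers, not a call into novoSpaRc: the function name `spatial_expression` is hypothetical, and the rows of the toy transport matrix are normalized to sum to 1 for readability, which is an assumption of this example.

```python
def spatial_expression(transport, expression):
    """Project gene expression onto positions of the virtual tissue.

    transport[c][p]: probability-like weight of cell c at position p.
    expression[c][g]: expression of gene g in cell c.
    Returns sdge[g][p] = sum over cells c of expression[c][g] * transport[c][p].
    """
    n_cells = len(transport)
    n_pos = len(transport[0])
    n_genes = len(expression[0])
    sdge = [[0.0] * n_pos for _ in range(n_genes)]
    for c in range(n_cells):
        for g in range(n_genes):
            for p in range(n_pos):
                sdge[g][p] += expression[c][g] * transport[c][p]
    return sdge


# Two cells, three positions, two genes (toy values).
transport = [
    [1.0, 0.0, 0.0],   # cell 0 maps entirely to position 0
    [0.0, 0.5, 0.5],   # cell 1 is split between positions 1 and 2
]
expression = [
    [2.0, 0.0],        # cell 0 expresses gene 0 only
    [0.0, 4.0],        # cell 1 expresses gene 1 only
]
sdge = spatial_expression(transport, expression)
```

In this toy case gene 0 is recovered at position 0 only, while the signal of gene 1 is shared between positions 1 and 2, reflecting the uncertainty in the placement of cell 1.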
Figure 3.
Validation of the spatial reconstruction. (A) UMAP of the reduced version (n = 5475 cells) of the scRNA-seq dataset with the nine identified cell-types. (B) Schematic of a stage 6 Drosophila embryo displaying the expected location of the cell-types present in the scRNA-seq dataset. (C) Comparison between published in situ hybridization from the BDGP in situ database (top) [31] and spatial reconstruction with novoSpaRc (bottom) for four different genes. In the reconstruction, the lighter the blue, the higher the level of expression (normalized expression intensity in the scRNA-seq data). (D) Boxplot showing the SSIM score for the reconstruction of all atlas genes using leave-one-out cross-validation, compared to the SSIM score of a random reconstruction model. Median is depicted as a bar (orange), while the box extremities are Q1 and Q3, respectively. The P-value from a KS-test between the two distributions is displayed at the top of the plot. (E) Projection of the cells forming a cluster onto the reconstruction. UMAPs highlight the cells forming the selected cluster (blue). In the reconstruction, the color code indicates the probability of presence of the cells in the reconstruction.
We first validated the accuracy of the reconstructions by comparing them to the expression of well-known tissue-specific marker genes (Fig. 3C). As expected, cells expressing the mesodermal Myocyte enhancer factor 2 (Mef2) gene were positioned ventrally. Similarly, we were able to accurately reconstruct the expression of the LIM homeobox 1 (Lim1) and ventral nervous system defective (vnd) genes. Even genes with highly restricted expression, such as Sex combs reduced (Scr) which is expressed in a single stripe, were well reconstructed. However, we observed that the reconstructed expression pattern of some genes was sometimes distorted, displaying stripe-like patterns. This is visible, for example, in the reconstructions of the Mef2 and vnd genes (Fig. 3C), and is also seen in the reconstruction based on the Drosophila dataset associated with novoSpaRc [28] (Supplementary Fig. S20). This phenomenon might be due to a bias in the atlas with the over-representation of genes active in stripes.
To evaluate the statistical robustness of our approach, we assessed the accuracy of the reconstruction by comparing it to the expected pattern in the atlas using a leave-one-out cross-validation strategy. We used the SSIM as our evaluation metric [62, 63], a tool originally developed for image analysis that reliably captures similarity based on overall intensity, the degree of intensity variation, and the preservation of spatial expression patterns. Our analysis yielded a mean SSIM score of 0.45 across the 84 genes in the atlas, with a distribution that differed significantly from a random model (P-value = 2e-107, KS-test; see the ‘Materials and methods’ section; Fig. 3D and Supplementary Fig. S21). These results indicate that the reconstruction generated from this atlas is statistically robust, enabling us to assess gene expression and enhancer activity with confidence.
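To make the evaluation scheme concrete, the sketch below computes a simplified SSIM and wraps it in a leave-one-out loop. Several points are assumptions rather than the procedure used here: the SSIM is computed once over all positions instead of over sliding windows, the constants follow the commonly used defaults (K1 = 0.01, K2 = 0.03), and `reconstruct` is a hypothetical callback standing in for a full spatial reconstruction run with the held-out gene removed from the atlas.

```python
from statistics import fmean


def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM computed once over all positions (no sliding window)."""
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mx, my = fmean(x), fmean(y)
    var_x = fmean([(a - mx) * (a - mx) for a in x])
    var_y = fmean([(b - my) * (b - my) for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    luminance = (2 * mx * my + c1) / (mx * mx + my * my + c1)
    structure = (2 * cov + c2) / (var_x + var_y + c2)
    return luminance * structure


def leave_one_out_scores(atlas, reconstruct):
    """Score each atlas gene against a reconstruction made without it."""
    scores = {}
    for gene, truth in atlas.items():
        reduced = {g: v for g, v in atlas.items() if g != gene}
        scores[gene] = global_ssim(reconstruct(gene, reduced), truth)
    return scores


def mean_of_others(gene, reduced_atlas):
    """Stand-in 'reconstruction': average pattern of the remaining genes."""
    return [fmean(col) for col in zip(*reduced_atlas.values())]


# Toy atlas with three genes over four positions (not real data).
atlas = {"g1": [0, 0, 1, 1], "g2": [0, 0, 1, 1], "g3": [1, 1, 0, 0]}
scores = leave_one_out_scores(atlas, mean_of_others)
identical = global_ssim([0, 0, 1, 1], [0, 0, 1, 1])
opposite = global_ssim([0, 0, 1, 1], [1, 1, 0, 0])
```

The score reaches 1 for identical patterns and becomes negative for mirror-image patterns, which is why the distribution of scores across atlas genes can be compared with that of a random model.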
Having established that novoSpaRc can be used to reconstruct the spatial expression of the genes in our dataset, we next applied this tool to reconstruct the activity of each candidate enhancer. To assess enhancer activity, spatial-scERA combines two modalities: the expression of the CD2 reporter gene obtained from scRNA-seq and the presence/absence of a barcode sequence identified by targeted PCR amplification. Working with expression levels as in novoSpaRc is thus not possible anymore. Instead, we decided to plot the probability of presence of the cells containing an active enhancer within the virtual embryo. To achieve this, we upgraded the novoSpaRc pipeline by allowing the software to plot the predicted location of a group of cells (i.e. all the cells carrying a given active enhancer) in the reconstruction. The location of these cells in the virtual embryo space is predicted by optimal transport. Because there is not a single optimal position, the application of optimal transport actually computes a probability of finding each cell at all the positions of the virtual embryo. As a result, the spatial reconstruction will display the probability of presence of the enhancer at any position of the embryo, even if enhancer activity was not directly measured by scRNA-seq in this cell. We tested this feature by mapping the spatial location of the same four genes used to validate our spatial reconstruction, confirming that the reconstruction based on the probability of presence correctly recapitulates the genes’ expected expression (Supplementary Fig. S22). We then applied this approach to map the spatial location of each of the nine scRNA-seq clusters onto the virtual embryo. Eight out of nine clusters were mapped to their expected spatial location (Fig. 3E and Supplementary Fig. S23), further validating this approach and the overall accuracy of the reconstruction.
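The idea of plotting a probability of presence for a group of cells can be sketched as follows. This is an illustration rather than the modified novoSpaRc code: the function name `presence_probability` is hypothetical, and summing the transport rows of the selected cells and then normalizing the result to 1 is one plausible convention, assumed here for clarity.

```python
def presence_probability(transport, cell_indices):
    """Aggregate the positional probabilities of a group of cells.

    transport[c][p]: probability-like weight of cell c at position p.
    cell_indices: cells of interest, e.g. all cells carrying a given enhancer.
    Returns one value per position, normalized to sum to 1 (assumed convention).
    """
    n_pos = len(transport[0])
    summed = [0.0] * n_pos
    for c in cell_indices:
        for p in range(n_pos):
            summed[p] += transport[c][p]
    total = sum(summed)
    return [v / total for v in summed] if total > 0 else summed


# Three cells over three positions (toy values).
transport = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
]
# Suppose cells 0 and 1 carry the enhancer of interest.
presence = presence_probability(transport, [0, 1])
```

Because only the identity of the positive cells enters the calculation, detections from scRNA-seq and from targeted PCR can be pooled without requiring comparable expression values.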
Spatial information must be considered for proper scRNA-seq cluster annotation
The ninth cluster, initially labeled as dorsal ectoderm, deviated from expectations. At stage 6, the dorsal ectoderm consists of a row of cells on the dorsal side of the embryo, between the amnioserosa and the lateral ectoderm (Fig. 4A). However, our spatial reconstruction mapped these cells along two distinct stripes at the opposite ends of the trunk region. To understand this discrepancy, we examined the actual expression pattern of the top differentially expressed genes in the dorsal ectoderm cluster. While “dorsal ectoderm” was the most common annotation for these genes in the BDGP database, many were in fact segmentation genes expressed in antero-posterior stripes, such as spalt major (salm), ken and barbie (ken), and Scr (Fig. 4B and Supplementary Fig. S24). In contrast, bona fide dorsal ectoderm genes such as zerknüllt (zen) and Dorsocross2 (Doc2) were not enriched in this cluster, confirming the inaccurate annotation (Fig. 4C).
Figure 4.
Spatialization identifies a mislabeled cluster. (A) Representation of the encountered mislabeling. Left: The location of the dorsal ectoderm in a stage 6 Drosophila embryo is highlighted (cyan). Middle: The location of the cluster annotated as dorsal ectoderm in the UMAP is highlighted (blue). Right: Location of these cells after projecting them on the virtual embryo. (B, C) UMAPs (top), reconstruction (middle), and published in situ hybridization from the BDGP in situ database (bottom) [31] of two genes differentially expressed in the mislabeled cluster (B) and two genes actually expressed in the dorsal ectoderm (C).
Having established that this cluster was not dorsal ectoderm, we then verified that our spatialization was able to correctly reconstruct the expression of the genes wrongly assigned to the dorsal ectoderm and of actual dorsal ectoderm genes. In both cases, the reconstruction accurately reflected the expression patterns observed by in situ hybridization, confirming that the mislabeling was due to the annotation of the cluster in the scRNA-seq data, rather than an issue with the reconstruction (Fig. 4B and C).
In conclusion, this result highlighted a key limitation of scRNA-seq: clusters do not necessarily correspond to distinct cell types but rather to sub-populations of cells with a similar transcriptome. Therefore, clusters cannot be accurately annotated based solely on a priori knowledge of a few highly differentially expressed genes. To ensure correct cluster labeling, transcriptomic information must be combined with spatial information.
Spatial-scERA accurately confirms the spatial enhancer activity of known enhancers
Having established that our spatialization method faithfully reconstructs the spatial position of a group of cells in our dataset, we proceeded to predict the activity of the 25 candidate enhancers under study. For this purpose, we plotted the probability of presence of every cell where a given enhancer was identified by scRNA-seq or targeted PCR at each position of the virtual embryo. To validate our predictions, we generated stable fly lines for all 25 enhancers, and analyzed the expression of the CD2 reporter gene by in situ hybridization. We first focused on the five positive controls, comparing reconstruction results to previously published in situ hybridization experiments [53, 55, 64–67] or to our own images. We evaluated the reconstructions through visual inspection and numerical comparison of activity levels along the antero-posterior axis of the embryo, comparing the predicted activity patterns to the expected ones (Fig. 5A–D, and Supplementary Figs S1, S2, and S3A).
Figure 5.
Spatial-scERA accurately predicts enhancer activity. (A–D) Left: Comparison of the activity observed by in situ hybridization and our spatial reconstructions for four control enhancers. Middle: Profiles of average expression levels across the entire embryo by in situ hybridization experiments (red) and in the reconstructions (blue). Right: Number of cells with active enhancer and their location in the UMAP (red). (E) Examples of predicted spatial activity for four enhancers (bottom) in comparison to in situ hybridization (top). Faint expressions are pinpointed with arrows.
The results showed a strong correlation between the predicted reconstruction and the known pattern for all five control enhancers. For example, twi_ChIP-42 was correctly predicted to be active throughout the mesoderm and head mesoderm [53] (Fig. 5A). vnd_743 was mapped to the mesectoderm and procephalic region [65] (Fig. 5B). The spatial reconstruction of salm_blastoderm_early_enhancer predicted its activity in three main stripes in the procephalic, cephalic furrow and posterior trunk regions [64] (Fig. 5C). The spatial reconstruction also predicted activity in additional stripes in the trunk, which are faintly visible in our in situ hybridization experiments. The activity of the eve_late_variant enhancer was predicted to follow a stripe pattern. However, some of the middle stripes were missing, probably due to the limited number of cells carrying this enhancer in our dataset (Supplementary Fig. S1A). The most striking result was observed with h_stripe1. Spatial-scERA identified this enhancer in just nine cells scattered across the UMAP, making tissue-specific activity determination challenging from this data alone. However, the reconstruction revealed a precise stripe that perfectly matched the expected location [55] (Fig. 5D). Finally, as expected, the ChIP-27 enhancer which was used as a negative control did not display any activity in the reconstruction (Supplementary Fig. S3B). Overall, this confirmed that spatial-scERA can properly predict enhancer activity, even with as few as nine cells in the dataset.
Spatial-scERA can be used to predict the spatial activity of uncharacterized regions
Having validated the accuracy of our enhancer predictions, we proceeded to generate reconstructions for the 19 uncharacterized putative enhancers in our dataset. As mentioned previously, CRM5 and CRM8 were not active in the scRNA-seq dataset, indicating that these enhancers are likely not active at stage 6, a conclusion confirmed by imaging the corresponding fly lines (Supplementary Figs S9A and S10B). The psc_E14 enhancer was detected in only a single cell in the full scRNA-seq dataset, which was discarded in the reduced dataset used for reconstruction (Supplementary Fig. S4B). We therefore concluded that this enhancer was not active at stage 6. Among the other enhancers, shg_A, GMR77A12, GMR83E01, and CRM3 were identified in several cells of the yolk cluster. As the yolk was removed from the dataset for spatial reconstruction, we could not confirm this activity in the virtual embryo. However, activity in the yolk was confirmed by in situ hybridization, providing additional evidence that spatial-scERA effectively captures cell-type-specific enhancer activity (Supplementary Fig. S25).
Except for psc_E14, the other five enhancers known to be active at later stages were clearly also predicted to be active at stage 6. prd01, shg_A, and GMR77A12 all displayed a relatively broad pattern: along seven stripes within the lateral ectoderm for prd01 (Fig. 5E and Supplementary Fig. S4A), across nearly the entire embryo excluding the amnioserosa and trunk mesoderm for shg_A (Fig. 5E and Supplementary Fig. S5A), and throughout the trunk region for GMR77A12 (Supplementary Fig. S6A). GMR83E01 on the other hand displayed a more localized pattern in the procephalic region (Supplementary Fig. S6B). All these patterns were confirmed by imaging of the reporter lines (Fig. 5E and Supplementary Figs S4–S6). However, the SoxN_5830 enhancer, predicted to be active only in the procephalic region, showed ubiquitous activity by imaging (Supplementary Fig. S5B). This discrepancy likely stems from an under-sampling of cells carrying the SoxN_5830 enhancer in the spatial-scERA dataset, potentially due to an under-representation of flies carrying this enhancer in our pool of transgenic flies. Indeed, we could only recover 18 cells carrying the active enhancer, which is probably insufficient to predict a broad pattern of activity.
Among the thirteen entirely uncharacterized candidate enhancers, some, such as CRM3, CRM4, CRM7, CRM9, and CRM11, are clearly active and predicted with a tissue-specific pattern, which is confirmed by imaging (Fig. 5E, and Supplementary Figs S8A and B, S10A, S11A, and S12A). Other candidates were either not active (CRM5 and CRM8; Supplementary Figs S9A and S10B) or predicted to be active in highly restricted patterns that were not confirmed by imaging (CRM6 and CRM12; Supplementary Figs S9B and S12B). The latter often correspond to enhancers that were active in very few cells in the combined scRNA-seq and targeted PCR dataset, and that showed nearly no expression by RT-qPCR (Supplementary Fig. S19). These inaccurately positioned cells are systematically found on the dorsal side of the embryos. This is particularly striking for CRM6 (Supplementary Fig. S9B) and CRM12 (Supplementary Fig. S12B), but is also visible as slight inconsistencies for several other enhancers (twi_ChIP-42, ChIP-27, CRM2, and CRM9). Finally, some candidate enhancers presented discrepancies between the spatial-scERA predictions and our imaging data. Most often, this is due to an under-sampling of cells carrying the enhancer, as observed for the SoxN_5830 enhancer. This is the case for CRM10 (two cells; Supplementary Fig. S11B) and CRM13 (eight cells; Supplementary Fig. S13). In other cases, the reconstructions predict a broader activity than the one observed by imaging. This is the case for CRM1 (Supplementary Fig. S7A) and CRM2 (Supplementary Fig. S7B). We suspect that this broader activity might result from the presence of a small proportion of slightly older embryos among those we collected for scRNA-seq. Indeed, the reconstructions of these two enhancers are consistent with the expression pattern of their closest genes (ImpL2 and vein) at stage 7.
In conclusion, the comparison of reconstructed activity from spatial-scERA with imaging data confirmed that in most cases our method faithfully recapitulates the activity of candidate enhancers. These predictions span spatial activity patterns ranging from no or very weak activity to very broad patterns. This comparison highlights the value of spatial-scERA in identifying the spatial activity of enhancers within a multicellular organism. However, we noticed three main limitations in our reconstructions: (i) caution should be applied when too few cells are captured by spatial-scERA, especially when studying an enhancer with a broad spatial pattern of activity, (ii) novoSpaRc tends to place cells for which it cannot reliably predict the location in spots on the dorsal side of the embryos, and (iii) the reconstruction can be biased by the presence of cells coming from slightly older embryos in our datasets.
Spatial-scERA is a better predictor of enhancer activity than DNase I hypersensitivity or histone modifications
When comparing the enhancer activity reconstructions obtained with spatial-scERA to DNase I hypersensitivity (stage 5) and histone modification signal (0–4 h after egg laying), we did not observe any obvious correlations. For example, regions such as psc_E14, which overlapped high DNase I hypersensitivity and histone modification signal, are in fact completely inactive in our spatial-scERA data (Supplementary Fig. S4B). Similarly, the ChIP-27 and CRM5 regions are inactive despite the presence of a DNase I hypersensitivity peak (Supplementary Fig. S3B). Conversely, CRM13 is active despite the absence of a strong DNase I hypersensitivity signal (Supplementary Figs S11B and S13). Still, DNase I hypersensitivity seems to be a slightly better predictor of enhancer activity than histone modifications. Indeed, CRM8 and CRM10 are located in closed chromatin regions and are not active despite strong peaks of H3K4me1 and H3K27ac histone modifications (Supplementary Figs S10 and S13). Conversely, salm_blastoderm_early_enhancer, vnd_743, and prd01 are broadly active enhancers despite the absence of a strong H3K27ac signal (Supplementary Figs S2A, S3A, and S4A). This demonstrates the value of spatial-scERA as a complementary tool to genome-wide enhancer prediction techniques (such as MPRAs or the mapping of DNase I hypersensitive sites and histone modifications) for the precise mapping of spatial enhancer activity in a multicellular sample.
Spatial-scERA can be used to identify the enhancers’ putative target genes
Finally, we asked whether the spatial reconstructions generated by spatial-scERA could be used to predict enhancer target genes and provide biological insights into enhancer-gene relationships. To this end, we used the SSIM score introduced earlier to systematically compare the spatial reconstruction of each enhancer with that of all genes within a 600 kb window surrounding the enhancer. As a proof of principle, we first focused on the 11 enhancers in our dataset with known target genes. In 9 out of 11 cases, the known target gene ranked among the top 10 genes with the highest SSIM scores (Supplementary Fig. S26A and Supplementary Table 3). Notably, for three enhancers (vnd743, salm_blastoderm_early_enhancer, and GMR77A12), the known target gene achieved the highest SSIM score. Conversely, two enhancers (eve_late_variant and SoxN5380) had particularly low SSIM scores, reflecting the poor quality of their spatial reconstructions due to insufficient cell sampling, as previously discussed. It is important to note that the SSIM score does not account for cases where an enhancer recapitulates only a subset of its target gene’s expression pattern. Nevertheless, in the case of the h_stripe enhancer, the hairy gene ranked fifth in SSIM score even though the enhancer is only active in a subset of the hairy expression pattern. To address this limitation, we visually confirmed the overlap between enhancer activity and target gene expression by plotting both the enhancer reconstruction and the gene’s expression on the same virtual embryo (Supplementary Fig. S27).
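The SSIM-based ranking described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: it computes a single-window (global) SSIM over flattened activity vectors rather than the windowed SSIM provided by image-analysis libraries, and the function names, the stabilizing constants (`k1`, `k2`, `data_range`), and the toy gene labels are assumptions made for illustration only.

```python
from statistics import fmean


def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally long activity vectors.

    Simplification (assumption): the whole virtual embryo is treated as one
    window, so local structure is not weighted as in a windowed SSIM.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = fmean([(a - mean_x) ** 2 for a in x])
    var_y = fmean([(b - mean_y) ** 2 for b in y])
    cov_xy = fmean([(a - mean_x) * (b - mean_y) for a, b in zip(x, y)])
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def rank_genes_by_ssim(enhancer_map, gene_maps):
    """Return (gene, SSIM) pairs sorted from most to least similar."""
    scores = {gene: global_ssim(enhancer_map, values)
              for gene, values in gene_maps.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy reconstructions over six virtual positions (hypothetical values).
enhancer = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
candidates = {
    "geneA": [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],  # same pattern as the enhancer
    "geneB": [1.0, 1.0, 0.0, 0.0, 1.0, 1.0],  # complementary pattern
    "geneC": [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],  # partially overlapping pattern
}
ranking = rank_genes_by_ssim(enhancer, candidates)
```

In this toy example the gene with the identical pattern ranks first and the gene with the complementary pattern ranks last. In practice, the candidate set would be restricted to the genes located in the genomic window around each enhancer before ranking.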
Next, we applied this approach to ask whether the uncharacterized enhancers in our dataset regulate the expression of their nearest genes. While enhancers are often assumed to control the expression of their closest gene, in fact only 47% of enhancers interact with the nearest TSS in the human genome [68]. Even in the Drosophila genome, 73% of interactions span distances larger than 50 kb [69]. To test this in our dataset, we used the SSIM score to compare the spatial activity of the CRMs to the expression of their nearest genes. Excluding CRM5 and CRM8 (which showed no detectable activity), we found that for CRM1, CRM4, CRM9, and CRM11, their nearest genes (ImpL2, sca, sty, and Trim9, respectively) ranked among the top 10 genes with the highest SSIM scores (Supplementary Fig. S26B and Supplementary Table 3). Visual confirmation of the overlap between CRM11 activity and Trim9 expression further supported this result (Supplementary Fig. S27).
Interestingly, in some cases, the known or nearest gene was not necessarily the one with the highest SSIM score (Supplementary Table 3). For example, GMR83E01 is known to regulate otp at larval stages [70]. However, the SSIM score of otp ranked eighth (SSIM = 0.10), whereas Cht9 (SSIM = 0.48) and CG15650 (SSIM = 0.27) ranked first and second, respectively (Supplementary Fig. S28A). Although the expression of these genes has not been characterized via in situ hybridization, their reconstructed expression patterns strongly resemble the enhancer’s activity (Supplementary Fig. S28A). Moreover, Cht9 and CG15650 are located ∼161 and 70 kb from GMR83E01, respectively, across one or multiple TAD boundaries. Similarly, CRM2 only partially recapitulates the expression of its closest gene, vn, but closely matches the expression of the CG13288 (SSIM = 0.4424) and CG32407 (SSIM = 0.6191) genes, located ∼51 and 66 kb away, respectively, across a TAD boundary (Supplementary Fig. S28B). An even more compelling example comes from the prd01 enhancer, which shares an overlapping expression pattern with its target gene (prd1), with its closest genes (firl at 13 kb and CG14947 at 4 kb), and with CG15480 (SSIM = 0.1108), located 1.1 Mb away across multiple TAD boundaries (Supplementary Fig. S28C). Using previously published Micro-C data from early Drosophila embryos (nuclear cycle 14) [51], we confirmed that the prd01 enhancer forms a long-range interaction with the CG15480 locus, strongly suggesting a functional regulatory interaction.
Overall, these observations suggest that the GMR83E01, CRM2, and prd01 enhancers might be involved in a chromatin hub, regulating the expression of multiple genes, some of which are located across large distances and TAD boundaries. This is in line with previous findings demonstrating the presence of functional inter-TAD enhancer-promoter interactions in the Drosophila genome [54]. Combining spatial-scERA with the SSIM score could thus offer a powerful framework to identify enhancer target genes and be highly complementary to chromatin conformation studies.
Discussion
We developed spatial-scERA to infer the spatiotemporal activity of candidate enhancer regions by projecting cells sequenced in a single-cell enhancer-reporter assay onto a virtual reconstruction of the tissue of origin. Our method presents two key innovations. First, it combines scRNA-seq with targeted PCR to improve the identification of cells in which an enhancer is active. Second, it uses a custom version of novoSpaRc, a computational method for spatialization, to reconstruct a map of enhancer activity on a virtual embryo. We applied spatial-scERA to 25 candidate enhancers in stage 6 Drosophila embryos and demonstrated that single-cell enhancer-reporter assays alone are not always sufficient to predict enhancer activity in vivo. In fact, we could establish the crucial requirement of spatialization for capturing complex spatially-defined expression patterns such as stripes. Overall, spatial-scERA recapitulated enhancer activity observed by imaging the same constructs in most cases. Interestingly, several enhancers were not active at stage 6 despite the presence of DNase I hypersensitivity and active chromatin modification peaks indicating the opposite. Our work thus demonstrates that chromatin accessibility and enhancer-specific histone modifications alone are often poor predictors of enhancer activity [23]. This highlights the importance of methods such as spatial-scERA to capture precise spatial enhancer activity in vivo in multicellular organisms. Finally, we combined spatial-scERA with the SSIM score to predict enhancer target genes and identified potential regulatory long-range interactions.
A key feature of spatial-scERA is its ability to generate virtual enhancer activity maps. While many methods leverage the cellular resolution of scRNA-seq to enhance existing spatial transcriptomic data through deconvolution [71–75], there are far fewer tools available for reconstructing spatial data directly from scRNA-seq alone. Among these, different approaches are employed: machine learning (MLSpatial [76]), optimal transport (novoSpaRc [26]), or a combination of both (D-CE [77]). Although MLSpatial and D-CE are more recent developments, they have not demonstrated superior performance compared to novoSpaRc, especially when it is used in conjunction with an atlas. Additionally, novoSpaRc has been successfully applied to reconstruct a variety of distinct tissues, including the floral meristem (23 genes in the atlas [78]), human duodenum epithelium (22 genes in the atlas [79]), and human organoids (32 genes in the atlas [80]). Given that novoSpaRc remains the state-of-the-art tool for spatial reconstruction from scRNA-seq data, and considering that it had already been optimized for the Drosophila atlas, it was a logical choice for our study. While the reconstructions were overall of high accuracy, we noticed a clear tendency to amplify stripe patterns in the reconstruction, probably because a large proportion of the genes in the atlas are expressed in stripes (30 out of 84 genes). Moreover, when cells have a transcriptome that is too different from the one expected from the atlas, novoSpaRc tends to place them at locations where the atlas is less spatially informative, for example on the dorsal side of the embryo. This highlights the need for more complex and extensive atlases and for alternative spatial reconstruction methods. With the development of spatial transcriptomics technologies such as multiplexed RNA imaging or spatially-resolved DNA sequencing [81], we believe that large atlases will become commonly available in multiple species.
This will be particularly important in complex multicellular organisms with highly heterogeneous tissues. In parallel, the emergence of deep learning-based alternatives to novoSpaRc offers new opportunities to improve the resolution and robustness of spatial reconstructions. Models such as LUNA [82], which captures both global and local cellular interactions using attention mechanisms, and COME [83], which applies contrastive learning to incorporate cell-type information in the reconstruction, have demonstrated improved accuracy in cell-to-location mapping. Combined with the higher resolution of new spatial transcriptomics techniques, we believe that our method could be extended to study enhancers in more complex tissues and any organism or organoid.
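To make the optimal transport idea concrete, the following Python sketch assigns cells to atlas locations with entropy-regularized optimal transport (Sinkhorn iterations). It is a deliberately reduced illustration and not the novoSpaRc implementation, which solves a richer problem than this atlas-matching term alone. The function names, the regularization strength `epsilon`, the iteration count, and the toy expression values are all assumptions.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two expression vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def sinkhorn_coupling(cost, cell_mass, location_mass, epsilon=0.5, n_iter=200):
    """Entropy-regularized optimal transport via Sinkhorn scaling.

    Returns a cells x locations matrix whose entries give the mass of each
    cell assigned to each location. After the final update the column sums
    equal location_mass; the row sums approach cell_mass as iterations grow.
    """
    n, m = len(cell_mass), len(location_mass)
    kernel = [[math.exp(-cost[i][j] / epsilon) for j in range(m)]
              for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Rescale rows, then columns, to match the prescribed marginals.
        u = [cell_mass[i] / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [location_mass[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy data (hypothetical): three cells and two atlas locations, each
# described by the expression of two atlas genes.
cells = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
atlas = [[1.0, 0.0], [0.0, 1.0]]
cost = [[squared_distance(cell, location) for location in atlas]
        for cell in cells]
coupling = sinkhorn_coupling(cost, [1 / 3] * 3, [0.5, 0.5])
```

Each row of `coupling` can be read as a distribution of one cell over locations. A cell whose expression matches no atlas location well has a nearly flat cost row, so its placement is driven mostly by the marginal constraints rather than by the atlas, which is one way to picture the behaviour of poorly matched cells discussed above.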
The main limitation of single-cell enhancer-reporter assays remains the ability to capture and sequence the cells in which the enhancers are active. This can be particularly challenging for enhancers that are active in a very small portion of the sample. To circumvent this limitation, we have combined scRNA-seq with targeted PCR amplification, hence greatly improving cell recovery. Moreover, thanks to our custom version of the spatialization pipeline, we can reconstruct the activity of an enhancer by plotting its probability of presence at each position, even when as few as nine positive cells have been sequenced. Nevertheless, accurate reconstruction still requires a minimum number of sequenced positive cells per enhancer. This number will vary depending on the number of cells in which the enhancer is active, but we could estimate, based on our comparisons with imaging data, that in most cases at least 10–20 cells are required to reconstruct a restricted tissue-specific pattern and at least 50 cells for larger patterns. Given this efficiency, we estimate that spatial-scERA can easily be scaled up to query at least 100 enhancers in a single experiment targeting 10 000 cells. In addition, spatial-scERA could be combined with FACS-sorting of the cells expressing the CD2 reporter to increase the percentage of cells carrying an active enhancer in our dataset. While we present here a proof of concept with 25 candidate enhancer regions, we believe that with the decrease in cost and increase in efficiency of scRNA-seq experiments, it will become possible to routinely test a larger set of candidate regions of interest identified through genome-wide approaches and establish their spatial activity in an unbiased manner using spatial-scERA.
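One way to picture a probability-of-presence map is sketched below in Python. Given a cell-to-location assignment matrix, the rows of the enhancer-positive cells are summed and normalized to obtain a distribution over positions. This is an assumed formulation for illustration, not necessarily the computation performed by our custom pipeline, and the function name and toy matrix are hypothetical.

```python
def enhancer_presence_map(coupling, positive_cells):
    """Probability of presence of an enhancer at each location.

    coupling: cells x locations matrix of assignment masses.
    positive_cells: indices of cells in which the reporter was detected.
    """
    n_locations = len(coupling[0])
    mass = [0.0] * n_locations
    for cell in positive_cells:
        for location, value in enumerate(coupling[cell]):
            mass[location] += value
    total = sum(mass)
    if total == 0:
        raise ValueError("positive cells carry no mass in the coupling")
    # Normalize so that the map sums to 1 over all locations.
    return [value / total for value in mass]


# Toy assignment of four cells to three locations (hypothetical values).
toy_coupling = [
    [0.20, 0.04, 0.01],
    [0.02, 0.20, 0.03],
    [0.01, 0.05, 0.19],
    [0.18, 0.05, 0.02],
]
presence = enhancer_presence_map(toy_coupling, positive_cells=[0, 3])
```

Because every positive cell contributes a full distribution over locations rather than a single point, even a handful of positive cells yields a smooth map, although with very few cells the map reflects the placement uncertainty of each individual cell.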
Overall, future improvements in spatial reconstruction algorithms, the availability of spatial transcriptomics atlases in a variety of species, and the decrease in the cost of scRNA-seq experiments should make methods such as spatial-scERA more widely applicable and ultimately pave the way for a more comprehensive characterization of in vivo enhancer biology in multicellular organisms.
Acknowledgements
We are grateful to Laurent Gilquin and Sandrine Hughes for helpful advice in the design and analysis of targeted PCR sequencing data. We thank Stephane Janczarski for all his help in the deployment and online storage of the shiny application. We are very grateful to Olivier Gandrillon and Laura Cantini for critically reading the manuscript. We thank all members of the Ghavi-Helm lab for discussions and comments on the manuscript. We also thank all the interns that made small contributions to the project throughout the years, in particular Nicolas Vaganay, Nathan Lecouvreur, and Louise Maillard. This work was technically supported by the IGFL sequencing facility (PSI), the IGFL microscopy facility, and the Arthro-tools facility of the Lyon SFR Biosciences (UAR3444/US8).
Author contributions: Y.G.-H. conceived and supervised the study (conceptualization, supervision). P.V. co-supervised the computational aspects of the study (supervision). S.V., I.S., and Y.G.-H. designed experiments, B.A., I.S., P.V., and Y.G.-H. designed bioinformatics analysis (methodology). S.V. performed all experiments except microinjections which were performed by H.T., D.L., and J.M. B.A. assisted in the scRNA-seq experiment and performed all bioinformatics analysis (investigation, software, formal analysis). B.A. and S.S. designed the shiny application (data curation). S.S. contributed to the analysis of the single-cell dataset (software). All of the authors discussed the results and implications and commented on the manuscript at all stages (validation). Y.G.-H., B.A., S.V., and P.V. wrote the paper (writing). Y.G.-H. acquired funding (funding acquisition).
Contributor Information
Baptiste Alberti, Institut de Génomique Fonctionnelle de Lyon, UMR5242, Ecole Normale Supérieure de Lyon, Centre National de la Recherche Scientifique, Université Claude Bernard-Lyon 1, 46 allée d’Italie, 69007 Lyon, France.
Séverine Vincent, Institut de Génomique Fonctionnelle de Lyon, UMR5242, Ecole Normale Supérieure de Lyon, Centre National de la Recherche Scientifique, Université Claude Bernard-Lyon 1, 46 allée d’Italie, 69007 Lyon, France.
Isabelle Stévant, Institut de Génomique Fonctionnelle de Lyon, UMR5242, Ecole Normale Supérieure de Lyon, Centre National de la Recherche Scientifique, Université Claude Bernard-Lyon 1, 46 allée d’Italie, 69007 Lyon, France.
Damien Lajoignie, Institut de Génomique Fonctionnelle de Lyon, UMR5242, Ecole Normale Supérieure de Lyon, Centre National de la Recherche Scientifique, Université Claude Bernard-Lyon 1, 46 allée d’Italie, 69007 Lyon, France.
Hélène Tarayre, Institut de Génomique Fonctionnelle de Lyon, UMR5242, Ecole Normale Supérieure de Lyon, Centre National de la Recherche Scientifique, Université Claude Bernard-Lyon 1, 46 allée d’Italie, 69007 Lyon, France.
Juliette Mendes, Institut de Génomique Fonctionnelle de Lyon, UMR5242, Ecole Normale Supérieure de Lyon, Centre National de la Recherche Scientifique, Université Claude Bernard-Lyon 1, 46 allée d’Italie, 69007 Lyon, France.
Sergio Sarnataro, Institut de Génomique Fonctionnelle de Lyon, UMR5242, Ecole Normale Supérieure de Lyon, Centre National de la Recherche Scientifique, Université Claude Bernard-Lyon 1, 46 allée d’Italie, 69007 Lyon, France.
Paul Villoutreix, Aix-Marseille Université, MMG, Inserm U1251, Turing Centre for Living systems, 27 Bd Jean Moulin, 13005 Marseille, France.
Yad Ghavi-Helm, Institut de Génomique Fonctionnelle de Lyon, UMR5242, Ecole Normale Supérieure de Lyon, Centre National de la Recherche Scientifique, Université Claude Bernard-Lyon 1, 46 allée d’Italie, 69007 Lyon, France.
Supplementary data
Supplementary data is available at NAR online.
Conflict of interest
None declared.
Funding
This work was supported by the Fondation pour la Recherche Médicale [AJE20161236686 to Y.G.-H. and SPF201909009228 to I.S.]; the European Research Council [ERC starting grant Enhancer3D 759708 to Y.G.-H.]; the EquipEx+ Spatial-Cell-ID under the “Investissements d’avenir” program [ANR-21-ESRE-00016 to Y.G.-H.]; and a doctoral fellowship of the IADoc@UdL program to B.A.
Data availability
All raw data were submitted to ArrayExpress (https://www.ebi.ac.uk/arrayexpress/browse.html) under accession numbers: E-MTAB-14447 (scRNA-seq) and E-MTAB-14445 (targeted PCR amplification). The custom dm6 Drosophila genome used for mapping with CellRanger can be downloaded from Zenodo (https://zenodo.org/records/14006160; DOI:10.5281/zenodo.14006160). The following publicly available databases and datasets were used: FlyBase r6.40 (https://flybase.org/) using the dm6 reference genome; scRNA-seq (GSE95025); DNase I hypersensitivity (SRA:SRX020691, SRA:SRX020692); histone modifications (GSE6273); RNA-seq (SRR1197368, SRR767626, SRR1197336); Micro-C (GEO:GSE171396); REDfly website (http://redfly.ccr.buffalo.edu/).
Code and scripts used for analyses have been deposited on GitLab at https://gitbio.ens-lyon.fr/igfl/ghavi-helm/spatial_scera (DOI:10.5281/zenodo.15577028). The code for the shiny app can be downloaded at: https://gitbio.ens-lyon.fr/igfl/ghavi-helm/spatial-scera-shiny-app-python; and the online version of the application can be accessed with the following link: https://bioshiny.ens-lyon.fr/public/app/spatial-scERA.
References
- 1. Levine M. Transcriptional enhancers in animal development and evolution. Curr Biol. 2010; 20:R754–63. 10.1016/j.cub.2010.06.070.
- 2. Shlyueva D, Stampfel G, Stark A. Transcriptional enhancers: from properties to genome-wide predictions. Nat Rev Genet. 2014; 15:272–86. 10.1038/nrg3682.
- 3. Kvon EZ. Using transgenic reporter assays to functionally characterize enhancers in animals. Genomics. 2015; 106:185–92. 10.1016/j.ygeno.2015.06.007.
- 4. Visel A, Rubin EM, Pennacchio LA. Genomic views of distant-acting enhancers. Nature. 2009; 461:199–205. 10.1038/nature08451.
- 5. Kvon EZ, Kazmar T, Stampfel G et al. Genome-scale functional characterization of Drosophila developmental enhancers in vivo. Nature. 2014; 512:91–5. 10.1038/nature13395.
- 6. Chan Y-C, Kienle E, Oti M et al. An unbiased AAV-STARR-seq screen revealing the enhancer activity map of genomic regions in the mouse brain in vivo. Sci Rep. 2023; 13:6745. 10.1038/s41598-023-33448-w.
- 7. Kosicki M, Cintrón DL, Keukeleire P et al. Massively parallel reporter assays and mouse transgenic assays provide correlated and complementary information about neuronal enhancer activity. Nat Commun. 2025; 16:4786. 10.1038/s41467-025-60064-1.
- 8. Yuan X, Song M, Devine P et al. Heart enhancers with deeply conserved regulatory activity are established early in zebrafish development. Nat Commun. 2018; 9:4977. 10.1038/s41467-018-07451-z.
- 9. Ishibashi M, Mechaly AS, Becker TS et al. Using zebrafish transgenesis to test human genomic sequences for specific enhancer activity. Methods. 2013; 62:216–25. 10.1016/j.ymeth.2013.03.018.
- 10. Pfeiffer BD, Jenett A, Hammonds AS et al. Tools for neuroanatomy and neurogenetics in Drosophila. Proc Natl Acad Sci USA. 2008; 105:9715–20. 10.1073/pnas.0803697105.
- 11. Groth AC, Fish M, Nusse R et al. Construction of transgenic Drosophila by using the site-specific integrase from phage φC31. Genetics. 2004; 166:1775–82.
- 12. Dunham I, Kundaje A, Aldred SF et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012; 489:57–74.
- 13. Nègre N, Brown CD, Ma L et al. A cis-regulatory map of the Drosophila genome. Nature. 2011; 471:527–31. 10.1038/nature09990.
- 14. Zinzen RP, Girardot C, Gagneur J et al. Combinatorial binding predicts spatio-temporal cis-regulatory activity. Nature. 2009; 462:65–70. 10.1038/nature08531.
- 15. Thomas S, Li X-Y, Sabo PJ et al. Dynamic reprogramming of chromatin accessibility during Drosophila embryo development. Genome Biol. 2011; 12:R43. 10.1186/gb-2011-12-5-r43.
- 16. Muerdter F, Boryń ŁM, Arnold CD. STARR-seq—Principles and applications. Genomics. 2015; 106:145–50. 10.1016/j.ygeno.2015.06.001.
- 17. Wang X, He L, Goggin SM et al. High-resolution genome-wide functional dissection of transcriptional regulatory regions and nucleotides in human. Nat Commun. 2018; 9:5380. 10.1038/s41467-018-07746-1.
- 18. Arnold CD, Gerlach D, Stelzer C et al. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science. 2013; 339:1074–7. 10.1126/science.1232542.
- 19. Gisselbrecht SS, Barrera LA, Porsch M et al. Highly parallel assays of tissue-specific enhancers in whole Drosophila embryos. Nat Methods. 2013; 10:774–80. 10.1038/nmeth.2558.
- 20. Patwardhan RP, Hiatt JB, Witten DM et al. Massively parallel functional dissection of mammalian enhancers in vivo. Nat Biotechnol. 2012; 30:265–70. 10.1038/nbt.2136.
- 21. Fuqua T, Jordan J, van Breugel ME et al. Dense and pleiotropic regulatory information in a developmental enhancer. Nature. 2020; 587:235–9. 10.1038/s41586-020-2816-5.
- 22. Calderon D, Blecher-Gonen R, Huang X et al. The continuum of Drosophila embryonic development at single-cell resolution. Science. 2022; 377:eabn5800. 10.1126/science.abn5800.
- 23. Bozek M, Gompel N. Developmental transcriptional enhancers: a subtle interplay between accessibility and activity. Bioessays. 2020; 42:1900188. 10.1002/bies.201900188.
- 24. Lalanne J-B, Regalado SG, Domcke S et al. Multiplex profiling of developmental cis-regulatory elements with quantitative single-cell expression reporters. Nat Methods. 2024; 21:983–93. 10.1038/s41592-024-02260-3.
- 25. Zhao S, Hong CKY, Myers CA et al. A single-cell massively parallel reporter assay detects cell-type-specific gene regulation. Nat Genet. 2023; 55:346–54. 10.1038/s41588-022-01278-7.
- 26. Moriel N, Senel E, Friedman N et al. NovoSpaRc: flexible spatial reconstruction of single-cell gene expression with optimal transport. Nat Protoc. 2021; 16:4177–200. 10.1038/s41596-021-00573-7.
- 27. Nitzan M, Karaiskos N, Friedman N et al. Gene expression cartography. Nature. 2019; 576:132–7. 10.1038/s41586-019-1773-3.
- 28. Karaiskos N, Wahle P, Alles J et al. The Drosophila embryo at single-cell transcriptome resolution. Science. 2017; 358:194–9. 10.1126/science.aan3235.
- 29. Hao Y, Hao S, Andersen-Nissen E et al. Integrated analysis of multimodal single-cell data. Cell. 2021; 184:3573–87. 10.1016/j.cell.2021.04.048.
- 30. Öztürk-Çolak A, Marygold SJ, Antonazzo G et al. FlyBase: updates to the Drosophila genes and genomes database. Genetics. 2024; 227:iyad211. 10.1093/genetics/iyad211.
- 31. Hammonds AS, Bristow CA, Fisher WW et al. Spatial expression of transcription factors in Drosophila embryonic organ development. Genome Biol. 2013; 14:R140. 10.1186/gb-2013-14-12-r140.
- 32. Heintzman ND, Stuart RK, Hon G et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat Genet. 2007; 39:311–8. 10.1038/ng1966.
- 33. Heintzman ND, Hon GC, Hawkins RD et al. Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature. 2009; 459:108–12. 10.1038/nature07829.
- 34. Graveley BR, Brooks AN, Carlson JW et al. The developmental transcriptome of Drosophila melanogaster. Nature. 2011; 471:473–9. 10.1038/nature09715.
- 35. Rivera J, Keränen SVE, Gallo SM et al. REDfly: the transcriptional regulatory element database for Drosophila. Nucleic Acids Res. 2019; 47:D828–34. 10.1093/nar/gky957.
- 36. Lopez-Delisle L, Rabbani L, Wolff J et al. pyGenomeTracks: reproducible plots for multivariate genomic datasets. Bioinformatics. 2021; 37:422–3. 10.1093/bioinformatics/btaa692.
- 37. Wang J-W, Beck ES, McCabe BD. A modular toolset for recombination transgenesis and neurogenetic analysis of Drosophila. PLoS One. 2012; 7:e42102. 10.1371/journal.pone.0042102.
- 38. Svoboda M, Frost HR, Bosco G. Internal oligo(dT) priming introduces systematic bias in bulk and single-cell RNA sequencing count data. NAR Genomics Bioinforma. 2022; 4:lqac035. 10.1093/nargab/lqac035.
- 39. Bischof J, Maeda RK, Hediger M et al. An optimized transgenesis system for Drosophila using germ-line-specific φC31 integrases. Proc Natl Acad Sci USA. 2007; 104:3312–7. 10.1073/pnas.0611511104.
- 40. Markstein M, Pitsouli C, Villalta C et al. Exploiting position effects and the gypsy retrovirus insulator to engineer precisely expressed transgenes. Nat Genet. 2008; 40:476–83. 10.1038/ng.101.
- 41. Babraham Bioinformatics. FastQC: a quality control tool for high throughput sequence data. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
- 42. Marsh S, Salmon M, Hoffman P. samuel-marsh/scCustomize: version 2.1.2. 2024; 10.5281/zenodo.10724532.
- 43. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnetJ. 2011; 17:10–2. 10.14806/ej.17.1.200.
- 44. Shen W, Le S, Li Y et al. SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS One. 2016; 11:e0163962. 10.1371/journal.pone.0163962.
- 45. Virshup I, Rybakov S, Theis FJ et al. anndata: Access and store annotated data matrices. JOSS. 2024; 9:4371. 10.21105/joss.04371.
- 46. Schindelin J, Arganda-Carreras I, Frise E et al. Fiji: an open-source platform for biological-image analysis. Nat Methods. 2012; 9:676–82. 10.1038/nmeth.2019.
- 47. van der Walt S, Schönberger JL, Nunez-Iglesias J et al. scikit-image: image processing in Python. PeerJ. 2014; 2:e453. 10.7717/peerj.453.
- 48. Dale RK, Pedersen BS, Quinlan AR. Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics. 2011; 27:3423–4. 10.1093/bioinformatics/btr539.
- 49. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010; 26:841–2. 10.1093/bioinformatics/btq033.
- 50. Wickham H. ggplot2: Elegant Graphics for Data Analysis. 2016; New York: Springer-Verlag.
- 51. Ing-Simmons E, Vaid R, Bing XY et al. Independence of chromatin conformation and gene regulation during Drosophila dorsoventral patterning. Nat Genet. 2021; 53:487–99. 10.1038/s41588-021-00799-x.
- 52. Wolff J, Rabbani L, Gilsbach R et al. Galaxy HiCExplorer 3: a web server for reproducible Hi-C, capture Hi-C and single-cell Hi-C data analysis, quality control and visualization. Nucleic Acids Res. 2020; 48:W177–84. 10.1093/nar/gkaa220.
- 53. Fisher WW, Li JJ, Hammonds AS et al. DNA regions bound at low occupancy by transcription factors do not drive patterned reporter gene expression in Drosophila. Proc Natl Acad Sci USA. 2012; 109:21330–5. 10.1073/pnas.1209589110.
- 54. Balasubramanian D, Borges Pinto P, Grasso A et al. Enhancer–promoter interactions can form independently of genomic distance and be functional across TAD boundaries. Nucleic Acids Res. 2024; 52:1702–19. 10.1093/nar/gkad1183.
- 55. Riddihough G, Ish-Horowicz D. Individual stripe regulatory elements in the Drosophila hairy promoter respond to maternal, gap, and pair-rule genes. Genes Dev. 1991; 5:840–54. 10.1101/gad.5.5.840.
- 56. McKay DJ, Lieb JD. A common set of DNA regulatory elements shapes Drosophila appendages. Dev Cell. 2013; 27:306–18. 10.1016/j.devcel.2013.10.009.
- 57. Walker JJ, Lee KK, Desai RN et al. The Drosophila melanogaster sex determination gene sisA is required in yolk nuclei for midgut formation. Genetics. 2000; 155:191–202. 10.1093/genetics/155.1.191.
- 58. Imam F, Sutherland D, Huang W et al. stumps, a Drosophila gene required for fibroblast growth factor (FGF)-directed migrations of tracheal and mesodermal cells. Genetics. 1999; 152:307–18. 10.1093/genetics/152.1.307.
- 59. Frank LH, Rushlow C. A group of genes required for maintenance of the amnioserosa tissue in Drosophila. Development. 1996; 122:1343–52. 10.1242/dev.122.5.1343.
- 60. Kharchenko PV, Silberstein L, Scadden DT. Bayesian approach to single-cell differential expression analysis. Nat Methods. 2014; 11:740–2. 10.1038/nmeth.2967.
- 61. Qiu P. Embracing the dropouts in single-cell RNA-seq analysis. Nat Commun. 2020; 11:1169. 10.1038/s41467-020-14976-9.
- 62. Ndajah P, Kikuchi H, Yukawa M et al. SSIM image quality metric for denoised images. 2010; ADVANCES in VISUALIZATION, IMAGING and SIMULATION.
- 63. Nilsson J, Akenine-Möller T. Understanding SSIM. 2020; 10.48550/arXiv.2006.13846.
- 64. Kühnlein RP, Brönner G, Taubert H et al. Regulation of Drosophila spalt gene expression. Mech Dev. 1997; 66:107–18. 10.1016/S0925-4773(97)00103-2.
- 65. Markstein M, Zinzen R, Markstein P et al. A regulatory code for neurogenic gene expression in the Drosophila embryo. Development. 2004; 131:2387–94. 10.1242/dev.01124.
- 66. Schroeder MD, Greer C, Gaul U. How to make stripes: deciphering the transition from non-periodic to periodic patterns in Drosophila segmentation. Development. 2011; 138:3067–78. 10.1242/dev.062141.
- 67. Arnosti DN, Barolo S, Levine M et al. The eve stripe 2 enhancer employs multiple modes of transcriptional synergy. Development. 1996; 122:205–14. 10.1242/dev.122.1.205.
- 68. Sanyal A, Lajoie BR, Jain G et al. The long-range interaction landscape of gene promoters. Nature. 2012; 489:109–13. 10.1038/nature11279.
- 69. Ghavi-Helm Y, Klein FA, Pakozdi T et al. Enhancer loops appear stable during development and are associated with paused polymerase. Nature. 2014; 512:96–100. 10.1038/nature13417.
- 70. Jenett A, Rubin GM, Ngo T-TB et al. A GAL4-driver line resource for Drosophila neurobiology. Cell Rep. 2012; 2:991–1001. 10.1016/j.celrep.2012.09.011.
- 71. Li S, Ma J, Zhao T et al. CellContrast: reconstructing spatial relationships in single-cell RNA sequencing data via deep contrastive learning. Patterns. 2024; 5:101022. 10.1016/j.patter.2024.101022.
- 72. Li H, Zhou J, Li Z et al. A comprehensive benchmarking with practical guidelines for cellular deconvolution of spatial transcriptomics. Nat Commun. 2023; 14:1548. 10.1038/s41467-023-37168-7.
- 73. Andersson A, Bergenstråhle J, Asp M et al. Single-cell and spatial transcriptomics enables probabilistic inference of cell type topography. Commun Biol. 2020; 3:1–8. 10.1038/s42003-020-01247-y.
- 74. Biancalani T, Scalia G, Buffoni L et al. Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nat Methods. 2021; 18:1352–62. 10.1038/s41592-021-01264-7.
- 75. Miller BF, Huang F, Atta L et al. Reference-free cell type deconvolution of multi-cellular pixel-resolution spatially resolved transcriptomics data. Nat Commun. 2022; 13:2339. 10.1038/s41467-022-30033-z.
- 76. Zhu M, Li C, Lv K et al. MLSpatial: a machine-learning method to reconstruct the spatial distribution of cells from scRNA-seq by extracting spatial features. Comput Biol Med. 2023; 159:106873. 10.1016/j.compbiomed.2023.106873.
- 77. Zhao Y, Zhang S, Xu J et al. Spatial reconstruction of oligo and single cells by De Novo coalescent embedding of transcriptomic networks. Adv Sci. 2023; 10:2206307. 10.1002/advs.202206307.
- 78. Neumann M, Xu X, Smaczniak C et al. A 3D gene expression atlas of the floral meristem based on spatial reconstruction of single nucleus RNA sequencing data. Nat Commun. 2022; 13:2838. 10.1038/s41467-022-30177-y.
- 79. Yu Q, Kilik U, Holloway EM et al. Charting human development using a multi-endodermal organ atlas and organoid models. Cell. 2021; 184:3281–98. 10.1016/j.cell.2021.04.028.
- 80. Legnini I, Emmenegger L, Zappulo A et al. Spatiotemporal, optogenetic control of gene expression in organoids. Nat Methods. 2023; 20:1544–52. 10.1038/s41592-023-01986-w.
- 81. Cheng M, Jiang Y, Xu J et al. Spatially resolved transcriptomics: a comprehensive review of their technological advances, applications, and challenges. J Genet Genomics. 2023; 50:625–40. 10.1016/j.jgg.2023.03.011.
- 82. Yu T, Ekbote C, Morozov N et al. Tissue reassembly with generative AI. bioRxiv, 17 February 2025, preprint: not peer reviewed. 10.1101/2025.02.13.638045.
- 83. Wei X, Chen T, Wang X et al. COME: contrastive mapping learning for spatial reconstruction of single-cell RNA sequencing data. Bioinformatics. 2025; 41:btaf083. 10.1093/bioinformatics/btaf083. [DOI] [PMC free article] [PubMed] [Google Scholar]
Data Availability Statement
All raw data were submitted to ArrayExpress (https://www.ebi.ac.uk/arrayexpress/browse.html) under accession numbers E-MTAB-14447 (scRNA-seq) and E-MTAB-14445 (targeted PCR amplification). The custom dm6 Drosophila genome used for mapping with CellRanger can be downloaded from Zenodo (https://zenodo.org/records/14006160; DOI:10.5281/zenodo.14006160). The following publicly available databases and datasets were used: FlyBase r6.40 (https://flybase.org/) using the dm6 reference genome; scRNA-seq (GSE95025); DNase I hypersensitivity (SRA:SRX020691, SRA:SRX020692); histone modifications (GSE6273); RNA-seq (SRR1197368, SRR767626, SRR1197336); Micro-C (GEO:GSE171396); and the REDfly website (http://redfly.ccr.buffalo.edu/).
Code and scripts used for analyses have been deposited on GitLab at https://gitbio.ens-lyon.fr/igfl/ghavi-helm/spatial_scera (DOI:10.5281/zenodo.15577028). The code for the Shiny app can be downloaded at https://gitbio.ens-lyon.fr/igfl/ghavi-helm/spatial-scera-shiny-app-python, and the online version of the application can be accessed at https://bioshiny.ens-lyon.fr/public/app/spatial-scERA.






