eLife. 2024 Feb 23;13:e85419. doi: 10.7554/eLife.85419

High-throughput mapping of single-neuron projection and molecular features by retrograde barcoded labeling

Peibo Xu 1,2,†, Jian Peng 3, Tingli Yuan 1, Zhaoqin Chen 1, Hui He 1,2, Ziyan Wu 1, Ting Li 3, Xiaodong Li 1,2, Luyue Wang 4, Le Gao 1, Jun Yan 1,5,6, Wu Wei 4,7, Chengyu T Li 1,5,6,7, Zhen-Ge Luo 3, Yuejun Chen 1,5
Editors: Jeremy J Day8, Kate M Wassum9
PMCID: PMC10914349  PMID: 38390967

Abstract

Deciphering patterns of connectivity between neurons in the brain is a critical step toward understanding brain function. Imaging-based neuroanatomical tracing identifies area-to-area or sparse neuron-to-neuron connectivity patterns, but with limited throughput. Barcode-based connectomics maps large numbers of single-neuron projections, but remains a challenge for jointly analyzing single-cell transcriptomics. Here, we established a rAAV2-retro barcode-based multiplexed tracing method that simultaneously characterizes the projectome and transcriptome at the single neuron level. We uncovered dedicated and collateral projection patterns of ventromedial prefrontal cortex (vmPFC) neurons to five downstream targets and found that projection-defined vmPFC neurons are molecularly heterogeneous. We identified transcriptional signatures of projection-specific vmPFC neurons, and verified Pou3f1 as a marker gene enriched in neurons projecting to the lateral hypothalamus, denoting a distinct subset with collateral projections to both dorsomedial striatum and lateral hypothalamus. In summary, we have developed a new multiplexed technique whose paired connectome and gene expression data can help reveal organizational principles that form neural circuits and process information.

Research organism: Mouse

Introduction

Wiring diagrams of a brain can be divided into three levels: (1) the macroscale connectome that describes inter-areal connections, (2) the mesoscale connectome that describes connections between cells, and (3) the microscale connectome that describes connections at the synaptic level (Zeng, 2018). Studying circuit architecture at the level of the mesoscale connectome describes how information flows between brain regions (Oh et al., 2014). Traditionally, neuroanatomical tracers are used to characterize regional connectivity matrices (Cowan, 1998). To obtain cell-type-specific connectivity, one can use recombinant virus-based tracers in transgenic model organisms or, more precisely, trace a specific component of a neural circuit using viral-genetic tracing tools to dissect the input-output organization (Ghosh et al., 2011; Nassi et al., 2015; Schwarz et al., 2015). However, these methods rely heavily on complex recombinant virus design and genetically modified model organisms, and often do not reach single-neuron resolution.

While recent advances have brought invaluable insights into understanding neuronal circuits at single-neuron resolution, existing methods have limitations. High-throughput fluorescence imaging, such as fluorescence micro-optical sectioning tomography (fMOST), can reconstruct detailed neuron morphologies but requires specialized expertise and equipment and lacks transcriptomic information (Gong et al., 2016; Rompani et al., 2017). Barcode-based methods like MAPseq, BRICseq (multiplexed MAPseq), BARseq, and ConnectID utilize sequencing to map projections (Chen et al., 2019; Huang et al., 2020; Kebschull et al., 2016; Klingler et al., 2021). However, MAPseq and BRICseq can only provide connectome information (Huang et al., 2020; Kebschull et al., 2016), BARseq is constrained to assessing a handful of genes via in situ hybridization (Chen et al., 2019), and ConnectID has low recovery of cells with dual connectome-transcriptome data (~16%, 391 cells with connectome barcode identity in 2450 cells with scRNA-seq; Klingler et al., 2021). VECTORseq, a Retro-seq-based method (Tasic et al., 2018), is limited by the number of transgenic barcodes used (Cheung et al., 2021). The updated BARseq protocol enables detection of up to 100 genes, but throughput remains lower and oligo synthesis costs remain higher compared to scRNA-seq (Chen et al., 2023; Sun et al., 2021). In summary, despite significant progress, existing methods fall short in efficiently integrating high-throughput projectomes and transcriptomes at the single-neuron level, hindering a comprehensive understanding of the connectomic and transcriptomic interplay in neuronal circuitry.

Medial prefrontal cortex (mPFC) is an intricate brain region involved in higher order cognitive functions, information processing (e.g., memory and emotions) and driving goal-directed actions (Le Merre et al., 2021). For example, mPFC neurons projecting to the nucleus accumbens encoding punishment-related internal states were located in more superficial layer 5a, and mPFC neurons projecting to the ventral tegmental area encoding aversive learning were located in deeper layer 5b (Kim et al., 2017; Wu et al., 2021). Although previous studies have extensively investigated the anatomical and functional diversities of mPFC, the relationship between anatomical and molecular features of mPFC neurons remains elusive. Do mPFC neurons projecting to different downstream brain regions differ in their transcriptomes? Are these projection-defined mPFC neurons homogeneous or composed of different neuron subtypes? The answer to these questions may be further complicated by the finding that mPFC neurons can send collateral axons to multiple brain regions (Cornwall and Phillipson, 1988). So, what are the principles of target selection or target combination for these collateral projection mPFC neurons? What are the cell type and molecular features of these ‘broadcasting’ neurons?

To address these challenges, we designed a multiplexed tracing method capable of characterizing single-neuron transcriptome and projectome at the same time, which we called MERGE-seq (Multiplexed projEction neuRons retroGrade barcodE). We used MERGE-seq to interrogate the projectome and the corresponding transcriptome of ventral mPFC neurons. We injected five rAAV2-retro viruses with distinct barcodes into the five known downstream targets of ventromedial prefrontal cortex (vmPFC), including agranular insular cortex (AI), dorsomedial striatum (DMS), basolateral amygdala (BLA), mediodorsal thalamic nucleus (MD), and lateral hypothalamus (LH), in the same mouse brain such that each target region received a unique barcoded rAAV2-retro. We found that vmPFC neurons projecting to each downstream target are heterogeneous and composed of transcriptionally different subtypes of neurons. Approximately 65% of barcoded vmPFC neurons exhibited dedicated projection patterns based on MERGE-seq data, sending axonal projections exclusively to one of the five selected targets. It is important to note that this characterization of ‘dedicated projection’ neurons is specifically defined in the context of the five target regions examined in this study. Approximately 35% of barcoded vmPFC neurons sent collateral projections to multiple brain regions, most of which are dual-target projection neurons (bifurcated projection). We further uncovered the cell type compositions and layer distributions of these dedicated and collateral projection vmPFC neurons, and revealed their molecular signatures. We validated complex MERGE-seq-inferred projection patterns by joint analysis with recently published single-neuron projectome data (Gao et al., 2022).
Additionally, dual-modal interrogation using RNA fluorescence in situ hybridization (FISH) and dual-color retrograde AAV labeling allowed us to confirm vmPFC neuron bifurcations to DMS and LH, demonstrating layer 5 Pou3f1+ neurons collateralize between these targets. Finally, we implemented a machine learning-based methodology and uncovered specific gene clusters for predicting certain projection patterns. As MERGE-seq bridges the gap between single-neuron projectome and transcriptome data, it can uncover new molecular properties of anatomical neural circuits.

Results

MERGE-seq characterizes single neuron transcriptome and projectome simultaneously

In order to use the 10x Genomics scRNA-seq system to analyze transcripts from cells infected with rAAV2-retro virus, we modified the viral vector by adding a 15 bp barcode index and polyadenylation signal sequences to the 3’ end of the EGFP sequence, which was driven by a short CAG promoter (Figure 1A and B, see Materials and methods). Then, five rAAV2-retro viruses with different barcodes were individually injected into five brain regions of the same mouse, including AI, DMS, BLA, LH, and MD. These brain areas are the known downstream brain regions of vmPFC (Hunnicutt et al., 2016; Hurley et al., 1991; Reppucci and Petrovich, 2016; Vertes, 2004; Zhu et al., 2020). A period of six weeks was set to allow efficient retrograde labeling of vmPFC neurons by these barcoded viruses. These mice were then sacrificed and the vmPFC (specifically the prelimbic area [PrL] and the infralimbic area [IL]) was carefully dissected for scRNA-seq analysis (Figure 1A). Single-cell transcriptional libraries were obtained using 10x Genomics library preparation protocols, and virus barcode expression libraries were obtained using user-defined primers, which could enrich cDNA fragments composed of barcode index, unique molecular identifiers (UMIs), and the cell barcode (Figure 1B). We detected 24,788 cells in the raw data matrix. Following initial quality control, which ensured the number of detected RNA in each cell ranged between 500 and 8000, RNA UMI counts in each cell were within 1000–60,000, and the percentage of mitochondrial genes remained below 20%, we recovered 1791 cells undergoing fluorescence-activated cell sorting (FACS) from three mice and 19,470 single cells without sorting from the other three mice, a total of 21,261 cells. 
Transcriptional profiling of all cells revealed major cell types including excitatory neurons (Slc17a7+), microglia (C1qa+), endothelial cells (Itm2a+, Endo), oligodendrocyte progenitor cells (Olig2+Mog-, OPCs), oligodendrocytes (Olig2+Mog+, Oligo), inhibitory neurons (Gad1+), astrocytes (Aldh1l1+, Astro), and activated microglia (C1qa+Pf4+, Act. Microglia) as previously reported (Bhattacherjee et al., 2019; Figure 1C–E). Hereafter, ‘barcoded cells’ refers to the pooled barcoded cells from the unsorted and FAC-sorted groups.
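The quality-control thresholds stated above can be written as a simple per-cell filter. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the `Cell` record and its field names are hypothetical, "number of detected RNA" is read here as the number of detected genes, and the bounds are assumed to be inclusive, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical per-cell QC summary (field names are illustrative)."""
    n_genes: int     # number of detected genes (assumed meaning of "detected RNA")
    n_umis: int      # total RNA UMI count
    pct_mito: float  # percentage of UMIs from mitochondrial genes


def passes_qc(cell: Cell) -> bool:
    """Apply the thresholds quoted in the text (bounds assumed inclusive)."""
    return (
        500 <= cell.n_genes <= 8000
        and 1000 <= cell.n_umis <= 60000
        and cell.pct_mito < 20.0
    )


# Toy cells, not data from the study.
cells = [
    Cell(n_genes=2500, n_umis=9000, pct_mito=4.2),    # passes all three filters
    Cell(n_genes=300, n_umis=800, pct_mito=3.0),      # too few genes and UMIs
    Cell(n_genes=4000, n_umis=15000, pct_mito=35.0),  # high mitochondrial fraction
]
kept = [c for c in cells if passes_qc(c)]
```

In practice such a filter would run on the full count matrix; the record-based form is used here only to make each criterion explicit.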

Figure 1. MERGE-seq characterized single-neuron transcriptomes and projectomes simultaneously.

(A) Schematic diagram of the experimental workflow. (B) rAAV2 plasmid vector design, and schematic of designed primers to recover cell barcode and UMI in read 1, and 3’ tail of EGFP and virus barcode in read 2. According to the recommendation of 10x Genomics, a faithful mapping should cover 28 bp for read 1 and 91 bp for read 2. In our design, 150 bp paired-end sequencing can sufficiently meet the need to recover cell barcode, UMI and virus barcode. (C) Umap embedding of transcriptional clustering results for all vmPFC cells. (D) Stacked violin plots showing the expression of markers for each cluster. (E) Heatmap showing the gene-expression correlation between major cell types defined by scRNA-seq of this study and Bhattacherjee et al., 2019. (F) Umap embedding of all determined barcoded cells labeled in blue. (G) Bar plot showing frequency of barcoded (blue) and non-barcoded (grey) cells in all recovered cell types. In (C–E), 21,261 cells were represented. In (F, G), 20,047 cells were represented. 1214 cells with exceptionally high nUMIs were removed.

Figure 1—figure supplement 1. Validation of rAAV2-retro injection sites and determination of valid barcoded cells.

(A) The position of injection sites (AI-1, AI-2, DMS, LH, MD, BLA) to deliver rAAV2-retro-EGFP plotted on coronal section diagrams (top). The left corner values indicate the anteroposterior distance of the section from bregma. Representative immunohistochemistry images showing rAAV2-retro-EGFP injection sites in coronal sections (bottom). Scale bars: 500 μm. AI-1, agranular insular cortex, anterior; AI-2, agranular insular cortex, posterior. (B) Frequency of Hamming distance (HD) among the five reference barcode sequences. (C) Density distribution of HD of the best and second-best hits when comparing barcode sequences from reads to the five barcode references. The distance to the best hit (identified barcode) is 0 across four samples because we required a perfect match for identification. The distance to the second-best hit is ~10, showing that there is sufficient sequence difference to identify the right barcode among the five references during sequencing. (D) Boxplot displaying the count distribution of UMIs for neurons with projections identified by different barcodes and a sum of all five barcodes across projectome and scRNA-seq libraries. (E) Density plots contrasting the distribution of UMIs for each target region between non-neuronal cells (green), EGFP-negative cells (blue), and FAC-sorted cells (red). The dashed line indicates a threshold for UMI counts of a barcoded neuron. Note that EGFP-negative cells are a subset of the FAC-sorted group and are defined by nUMI of EGFP RNA = 0. (F) Violin plots of Log10 normalized projection index barcode counts. Note that after choosing the UMI count threshold, UMI counts below the threshold were set to zero. (G) Stacked bar plot showing barcoded and non-barcoded cell ratio in FAC-sorted group or unsorted group as determined by stringent UMI threshold. (H) Stacked bar plot showing EGFP-positive (nUMI of EGFP RNA >0) and EGFP-negative cells (nUMI of EGFP RNA = 0) ratio in FAC-sorted group or unsorted group as determined by scRNA-seq.

First, we validated that each target region was labeled and effectively covered by the rAAV2-retro-EGFP (Figure 1—figure supplement 1A). Next, we showed that the five reference barcodes differed sufficiently in sequence to be distinguished from one another, and that sequencing reads could be assigned to the correct barcode among the five references (Figure 1—figure supplement 1B, C). Since each downstream brain region of vmPFC received a unique and predetermined barcoded virus, each virus barcode identified in a vmPFC neuron represents the specific corresponding downstream brain region that the neuron projects to. We found abundant zero counts for projection barcodes in scRNA-seq libraries, contrasting robust detection in projectome libraries generated by targeted amplification from full-length cDNA (Figure 1—figure supplement 1D). To determine validly barcoded cells, we first calculated the 95th nUMI percentile across all barcodes and removed outlier cells with exceptionally high nUMI (see Materials and methods). We used ‘EGFP-negative’ FAC-sorted cells (defined by nUMI EGFP = 0) and non-neuronal cells from scRNA-seq as negative controls to calculate 99.9th percentile UMI thresholds per barcode using empirical cumulative distribution functions (ECDF; Figure 1—figure supplement 1E). By taking the higher threshold for each barcode from these two negative control analyses, we classified cells exceeding these values as validly barcoded. It is worth mentioning that the UMI threshold differs between targets because the magnitude of barcode expression differs for each projection target (Figure 1—figure supplement 1F). 
Across all detected cell types, barcoded cells were primarily excitatory neurons rather than inhibitory neurons or non-neuronal cell types (2116 validly barcoded in 8805 excitatory neurons, and 5 validly barcoded in 2738 endothelial cells, 3 validly barcoded in 1780 oligodendrocyte progenitor cells, 7 validly barcoded in 1773 oligodendrocytes, and 17 in 1420 inhibitory neurons, Figure 1F and G). This is consistent with the finding that mPFC projection neurons are excitatory (Gabbott et al., 2005). Using this stringent threshold, 49.0% of FAC-sorted and 18.7% of unsorted cells were classified as barcoded (Figure 1—figure supplement 1G). In parallel, we calculated EGFP+ ratios (nUMI of EGFP RNA >0) as 81% for FAC-sorted and 26% on average for unsorted cells (Figure 1—figure supplement 1H). The lower fraction of barcoded versus EGFP+ cells suggests our conservative threshold increases false negatives, classifying some low UMI cells as non-barcoded. Therefore, we focused analyses on reliably barcoded cells, though conclusions may not capture the full heterogeneous projection repertoire. Together, these results demonstrate that MERGE-seq can record single neuron transcriptome and projectome simultaneously.
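The thresholding logic described above (a 99.9th percentile from each negative control, then the higher of the two per barcode) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the percentile is taken as the smallest observed value whose empirical CDF reaches the requested level, and the UMI counts are toy values.

```python
import math


def ecdf_percentile(values, q):
    """Smallest observed value x whose empirical CDF F(x) is >= q (0 < q <= 1)."""
    ordered = sorted(values)
    rank = max(math.ceil(q * len(ordered)), 1)  # 1-based rank of the quantile
    return ordered[rank - 1]


def barcode_threshold(egfp_negative_umis, non_neuronal_umis, q=0.999):
    """Take the higher of the two negative-control percentiles for one barcode."""
    return max(
        ecdf_percentile(egfp_negative_umis, q),
        ecdf_percentile(non_neuronal_umis, q),
    )


def is_validly_barcoded(umi_count, threshold):
    """A cell counts as barcoded only if its UMI count exceeds the threshold."""
    return umi_count > threshold


# Toy negative-control UMI counts for a single barcode (not data from the study).
egfp_negative = [0] * 990 + [2] * 10
non_neuronal = [0] * 980 + [1] * 15 + [4] * 5

threshold = barcode_threshold(egfp_negative, non_neuronal)
```

Because the threshold is computed separately for each barcode, targets with stronger background expression end up with higher cutoffs, which matches the observation that thresholds differ between targets.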

MERGE-seq reveals transcriptomic heterogeneity and cell type composition of vmPFC neurons projecting to different targets

Previous studies have shown that vmPFC neurons project to multiple brain regions including AI, DMS, BLA, LH, and MD; however, the cell type composition of these projection neurons remains largely unknown (Le Merre et al., 2021). Combining with single neuron transcriptome, we explored the transcriptome and subtype composition of vmPFC neurons projecting to different downstream brain regions. We first re-clustered excitatory projection neurons expressing Slc17a7 (also known as vesicular glutamate transporter, Vglut1). Clusters with low gene/UMI counts and high mitochondrial gene expression were filtered out as low-quality (Ilicic et al., 2016). Some clusters exhibited non-neuronal cell markers like microglial genes (C1qa, C1qb), oligodendrocyte genes (Olig1, Olig2), and endothelial cell genes (Flt1, Cldn5) despite small cluster size, indicating contamination from other cell types incorrectly grouped within excitatory neurons after initial clustering. In total, we filtered out 637 cells that were identified as either low-quality or contaminated with non-neuronal cell types and recovered 9368 excitatory neurons (see Materials and methods, Figure 2—figure supplement 1A, Supplementary file 1). We generated seven excitatory neuron clusters, which were annotated based on typical markers of cortical layers (Bhattacherjee et al., 2019; Sorensen et al., 2015; layer 2/3, Cux2; layer 5, Etv1; layer 6, Sulf1) and differentially expressed genes (DEGs; Supplementary file 2). These neuron clusters include L2/3-Calb1 (4.1%), L2/3-Rorb (5.9%), L5-Bcl6 (3.3%), L5-Htr2c (3.9%), L5-S100b (11.6%), L6-Npy (12.6%), and L6-Syt6 (58.7%; Figure 2A). The layer and subtype marker genes of these clusters were confirmed to be expressed in corresponding layers in the vmPFC, as revealed by in situ hybridization results of the Allen Mouse Brain Atlas (Figure 2A, Figure 2—figure supplement 1A–C). 
Of note, we captured more layer 6 neurons than superficial layer neurons (12.6% L6-Npy and 58.7% L6-Syt6, Figure 2B), which is different from a previous report (Bhattacherjee et al., 2019). We speculate that different dissociation protocols may cause biased neuron capture.

Figure 2. MERGE-seq unravels transcriptomic heterogeneity of projection target-defined vmPFC neurons.

(A) Umap embedding of excitatory neuron subtype annotation. (B) Bar plot showing frequency of barcoded (blue) and non-barcoded (grey) neurons in distinct neuron subtypes. (C) Stacked violin plot showing the expression of markers for each neuronal subtype. (D) Heatmap showing the gene-expression correlation between excitatory subtypes defined by Multiplexed Error-Robust Fluorescence in situ Hybridization (MERFISH) and scRNA-seq. MERFISH data were downloaded from Bhattacherjee et al., 2023. (E) Umap embeddings of barcoded (blue) neurons projecting to each target. Number indicates the number of barcoded cells for each target. (F) Bar plot describing the distribution of neuronal subtypes for barcoded neurons associated with each projection target. Neuronal subtype color codes are the same as in (A), number of barcoded cells are same as the number indicated in (E) for each target. (G) Bar plot describing the distribution of projection targets for barcoded neurons associated with each neuronal type. In (A, C, D), 9368 cells in total were represented. In (B, E, F), 8210 cells in total were represented. In (G), cell numbers represented are as follows: L2/3-Calb1=72 cells, L2/3-Rorb=331 cells, L5-Bcl6=145 cells, L5-S100b=766 cells, L6-Npy=526 cells, L6-Syt6=1264 cells.

Figure 2—figure supplement 1. Layer and cluster annotation using the mouse brain atlas and published scRNA-seq transcriptomes, and projection patterns per mouse.

(A) Normalized Slc17a7 (vGlut1) expression for all extracted excitatory neurons. (B) In situ hybridization of typical layer-specific markers within the vmPFC region from the Adult Mouse Brain Atlas. Cux2 is layer 2/3-specific; Etv1 is layer 5-specific; Sulf1 is layer 6-specific (left). Normalized expression of Cux2, Etv1, and Sulf1 at umap embedding (right). (C) In situ hybridization of typical neuronal subtype markers in the vmPFC from the Adult Mouse Brain Atlas. Calb1 and Rorb are layer 2/3-specific; Htr2c and S100b are layer 5-specific; Bcl6 is around the transition of layer 2/3 and layer 5. Syt6 and Npy are layer 6-specific, though Npy is distributed sporadically. Corresponding normalized gene expression embedded in umap is plotted in the right panel. (D) Heatmap showing the gene-expression correlation between excitatory subtypes defined by scRNA-seq of this study and Lui et al., 2021 (left), Yao et al., 2021 (middle), or Bhattacherjee et al., 2019 (right). (E) Stacked bar plots showing neuronal subtype composition of pooled unsorted mice and pooled FAC-sorted mice for each projection target. Statistical approach was not applied due to the limitations of having a single observation per cluster per group.

Cells that were retrogradely barcoded spanned all layers of the vmPFC (layer 2/3, 5, and 6) and included all seven neuronal subtypes (Figure 2A–C). These subtypes corresponded closely to the spatially resolved PFC excitatory neuronal subtypes (Bhattacherjee et al., 2023; see Materials and methods, Figure 2D). This high correlation allows us to infer the spatial localization of our annotated subtypes detected in scRNA-seq data. We also found that excitatory neuronal subtypes are transcriptionally similar to those previously reported (Figure 2—figure supplement 1D; Bhattacherjee et al., 2019; Lui et al., 2021; Yao et al., 2021). All these integrated analyses suggest that multiple viral infections do not significantly affect the transcriptional state of these retrogradely labeled vmPFC neurons. For the L5-Htr2c subtype, only nine neurons were recovered with valid barcodes, possibly due to cell loss during single-cell dissociation or tropism of AAV2-retro, or these neurons may intrinsically not project to any target we chose (Figure 2B). Neurons projecting to DMS were abundant (n=1242), whereas neurons projecting to BLA were rare (n=163; Figure 2E). These results are consistent with data acquired via conventional fluorescence-based retrograde tracing in the prefrontal cortex of rats (Gabbott et al., 2005).

We next calculated the subtype composition of vmPFC neurons projecting to each downstream brain region. Interestingly, we found that these target-specific projection neurons were transcriptionally heterogeneous and were composed of different neuronal subtypes (Figure 2F). Neurons projecting to LH or MD were mainly of the L6-Syt6 subtype, whereas neurons projecting to AI, DMS, or BLA were mainly composed of L5-S100b, and to a lesser extent L6-Npy and L2/3-Rorb subtypes (Figure 2F). It is worth noting, based on the observed ratios, that the cellular composition of target-specific projection neurons from FAC-sorted or unsorted groups is similar (Figure 2—figure supplement 1E).
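A subtype composition per target of this kind can be tabulated from per-neuron records. The sketch below is illustrative only: the record layout and function name are hypothetical, the values are toy data, and it assumes that a neuron carrying several barcodes is counted once for each of its targets, which the text does not state explicitly.

```python
from collections import Counter, defaultdict


def subtype_composition(cells):
    """Return {target: {subtype: fraction}} from (subtype, targets) records.

    Assumption: a neuron with several barcodes contributes to each of its targets.
    """
    counts = defaultdict(Counter)
    for subtype, targets in cells:
        for target in targets:
            counts[target][subtype] += 1
    return {
        target: {s: n / sum(c.values()) for s, n in c.items()}
        for target, c in counts.items()
    }


# Toy records (subtype, set of barcoded targets); not data from the study.
toy_cells = [
    ("L6-Syt6", {"MD"}),
    ("L6-Syt6", {"MD", "LH"}),
    ("L5-S100b", {"DMS"}),
    ("L5-S100b", {"DMS", "AI"}),
    ("L6-Npy", {"DMS"}),
    ("L2/3-Rorb", {"DMS"}),
]
composition = subtype_composition(toy_cells)
```

Transposing the same counts (normalizing per subtype instead of per target) gives the complementary view of which targets each subtype projects to.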

As the layer distribution of each neuron subtype can be inferred from its layer-specific marker gene expression, these results also implied the layer distribution of neurons projecting to each target (Figure 2E, Figure 2—figure supplement 1B). By calculating the projection properties of each vmPFC neuron subtype, we found that each transcriptome-defined neuron subtype can project to specific but multiple targets. For instance, L5-S100b, L6-Npy and L2/3-Rorb mainly projected to AI, DMS and BLA, while L6-Syt6 mainly projected to MD and LH (Figure 2G). Interestingly, we also found that different neuron subtypes localized in the same layer could project to distinct targets. For instance, L6-Npy neurons projected to AI, DMS, and BLA, while L6-Syt6 neurons projected to MD, DMS, and LH (Figure 2G). Similar phenotypes were observed for L5-S100b and L5-Bcl6 subtypes (Figure 2G), suggesting transcriptomic and projection/function diversity among spatially close neurons within the same cortical layer.

Together, by MERGE-seq analysis, we have revealed the heterogeneity and cellular composition of vmPFC neurons projecting to different targets. Our results demonstrate that vmPFC neurons projecting to a certain target are composed of different transcriptome-defined neuronal subtypes, and that individual transcriptome-defined subtypes of vmPFC neurons project to multiple targets.

MERGE-seq reveals dedicated and collateral projection patterns of vmPFC neuron at single cell level

Interestingly, we found that a portion of barcoded vmPFC neurons had more than one type of barcode, suggesting collateral projection of these neurons. We therefore analyzed the projection pattern of each barcoded vmPFC neuron by calculating the number of valid barcode types (see Materials and methods). We defined a dedicated projection neuron as a neuron containing only one type of barcode, and a collateral projection neuron as a neuron containing more than one type of barcode. We found that 64.88% of 2050 viral-barcoded neurons were dedicated projection neurons and the remainder were collateral projection neurons. A total of 23.37% had dual targets (bifurcated projection), 8.15% had triple targets, and 3.61%, if any, projected to more than three targets (Figure 3A). It is worth mentioning that the definition of ‘dedicated’ and ‘collateral’ projections relies solely on the analysis of MERGE-seq data. The quantitative resolution of dedicated and collateral projections of vmPFC neurons will depend on the comprehensiveness of retrograde labeling from all postsynaptic targets and labeling efficiency. By calculating the conditional probability that the same neuron projects to two targets (see Materials and methods), we found that vmPFC neurons projecting to AI or BLA were more likely to have collateral projection to DMS (Figure 3B). We also observed a relatively high conditional probability of collateral projection between MD and LH, or DMS and LH, or DMS and MD (Figure 3B), suggesting bifurcated projections to these paired targets by single vmPFC neurons.
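The dedicated/collateral definition and the conditional probability can be illustrated with a short sketch. The exact formula is given in the Materials and methods, which are not reproduced here, so the estimator below, P(B | A) = N(A and B) / N(A), is an assumption based on the Figure 3B description; the function names and the barcode sets are hypothetical toy values.

```python
from itertools import permutations

TARGETS = ("AI", "DMS", "BLA", "MD", "LH")


def classify(targets):
    """Label a barcoded neuron by how many distinct barcode types it carries."""
    if not targets:
        raise ValueError("a barcoded neuron must carry at least one barcode")
    return "dedicated" if len(targets) == 1 else "collateral"


def conditional_probability(neurons, a, b):
    """Estimate P(projects to b | projects to a) as N(a and b) / N(a)."""
    to_a = [t for t in neurons if a in t]
    if not to_a:
        return float("nan")  # undefined when no neuron projects to a
    return sum(1 for t in to_a if b in t) / len(to_a)


# Toy barcode sets, one per neuron (not data from the study).
neurons = [{"DMS"}, {"DMS"}, {"AI", "DMS"}, {"AI"}, {"MD", "LH"}, {"LH"}]

labels = [classify(t) for t in neurons]
matrix = {
    (a, b): conditional_probability(neurons, a, b)
    for a, b in permutations(TARGETS, 2)
}
```

Note that this estimate is not symmetric: in the toy data half of the AI-projecting neurons also project to DMS, while only a third of the DMS-projecting neurons also project to AI.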

Figure 3. MERGE-seq reveals projection diversity within the vmPFC.

(A) Pie chart indicating the number of projection targets for barcoded vmPFC neurons recovered by MERGE-seq. (B) Heatmap showing the probability that a neuron projecting to area A also projects to area B. (C) Bar graph illustrating the percentage of neuronal projection pattern of all projection patterns given five projection targets inferred by MERGE-seq (red bars) versus the 1155 fMOST-based single-neuron projectome data (blue bars) (Gao et al., 2022). (D) Boxplot comparison of percentage of neurons with different projection targets identified by MERGE-seq and fMOST. (E) Heatmap showing normalized projection strength. Rows represent the projection targets and columns represent the cells labeled by the top 10 binary projection patterns or labeled by transcriptional neuron subtypes. (F) Alluvial plot showing the 10 most frequent projection patterns distribution into neuronal subtypes. (G) Pie charts describing the projection patterns from (E) partitioned by neuronal subtype. In (A, B), 2050 barcoded neurons were represented. In (C, D), 2050 barcoded neurons from MERGE-seq data were represented, 1155 cells with fMOST data were represented (Gao et al., 2022). In (E–G), 1853 barcoded neurons (top 10 frequent projection patterns) were represented.

Figure 3—source data 1. Related to Figure 3C, D, F and G.
Files contain raw data for Figure 3C and D, statistical summary for Figure 3D, raw data for Figure 3F and G.

Figure 3—figure supplement 1. Immunostaining of dual-color, retrogradely labeled neurons and quantification, PCA plot of projection clusters.

(A–L) Immunostaining of dual-color traced retrograde labeled neurons of selected targets DMS (GFPnls) /LH (tdTomato), AI (GFP) /DMS (tdTomato), DMS (GFPnls) /BLA (tdTomato), and BLA (GFP) /LH (tdTomato). (A, D, G, J). Dotted line depicts layers 2/3, 5, and 6 of the vmPFC. Scale bars, 500 µm. (B, E, H, K) Enlarged view of the dotted box in (A, D, G, J). Scale bars, 100 µm. (C, F, I, L) Histogram shows quantitative data for single- (red, green) and double- (yellow) labeled neurons as mean percentages of total rAAV2-retro labeled neurons (n=3 mice). Data are presented as mean ± SD. Pie chart showing layer distribution of double (yellow) labeled neurons. (M) Normalized projection index barcode expression on PC1 and PC2 embeddings and binary projection annotation labeled on PC1 and PC2 embeddings. Note that only the 10 most frequent binary projection patterns were included. Data are the mean ± SD.

We first validated the bifurcated projection patterns (2 targets) inferred from the digital projectome. We injected retrograde AAV2 encoding different fluorescent proteins (EGFP or tdTomato) into different combinations of projection targets (dual-color rAAV2-retro labeling assay), and analyzed the projection patterns by immunohistochemistry. Consistent with MERGE-seq identifying DMS + LH bifurcated projections (Figure 3B), dual-color labeling revealed that 17.8% ± 0.11% of vmPFC neurons collateralize to DMS and LH (Figure 3—figure supplement 1A–C). Of these, 73.28% ± 7.60% localized to layer 5 (Figure 3—figure supplement 1A–C). Other bifurcated projection patterns inferred by MERGE-seq were also verified by our dual-color retro-AAV labeling assay. These patterns included DMS + AI (23.1% ± 2.03% of all dual-color neurons) and DMS + BLA (6.59% ± 1.55%) (Figure 3—figure supplement 1D–I). In contrast, we only observed 1.66% ± 0.92% of dual-color labeled neurons in the BLA + LH group (Figure 3—figure supplement 1J–L). This result is consistent with our MERGE-seq analysis, in which BLA + LH was not inferred as bifurcated projection targets (Figure 3B), further supporting the accuracy of the digital projectome based on MERGE-seq analysis.

Since dual-color labeling can only validate two targets, we additionally validated inferred projections by quantifying MERGE-seq patterns as percentages of totals and comparing them to published single-neuron PFC projectome data (Gao et al., 2022). We found that the DMS, AI + DMS, MD, and LH projection patterns appeared as the most frequent projection patterns in both studies, with a relatively higher percentage of the DMS dedicated projection pattern in MERGE-seq data (Figure 3C). We further categorized projection patterns by number of targets and found no significant differences versus imaging-based reconstruction (Figure 3D), indicating MERGE-seq faithfully identifies projection patterns.

Next, we focused our analysis on the five dedicated projections (DMS, AI, MD, LH, and BLA) and the five most frequent collateral projections (DMS + AI, DMS + MD, DMS + LH, DMS + AI + MD, and DMS + AI + MD + LH). We conducted a principal component analysis (PCA) of the projection matrix and mapped binary projection labels on PC embeddings. Results from binary projection clustering aligned well with clusters at PC1- and PC2-defined embeddings (Figure 3—figure supplement 1M). We further clustered cells according to projection strength (defined as normalized projection barcode UMI counts; Figure 3E). We found that cells exhibited collateral projections to DMS + AI, or DMS + MD, or DMS + LH, or DMS + AI + MD, or DMS + AI + MD + LH (Figure 3E), a pattern very similar to that observed in the binary projection model, indicating that projection strength-based clustering is comparable to the binary projection pattern model (Figure 3B). We next explored the cell type composition of the top 10 dedicated or collateral projection neurons. We mapped transcriptomic clusters to projection patterns (Figure 3F). While dedicated and collateral projection neurons were largely transcriptionally diverse (≥3 subtypes, Figure 3G), certain projections like MD-projecting and DMS + MD-projecting were highly homogeneous, composed of >90% L6-Syt6 cells (Figure 3G).
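To make the two representations concrete, the sketch below derives a binary projection pattern label and a normalized projection strength from one neuron's thresholded barcode UMI counts, then ranks patterns by frequency. It is a hedged illustration: the function names and counts are hypothetical, and normalizing each neuron's counts to sum to 1 is an assumption, since the normalization scheme is not specified in this section.

```python
from collections import Counter

TARGET_ORDER = ("DMS", "AI", "MD", "LH", "BLA")


def pattern_label(umi_counts):
    """Join the targets with nonzero thresholded UMI counts, e.g. 'DMS + AI'."""
    return " + ".join(t for t in TARGET_ORDER if umi_counts.get(t, 0) > 0)


def projection_strength(umi_counts):
    """Normalize one neuron's barcode UMI counts so they sum to 1 (assumed scheme).

    Expects a barcoded neuron, i.e. at least one nonzero count.
    """
    total = sum(umi_counts.values())
    return {t: umi_counts.get(t, 0) / total for t in TARGET_ORDER}


def top_patterns(cells, n):
    """Return the n most frequent binary projection patterns with their counts."""
    return Counter(pattern_label(c) for c in cells).most_common(n)


# Toy thresholded UMI counts per neuron (not data from the study).
cells = [
    {"DMS": 12},
    {"DMS": 30},
    {"DMS": 6},
    {"DMS": 8, "AI": 4},
    {"DMS": 5, "AI": 5},
    {"MD": 20},
]
ranked = top_patterns(cells, 2)
strength = projection_strength({"DMS": 6, "AI": 2})
```

The binary label discards how strongly each target is represented, whereas the strength vector keeps it; comparing clusterings built on the two is what the paragraph above describes.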

Overall, MERGE-seq elucidated dedicated and collateral vmPFC neuron projections at the single-neuron level, demonstrating diversity in projection patterns within individual vmPFC neurons. Furthermore, projection-defined (collateral or bifurcated) neurons have specific cell type composition and layer distributions. It is worth noting that as a proof of concept, we only acquired the vmPFC projectome from five downstream targets. Definitions of dedicated or collateral projections are thus limited to these five targets and some collateral projections may be underestimated.

Transcriptional profiling of projection target-specific vmPFC neurons

Next, we sought to determine the molecular features of neurons projecting to different downstream targets. We calculated DEGs for each group of target-specific projection neurons (Figure 4A, Figure 4—figure supplement 1). We found that some of the projection-specific DEGs are marker genes of typical neuronal types. For example, Syt6, Foxp2, and Cyr61 are both MD-projecting DEGs and marker genes of L6-Syt6 neurons; Rorb and Slc24a3 are both DMS-projecting DEGs and marker genes of layer 2/3 neurons (neuronal subtype L2/3-Rorb; Figure 2C, Figure 4A, Figure 4—figure supplement 1).
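The DEG assignment used for the volcano plots can be illustrated with a short sketch. The cutoffs (log2 fold change of 0.5 and p value of 10^-10) are those stated in the Figure 4 legend; applying the fold-change cutoff symmetrically to both directions, the function name, and the toy statistics are assumptions made here for illustration only.

```python
def call_degs(stats, lfc_cutoff=0.5, p_cutoff=1e-10):
    """Split genes into up- and down-regulated DEGs.

    stats maps gene name -> (log2 fold change, p value) for a comparison of
    target-barcoded versus non-target-barcoded cells.
    """
    up, down = [], []
    for gene, (lfc, p) in stats.items():
        if p >= p_cutoff:
            continue  # not significant at the chosen cutoff
        if lfc >= lfc_cutoff:
            up.append(gene)
        elif lfc <= -lfc_cutoff:
            down.append(gene)
    return sorted(up), sorted(down)

# Hypothetical test statistics (not values from the source data files).
toy_stats = {
    "GeneA": (1.2, 1e-40),   # enriched in barcoded cells
    "GeneB": (-0.8, 1e-15),  # depleted in barcoded cells
    "GeneC": (0.9, 1e-4),    # fails the p value cutoff
    "GeneD": (0.1, 1e-30),   # fails the fold-change cutoff
}
up_genes, down_genes = call_degs(toy_stats)
```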

Figure 4. Transcriptional profiling of projection target-specific vmPFC neurons.

(A) Volcano plot of DEGs of MD-projecting versus non-MD-projecting vmPFC neurons. Assigned DEGs (red dots) were determined using the thresholds: log2 fold change = 0.5, p value cutoff = 10^-10. (B) Immunostaining of EGFP (MD) and tdTomato (LH), and RNA FISH of Syt6. (i, ii) Enlarged view of dotted box in (B). (i) represents a typical view at layer 6 and (ii) represents a typical view at layer 5. Arrowheads indicate Syt6+EGFP+ neurons. (C) Quantification of (B). (B) Scale bars, 200 μm. i, ii in (B) Scale bars, 50 μm. N=3 mice. Data are presented as mean ± SD. In (A), 8210 cells were represented.

Figure 4—source data 1. Related to Figure 4A, Figure 4—figure supplement 1.
Files contain the raw differentially expressed genes (DEGs) list used for plotting volcano plots. For each target, DEGs were calculated using target-barcoded cells against non-target-barcoded cells (e.g., AI-barcoded versus non-AI-barcoded, DMS-barcoded versus non-DMS-barcoded, etc.). The MAST algorithm was used for differential expression testing.
elife-85419-fig4-data1.xlsx (412.6KB, xlsx)
Figure 4—source data 2. Related to Figure 4B and C.
Files contain the raw quantitative data of immunostaining results.


Figure 4—figure supplement 1. Transcriptional profiling of projection target-specific vmPFC neurons.


Volcano plots of DEGs of DMS-projecting versus non-DMS-projecting vmPFC neurons, AI-projecting versus non-AI-projecting vmPFC neurons, BLA-projecting versus non-BLA-projecting vmPFC neurons, and LH-projecting versus non-LH-projecting vmPFC neurons. Assigned DEGs (red dots) were determined using the thresholds: log2 fold change = 0.5, p value cutoff = 10^-10.

We further validated the molecular features of neurons associated with their specific projections by combining RNA fluorescence in situ hybridization (FISH) and retrograde labeling. Syt6 is one of the DEGs of MD-projecting neurons (Figure 4B), and is the marker gene of the L6-Syt6 cluster. Combining retrograde labeling of MD-projecting neurons with Syt6 FISH, we found that 51.6% ± 16.9% of Syt6+ neurons project to MD. Further statistical analysis showed that, among Syt6+ MD-projecting (Syt6+EGFP+) neurons, 84.2% ± 14.8% were located in layer 6 while 15.8% ± 14.8% were located in layer 5 (Figure 4C), similar to the pattern obtained in our MERGE-seq analysis (Figure 2F). These results are in accordance with the single-neuron projectomic and transcriptomic analysis of MERGE-seq, indicating that MERGE-seq can faithfully reveal the transcriptomic features of projection-specific neurons.

MERGE-seq uncovers the molecular features of collateral projection neurons in vmPFC

Axons of projection neurons, including vmPFC neurons, have highly complex collaterals, which could regulate information processing and neural response properties at the microcircuit level (Gagnon and Parent, 2014; Gao et al., 2022; Rockland, 2019). However, the molecular features of neurons sending collateral projections remain elusive. MERGE-seq provides an opportunity to explore them. Here, we identified DEGs for neurons with dedicated and collateral projection patterns (Figure 5A). Next, we asked whether there was a transcriptional difference between neurons with a dedicated projection to A and neurons with a bifurcated projection to A and B. DEGs were rare in comparisons between projection patterns A/B vs. A, or A/B vs. B in all of the groups we tested, except for the DMS + LH group and the DMS + MD group (Figure 5B and C, Figure 5—figure supplement 1). We found that DMS + LH projection neurons were transcriptionally distinct from DMS but similar to LH projection neurons, and DMS + MD neurons were transcriptionally distinct from DMS but similar to MD projection neurons (Figure 5B and C, Figure 5—figure supplement 1). Specifically, we identified a set of genes that were differentially expressed in DMS + LH projection neurons (such as Pou3f1, Igfbp4, and Gprc5b) or DMS + MD projection neurons (such as Rprm, Crym, Hs3st4 and Bc1). Interestingly, Pou3f1 is a marker gene of L5-Bcl6 neurons (a layer 5 neuron subtype), representing one of the two distinct neuron subtypes within the DMS + LH projection neuronal population (Figure 3G). We next verified the specific gene expression in DMS + LH projection neurons by using RNA FISH in combination with a dual-color retrograde virus labeling assay (Figure 5D). We found that the expression of Pou3f1 was mainly distributed in layer 5, where Pou3f1 was specifically expressed in dual-color labeled DMS + LH projecting neurons (white arrowheads, Figure 5E) and LH projecting neurons (white arrows, Figure 5E), but not DMS projecting neurons (blue arrows, Figure 5E). 
Quantification analysis showed that, among Pou3f1+ neurons, there are 55.7% ± 10.4% DMS + LH-projecting (Pou3f1+EGFP+tdT+) neurons, 31.6% ± 13.1% dedicated LH-projecting (Pou3f1+EGFP-tdT+) neurons, and 8.89% ± 2.38% dedicated DMS-projecting (Pou3f1+EGFP+tdT-) neurons (Figure 5G). We additionally discovered that 3.79% ± 2.91% of Pou3f1+ neurons did not project to either DMS or LH (Pou3f1+EGFP-tdT-) (yellow arrowheads, Figure 5F, Figure 5G). These results are consistent with our observation based on MERGE-seq data (Figure 3G).

Figure 5. Molecular features of single vmPFC neuron with collateral projections to downstream targets.

(A) Heatmap showing scaled expression of calculated DEGs based on 10 projection patterns. Top 10 DEGs ordered by average log2 fold change of each pattern were selected. (B) Volcano plot showing genes differentially expressed in the DMS + LH-bifurcated projection pattern compared to the DMS-dedicated projection pattern. (C) Track plots showing normalized data of the selected DEGs in DMS-dedicated, LH-dedicated, and DMS + LH-bifurcated projection patterns. (D–F) Examining Pou3f1 and the DMS + LH-bifurcated projection pattern using RNA FISH and immunostaining of dual-color traced retrograde labeled neurons. Virus injection scheme was the same as in Figure 3—figure supplement 1. Scale bars, 200 µm. (E, F) Enlarged view of dotted box in (D). White arrowheads indicate Pou3f1+EGFP+tdTomato+ neurons, white arrows indicate Pou3f1+EGFP-tdTomato+ neurons, blue arrows indicate Pou3f1-EGFP+tdTomato- neurons, and yellow arrowheads indicate Pou3f1+EGFP-tdTomato- neurons. Scale bars, 50 µm. (G) Quantification of (D). N=3 mice. Data are presented as mean ± SD. In (A), 1,853 barcoded neurons (top 10 frequent projection patterns) were represented. In (C), 805 barcoded neurons (projection pattern DMS + LH = 35, LH = 176, DMS = 594) were represented.

Figure 5—source data 1. Related to Figure 5B.
Files contain the raw differentially expressed genes (DEGs) list used for plotting volcano plots. DEGs were calculated using DMS-barcoded cells against DMS + LH-barcoded cells. The MAST algorithm was used for differential expression testing.
elife-85419-fig5-data1.xlsx (215.9KB, xlsx)
Figure 5—source data 2. Related to Figure 5D–G.
Files contain the raw quantitative data of immunostaining results.


Figure 5—figure supplement 1. DEGs between dedicated projection neurons versus bifurcated neurons.


(A) Volcano plots of DEGs calculated between A and A/B projection patterns. See also Figure 5. Assigned DEGs (red dots) were determined using the thresholds: log2 fold change = 0.5, p value cutoff = 10^-5. (B) Track plots showing normalized data of the selected DEGs (DMS versus DMS + MD projection) in DMS-dedicated, MD-dedicated, and DMS + MD-bifurcated vmPFC neurons.

Together, by MERGE-seq analysis and experimental validation, we uncovered that Pou3f1 predominantly marks neurons projecting to the LH, denoting a distinct subset with collateral projections to both DMS and LH.

Machine learning-based modeling reveals gene clusters for predicting projection patterns

Although many efforts have been made to correlate gene expression with neuronal circuit connectivity (Huang et al., 2020; Sorensen et al., 2015; Sun et al., 2021), the lack of a shared coordinate system for the two modalities or the limited number of genes examined reduces prediction precision. MERGE-seq overcomes these challenges by acquiring high-throughput gene expression and projection patterns in the same neuron (Figure 6A). To evaluate potential relationships between the transcriptome and projectome, we used a probabilistic classifier, the Naïve Bayes classifier, to predict binary projection patterns for each projection target based on transcription profiles. First, we encoded binary projection labels for each target region, encompassing both barcoded and non-barcoded projections, and subsequently trained a separate set of models for each of the five targets: AI, DMS, BLA, LH, and MD (see Materials and methods). We then systematically evaluated the impact of varying the number of highly variable genes (HVGs), ranging from 2 to 5000, on model performance. This analysis revealed that using the top 50 HVGs for modeling yielded the highest F1 score (the harmonic mean of precision and recall) and area under the curve (AUC), and a comparatively high prediction accuracy (see Materials and methods, Figure 6—figure supplement 1A). We therefore chose the top 50 HVGs as features to build the model. As a control model, we used 50 randomly chosen genes. Models for the five projection targets were trained independently by splitting cells into training (70%) and test (30%) datasets. Using the top 50 HVGs also gave significantly better model performance with regard to prediction accuracy, AUC and F1 score than using 50 randomly chosen genes (Figure 6B). 
We also performed 100 iterations of randomly sampling 1000 cells and swapping barcoded with non-barcoded labels, which substantially decreased model predictive performance across various evaluation metrics (see Materials and methods, Figure 6—figure supplement 1B). This outcome underscores the critical importance of label accuracy for the predictive capabilities of the model, suggesting the authenticity of the current barcoded cell labels despite potential false positives from stringent UMI thresholding. Altogether, these results suggest that the top 50 HVGs are more informative for predicting and decoding projection patterns.
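The modeling logic for a single projection target can be sketched in a few lines. This is an illustrative sketch only: it assumes a Gaussian Naïve Bayes variant (the text does not state which variant was used), replaces expression data with synthetic values for three hypothetical genes, and omits HVG selection, AUC, repeated sampling, and the label-swapping control. All function names are hypothetical.

```python
import math
import random

def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-gene mean and variance for each class."""
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9  # floor avoids zero variance
            for col, m in zip(zip(*rows), means)
        ]
        model[c] = (n / len(X), means, variances)
    return model

def predict(model, x):
    """Return the class (0 = non-barcoded, 1 = barcoded) with the larger log posterior."""
    def log_posterior(c):
        prior, means, variances = model[c]
        total = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return total
    return max((0, 1), key=log_posterior)

def scores(y_true, y_pred):
    """Accuracy, precision, recall and F1 (harmonic mean of precision and recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Synthetic cells: only the first hypothetical gene differs between classes.
rng = random.Random(0)
cells = [([rng.gauss(5.0 * label, 1.0), rng.gauss(0, 1), rng.gauss(0, 1)], label)
         for label in (0, 1) for _ in range(100)]
rng.shuffle(cells)
split = int(0.7 * len(cells))  # 70% training, 30% held out for testing
train, held_out = cells[:split], cells[split:]
model = fit_gaussian_nb([x for x, _ in train], [y for _, y in train])
predictions = [predict(model, x) for x, _ in held_out]
accuracy, precision, recall, f1 = scores([y for _, y in held_out], predictions)
```

In this framing, one such model would be fitted per target (AI, DMS, BLA, LH, MD), with class 1 meaning that the target's barcode was detected in the cell.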

Figure 6. Machine learning-based modeling predicts projection patterns based on gene expression.

(A) Schematics of machine learning modeling steps. (B) Prediction accuracy (left panel), AUC score (middle panel) and F1 score (right panel) of models built with the top HVGs or an equal number of randomly chosen genes. A total of 100 trials were performed by randomly sampling 1000 cells from 8210 cells. Top 50 HVGs or 50 randomly chosen genes were used as features per trial. Comparisons were made between models built by the HVGs group and the random genes group for each projection target. The displayed p value was computed using a two-sided Wilcoxon test. Data are the mean ± SD. (C, E) SHAP summary plots of DMS and MD showing important features (genes) with feature effects. For each model, non-barcoded cells were encoded to class 0 and barcoded cells were encoded to class 1. Models were built using the top 50 HVGs. (D, F) Normalized expression of the most important genes with positive feature effects in Naïve Bayes modeling of DMS (D) or MD (F) and normalized expression of barcode 1 representing DMS-projecting (D) or barcode 2 representing MD-projecting (F) on PC1 and PC2 embeddings. Note that the bottom panel of (D, F) is identical to DMS and MD barcode expression in Figure 3—figure supplement 1M. In (D, F), 1853 barcoded neurons (top 10 frequent projection patterns) were represented. In (C, E), for calculating SHAP values, both the training and testing datasets were subsampled to include 1500 cells each.

Figure 6—source data 1. Related to Figure 6B, Figure 6—figure supplement 1B.
Files contain raw data and statistical summary for Figure 6B and Figure 6—figure supplement 1B.
elife-85419-fig6-data1.xlsx (313.5KB, xlsx)


Figure 6—figure supplement 1. SHAP summary plots of Naïve Bayes models.


(A) Prediction accuracy (left panel), AUC score (middle panel) and F1 score (right panel) obtained by tuning the number of HVGs used for Naïve Bayes model building. A total of 100 trials were performed by randomly sampling 1000 cells from 8210 cells and calculating top HVGs per trial. (B) Prediction accuracy (left panel), AUC score (middle panel) and F1 score (right panel) of models built with the top 50 HVGs using original labels or after swapping barcoded/non-barcoded cell labels. A total of 100 trials were performed by randomly sampling 1000 cells from 8210 cells. In each trial, barcoded/non-barcoded cell labels were swapped. The number of swapped cells depends on the minimum number of barcoded cells or non-barcoded cells. Models were built with the top 50 HVGs using original labels (Original) or swapped labels (Swapping) for comparison for each projection target. The displayed p value was computed using a two-sided Wilcoxon test. Data are the mean ± SD. (C) SHAP summary plots of AI, BLA, and LH showing important features (genes) with feature effects. For each model, non-barcoded cells were encoded to class 0 and barcoded cells were encoded to class 1.

To interpret the important genes contributing to a certain projection pattern, we used a game-theoretic approach to explain the output of HVG-based Naïve Bayes models (Lundberg et al., 2020; Figure 6A). We used the top 50 HVGs to build Naïve Bayes models and summarized the effects of HVGs as SHAP (SHapley Additive exPlanations) values for each projection pattern (see Materials and methods; Figure 6C–F, Figure 6—figure supplement 1C). For example, Nptxr was the top positive predictor for DMS projection, suggesting that a cell that expresses high levels of Nptxr has a higher probability of projecting to DMS. Similarly, Rprm was the top positive predictor for MD projection. By examining the most effective genes (features) on PC embeddings of the projection matrix, we found that the expression pattern of these positive predictors mostly overlaps with the projection barcode distribution (Figure 6D and F). These results mathematically establish the relationship between gene expression and structural connectivity, indicating the predictive power of a specific gene cluster for projection properties of vmPFC neurons.
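The game-theoretic quantity underlying SHAP, the Shapley value, can be computed exactly for a very small feature set, which may help in reading the summary plots. The sketch below is not the SHAP library and not the procedure used in this study: it enumerates all feature subsets (feasible only for a handful of genes), uses a single baseline vector in place of background data as a simplification, and scores a hypothetical linear function of three genes.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; absent features are set to their baseline."""
    n = len(x)

    def value(subset):
        # Evaluate f with features in `subset` taken from x, the rest from baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # k = size of the coalition joined by feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                members = set(coalition)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi

def toy_score(z):
    """Hypothetical projection score from three gene expression values."""
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]

cell = [1.0, 1.0, 1.0]       # expression in one hypothetical cell
reference = [0.0, 0.0, 0.0]  # baseline expression
phi = shapley_values(toy_score, cell, reference)
```

A positive value means that the gene's expression in this cell pushes the score toward the barcoded class relative to the baseline, and the values sum to the difference between the score of the cell and the score of the baseline.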

Discussion

Given the complexity of brain circuits, neuronal subtypes must be characterized from multiple viewpoints (Zeng, 2022). Information including neuronal projection patterns (i.e. region-to-region connectivity), physiological properties, gene expression, and how they encode information in behavioral paradigms, are essential to understand functional brain circuits. Therefore, it is inevitably difficult to acquire a complete picture of brain circuits when only one analytic modality is considered. In this study, we have developed a multiplexed barcoding method that is integrated with scRNA-seq, enabling simultaneous transcriptome and projectome analyses. Retrograde AAVs are injected into multiple target regions simultaneously, thereby labeling projection neurons within the brain region of interest and facilitating their transcriptional analysis. Here, by comparing to other methods, we highlight distinct features of MERGE-seq and key biological insights that MERGE-seq can provide.

Early approaches to barcode-based neuronal projection mapping mainly focused on elucidating the projections of individual neurons in a single brain without providing the transcriptional signatures of those neurons (MAPseq; Kebschull et al., 2016). We therefore developed MERGE-seq to connect the single-neuron transcriptome and projectome with high throughput. While there are some conceptual similarities to BARseq or ConnectID (Chen et al., 2019; Klingler et al., 2021), MERGE-seq has unique features and advantages. BARseq can acquire single-neuron transcriptome and projectome, but for only a limited number of genes owing to the limited throughput of in situ sequencing. An improved version of BARseq allows tens of genes to be detected, but still with low throughput compared to scRNA-seq and at a high cost for synthesizing RNA probes (Sun et al., 2021). ConnectID (scRNA-seq combined with MAPseq) improves the detection of the transcriptome using scRNA-seq but has a relatively low recovery rate of cells with transcriptome and projectome simultaneously (~16%, 391 cells with barcode identity in 2450 cells with scRNA-seq; Klingler et al., 2021). In contrast, MERGE-seq enables transcriptional profiling of thousands of genes per neuron, with valid projectome barcode information recovered from approximately 50% of FAC-sorted cells passing stringent determination criteria. Another advantage of MERGE-seq is that users only need to sequence one brain region, the source area, whereas in BARseq or ConnectID users need to perform numerous tissue homogenizations and sequencing runs for downstream brain regions to query target-area barcode information (projection).
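The recovery rate quoted for ConnectID follows directly from the two cell counts given above, as this short check shows (the function name is hypothetical).

```python
def recovery_rate(barcoded_cells, sequenced_cells):
    """Percentage of sequenced cells that also carry projection barcode information."""
    return 100.0 * barcoded_cells / sequenced_cells

# Counts quoted in the text for ConnectID (Klingler et al., 2021).
connectid_rate = recovery_rate(391, 2450)
```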

MERGE-seq is a retro-AAV-based scRNA-seq approach. Previous research has employed retro-AAV techniques to probe the projection-specific transcriptome or epigenome of individual neurons (Lui et al., 2021; Tasic et al., 2018; Tasic et al., 2016; Yao et al., 2021; Zhang et al., 2021). Yet, these studies have not developed a multiplexed approach for investigating the complex collateral projection patterns of neurons. Another retro-AAV-based approach, VECTORseq, was recently developed to associate neuronal projectome and transcriptome (Cheung et al., 2021). VECTORseq used several viral transgenes, including three recombinases (DreO, Cre, FLPo) and two fluorescent proteins (tdTomato and EGFP), to barcode neurons. However, these transgenes vary in length (DreO, ~1000 bp; Cre, ~1000 bp; FLPo, ~1200 bp; tdTomato, ~1400 bp; EGFP, ~700 bp) and are driven by different promoters of different strengths (EF1a, hSyn, CAG). Such an approach will inevitably result in differential expression of these transgenes in labeled neurons, which in turn leads to different rates of transgene recovery in these neurons. In addition, viral-mediated overexpression of these recombinases may be toxic to the labeled neurons due to non-specific recombination events (Xiao et al., 2012). Therefore, the transgenes used in the VECTORseq method should be carefully selected to avoid potential interference with neuronal function or gene expression. In contrast, MERGE-seq uses 15-nucleotide barcode sequences in the 3' UTR of EGFP as projection indices, driven by the same promoter, to label different projection neurons. The expression of these different barcoded EGFP mRNAs is comparable, and the number of barcoded retro-AAVs is in principle unlimited. Therefore, MERGE-seq allows users to examine more populations (theoretically unlimited) in one brain and to analyze collateralization more extensively. 
Further, MERGE-seq can reveal the projectome of single collateral projection neurons and identify molecular features of these neurons (Figure 5). In contrast, the collateral projection patterns of single neurons were not reported with VECTORseq or Retro-seq-based methods (Cheung et al., 2021; Lui et al., 2021). For example, Lui et al., 2021 used Retro-seq to investigate the correspondence between transcriptomics and projection patterns of vmPFC neurons, and inferred collateral projections based on the finding that transcriptome-defined neuron subtypes can project to different targets (or that neurons projecting to different targets share a common transcriptome-defined neuron subtype). However, the population-level multi-target projections of a transcriptome-defined neuron subtype do not necessarily reflect collateral projections of individual neurons within a subtype. For instance, individual neurons within a subtype could each project to a distinct target (dedicated projection), while their collective projections cover multiple targets. In contrast, in MERGE-seq, individual neurons that were retrogradely labeled with multiple projection barcodes are determined to be collateral projection neurons. By MERGE-seq analysis, we uncovered dedicated and collateral projection patterns of individual vmPFC neurons to the five downstream targets, and revealed molecular features associated with these dedicated or collateral projection neurons (Figures 3–5). In addition, the MERGE-seq strategy can be readily applied to other animal models, which is especially beneficial for research in models (e.g. non-human primates) where genetic manipulation is challenging. In summary, while Retro-seq methods provide valuable population-level insights, they do not capture the complex collateral projections that MERGE-seq can discern at the single-cell level. Our findings build upon and extend those of Lui et al. by demonstrating that individual neurons within transcriptome-defined subtypes exhibit a diverse range of projection patterns. This contributes a new layer of understanding to the intricate architecture of PFC circuits, emphasizing the nuanced interplay between divergence and convergence in neuronal pathways.

Although MERGE-seq does not currently offer spatial information about neurons, it leverages widely accessible droplet-based scRNA-seq, avoiding specialized equipment. Meanwhile, the extensive spatially resolved mouse brain atlases available (Allen et al., 2023; Yao et al., 2023; Zhang et al., 2023) allow for easy spatial annotation of cell populations using DEGs identified by scRNA-seq, as we demonstrated by mapping neuronal subtypes with MERFISH data of PFC. Compared to imaging-based spatial transcriptomics like MERSCOPE with constrained gene numbers, or next-generation sequencing (NGS)-based methods that lack true single-cell resolution (e.g. 50 µm for 10x Genomics Visium or 10–50 µm for DBiT-seq-based methods; Deng et al., 2023), we believe our method stands out as a robust solution and offers an advantageous balance between resolution and scope.

There are several potential concerns and limitations of the current study. First, a recognized limitation of retro-AAV-based methods, including MERGE-seq, is the imperfect retrograde labeling efficiency in target regions. Labeling efficiency could vary depending on the source brain region, projection strength, the distance between source and target brain regions, and AAV serotype or tropism. For example, only nine neurons of the L5-Htr2c subtype were recovered with valid barcodes, which may be attributable to technical factors including cell loss during dissociation or AAV2-retro tropism. Alternatively, this subtype may intrinsically lack projections to the target regions examined in this study. Furthermore, single-cell dissociation for scRNA-seq can result in cell loss, thereby reducing the recovery rate of barcoded neurons. All these factors could influence the extent to which the complete range of neuronal projections is captured. Consequently, the quantitative conclusions drawn here might not fully represent the true extent of neuronal projections.

Second, the robust detection of projection barcodes and its recovery rate in neurons labeled with barcoded AAV-retro viruses is indeed a critical and challenging aspect of our methodology. As mentioned above, this challenge is largely due to the differential viral transduction efficiency across neurons, leading to inconsistent barcode expression. Neurons with low barcode expression may fall beneath the detection threshold of conventional sequencing methods. A suboptimal recovery rate can potentially lead to underrepresentation of certain neuron populations or projection patterns in the analyzed data. This in turn could impact the interpretation of neuronal connectivity and function, as projections that are less efficiently labeled or harder to detect might be overlooked. For instance, if a subset of neurons with low barcode expression is systematically missed, it could erroneously suggest that these neurons do not participate in specific projection patterns. Conversely, overrepresentation of certain barcodes due to higher transduction efficiency could falsely indicate a predominance of certain projections. One potential solution to improve barcode detection is to include FAC-sorted EGFP-negative cells as a negative control, which may help to differentiate between true signal and background noise. Enhancements in sequencing technologies, offering increased read lengths and deeper sequencing, could potentially improve barcode detection sensitivity. In parallel, applying single-molecule FISH technologies like MERFISH to spatially resolve barcodes offers a robust and direct detection method. This technology can provide detailed coverage and resolution of individual RNA molecules within single cells, bypassing additional PCR amplification steps and reducing cell loss during physical isolation. 
Furthermore, carefully controlling the viral titer and refining the procedures of single neuron suspension preparation, as performed in this study, is required to control the labeling efficiency and recovery rate.

DMS is en route from vmPFC to subcortical regions (Shepherd, 2013), thus raising another concern about the ability of AAV2 to transduce axons of passage. However, the retrograde transport of AAV has been effectively demonstrated to target projection neurons at axonal terminals, with injections into the DMS exhibiting labeling patterns and efficiencies that match those of synthetic tracers (Tervo et al., 2016). Further, it has been experimentally verified that AAV2 spread is confined to the vicinity of synaptic terminals and does not affect axon fibers of passage, especially as evidenced by retro-AAV injections in the cervical spinal cord (Wang et al., 2018). While these findings are reassuring, additional research is needed to unequivocally eliminate the possibility of transduction along axon fibers of passage. The five distinct injection sites we chose for our study are spatially disparate, encompassing both cortical and subcortical regions, and span a range from the anterior (Bregma, +2 mm) to the posterior (Bregma, –1.5 mm) brain regions. This separation mitigates the potential overlap in labeling that can arise when examining spatially proximate nuclei, such as those in the hypothalamus. Nevertheless, examining such closely situated targets would necessitate meticulous quantification of virus injection volumes to prevent cross-target viral dissemination, ensuring the specificity required for accurate projection mapping.

In summary, we developed MERGE-seq, a multiplexed projectome and transcriptome analysis platform that will help researchers perform large-scale studies at low cost. This will enable researchers to understand organizing principles and molecular features of neural circuits across modalities, and to construct more comprehensive mesoscale connectomes.

Materials and methods

Key resources table.

| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
|---|---|---|---|---|
| Antibody | Anti-GFP (Rat Monoclonal) | Nacalai | Cat# 04404–84, RRID:AB_10013361 | IHC(1:500) |
| Antibody | Anti-tdTomato (Goat Polyclonal) | OriGene | Cat#AB8181-200, RRID:AB_2722750 | IHC(1:500) |
| Antibody | Hoechst 33342 | Lifetech | Cat#H3570 | IHC(1:1000) |
| Antibody | Donkey anti-rat Alexa Fluor 488 (Donkey Polyclonal) | Invitrogen | Cat#A21208 | IHC(1:800) |
| Antibody | Donkey anti-goat Alexa Fluor 568 (Donkey Polyclonal) | Invitrogen | Cat#A11057 | IHC(1:800) |
| Recombinant DNA reagent | pAAV-CAG-tdTomato (plasmid) | Addgene | Cat#59462 | |
| Recombinant DNA reagent | pAAV-CAG-EGFP barcode-0-SV40 polyA (plasmid) | This paper | Cat#190864 | Submitted to Addgene |
| Recombinant DNA reagent | pAAV-CAG-EGFP barcode-1-SV40 polyA (plasmid) | This paper | Cat#190865 | Submitted to Addgene |
| Recombinant DNA reagent | pAAV-CAG-EGFP barcode-2-SV40 polyA (plasmid) | This paper | Cat#190866 | Submitted to Addgene |
| Recombinant DNA reagent | pAAV-CAG-EGFP barcode-3-SV40 polyA (plasmid) | This paper | Cat#190867 | Submitted to Addgene |
| Recombinant DNA reagent | pAAV-CAG-EGFP barcode-4-SV40 polyA (plasmid) | This paper | Cat#190868 | Submitted to Addgene |
| Recombinant DNA reagent | pAAV-CAG-EGFP barcode-5-SV40 polyA (plasmid) | This paper | Cat#190869 | Submitted to Addgene |
| Recombinant DNA reagent | pAAV-CAG-EGFP barcode-6-SV40 polyA (plasmid) | This paper | Cat#190870 | Submitted to Addgene |
| Recombinant DNA reagent | pAAV-CAG-EGFP barcode-7-SV40 polyA (plasmid) | This paper | Cat#190871 | Submitted to Addgene |
| Recombinant DNA reagent | pAAV-CAG-EGFP barcode-8-SV40 polyA (plasmid) | This paper | Cat#190872 | Submitted to Addgene |
| Recombinant DNA reagent | pAAV-CAG-EGFP barcode-9-SV40 polyA (plasmid) | This paper | Cat#190873 | Submitted to Addgene |
| Recombinant DNA reagent | pAAV-CAG-EGFP barcode-10-SV40 polyA (plasmid) | This paper | Cat#190874 | Submitted to Addgene |
| Recombinant DNA reagent | pAAV-CAG-EGFPnls barcode-206-SV40 polyA (plasmid) | This paper | Cat#190875 | Submitted to Addgene |
| Recombinant DNA reagent | pAAV-CAG-EGFPnls barcode-210-SV40 polyA (plasmid) | This paper | Cat#190876 | Submitted to Addgene |
| Chemical compound, drug | AMPA receptor antagonist CNQX | Abcam | Cat#ab120017 | working concentration: 10 µM |
| Chemical compound, drug | NMDA receptor antagonist D-AP5 | Abcam | Cat#ab120003 | working concentration: 50 µM |
| Chemical compound, drug | 2-Mercaptoethanol | Sigma | Cat#M6250 | working concentration: 0.067 mM |
| Chemical compound, drug | EDTA | Invitrogen | Cat#15575020 | working concentration: 1.1 mM |
| Chemical compound, drug | L-Cysteine hydrochloride monohydrate | Sigma | Cat#C7880 | working concentration: 5.5 mM |
| Chemical compound, drug | Deoxyribonuclease I | Sigma | Cat#D4527 | working concentration: 100 units/ml |
| Chemical compound, drug | Protease | Sigma | Cat#P5147 | working concentration: 1 mg/ml |
| Chemical compound, drug | Dispase | Worthington | Cat#LS02106 | working concentration: 1 mg/ml |
| Chemical compound, drug | Papain | Worthington | Cat#LS003126 | working concentration: 20 units/ml |
| Commercial assay or kit | Debris Removal Solution | Miltenyi | Cat#130-109-398 | |
| Commercial assay or kit | Chromium Single Cell 3' Reagent Kits (v3) | 10x Genomics | Cat#PN1000075 | |
| Commercial assay or kit | NEBNext Ultra II Q5 Master Mix | NEB | Cat#M0544L | |
| Commercial assay or kit | SPRIselect | Beckman | Cat#B23317 | |
| Commercial assay or kit | Mm-Syt6 | ACD Bioscience | Cat#449641 | |
| Commercial assay or kit | Mm-Pou3f1-C2 | ACD Bioscience | Cat#436421-C2 | |
| Sequence-based reagent | P5-Read1 | This paper | PCR primers | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTC |
| Sequence-based reagent | P7-index-Read2-EGFP | This paper | PCR primers | CAAGCAGAAGACGGCATACGAGATAGGATTCGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTG gCATGGACGAGCTGTACAAG |

AAV vector design

Plasmid pAAV-CAG-tdTomato (Addgene, #59462) was first modified by replacing tdTomato and WPRE with EGFP by T4 DNA ligase-mediated ligation. A 15 bp barcode sequence was then inserted after the stop codon of EGFP, linked by an EcoRI restriction enzyme recognition site. The barcode sequences were: barcode 0 (AI target), CTGCACCGACGCATT; barcode 1 (DMS target), GAAGGCACAGACTTT; barcode 2 (MD target), GTTGGCTGCAATCCA; barcode 3 (BLA target), AAGACGCCGTCGCAA; barcode 4 (LH target), TATTCGGAGGACGAC. Other barcode sequences used for IHC include barcode 10, AGCTATGCACGATCA; barcode 206, GCGTAAGTCTCCTTG; barcode 210, CCTGTATGCGTGGAG. Engineered viruses were produced by the Gene Editing Core Facility, Center for Excellence in Brain Science and Intelligence Technology.
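For readers implementing barcode calling, the barcode-to-target mapping listed above can be written down and sanity-checked as in the following sketch. The five sequences and their targets are taken from the text; the helper names, the exact-substring matching rule, and the toy read with its flanking bases are assumptions for illustration and do not describe the processing pipeline used in this study.

```python
BARCODES = {
    "CTGCACCGACGCATT": "AI",   # barcode 0
    "GAAGGCACAGACTTT": "DMS",  # barcode 1
    "GTTGGCTGCAATCCA": "MD",   # barcode 2
    "AAGACGCCGTCGCAA": "BLA",  # barcode 3
    "TATTCGGAGGACGAC": "LH",   # barcode 4
}

def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    seqs = list(barcodes)
    return min(hamming(a, b) for i, a in enumerate(seqs) for b in seqs[i + 1:])

def assign_target(read):
    """Return the target whose barcode occurs in the read, or None.

    Reads containing no barcode, or barcodes of more than one target,
    are left unassigned.
    """
    hits = {target for barcode, target in BARCODES.items() if barcode in read}
    return hits.pop() if len(hits) == 1 else None

# Toy read: barcode 1 with arbitrary (hypothetical) flanking bases.
toy_read = "ACGT" + "GAAGGCACAGACTTT" + "TTGCA"
toy_target = assign_target(toy_read)
separation = min_pairwise_distance(BARCODES)
```

The minimum pairwise distance gives a rough sense of how many sequencing errors a barcode could tolerate before being confused with another; a mismatch-tolerant matcher would need to stay below half of that distance.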

Virus injection

Male adult C57BL/6 mice (8 weeks of age) were anesthetized by intraperitoneal injection of pentobarbital sodium (10 mg/ml, 120 mg/kg b.w.) and unilaterally injected with rAAV2-retro-EGFP-barcode virus (barcodes 0, 1, 2, 3, and 4) into five projection targets simultaneously. Injection coordinates, referenced from bregma and dura, were as follows: AI at two locations (in mm: 2.0 AP, 2.52 ML, –2.0 DV; 1.6 AP, 2.97 ML, –2.2 DV) with rAAV2-retro-EGFP-barcode 0 (250 nl and 200 nl, 2.90×10^13 VG/ml); DMS at one location (in mm: 0.6 AP, 1.8 ML, –2.2 DV, 8 degree angle) with rAAV2-retro-EGFP-barcode 1 (500 nl, 1.00×10^13 VG/ml); MD at one location (in mm: –1.25 AP, 1.35 ML, –3.55 DV, 20 degree angle) with rAAV2-retro-EGFP-barcode 2 (300 nl, 1.27×10^13 VG/ml); BLA at one location (in mm: –1.5 AP, 3.2 ML, –4.2 DV) with rAAV2-retro-EGFP-barcode 3 (300 nl, 2.00×10^13 VG/ml); LH at one location (in mm: –0.94 AP, 1.2 ML, –4.55 DV) with rAAV2-retro-EGFP-barcode 4 (250 nl, 2.25×10^13 VG/ml). Following each injection, the micropipette was left in the tissue for 10 min before being slowly withdrawn to prevent virus spilling and backflow. Mice were sacrificed 6 weeks after virus injection. Single-cell suspensions were generated as described in the methods below.

For dual-color retrograde virus tracing, two regions were ipsilaterally injected with virus at the same time, one with rAAV2-retro-EGFP-barcode 10 (2.00×10^13 VG/ml) or rAAV2-retro-EGFPnls-barcode 206 or 210 (3.10×10^13 VG/ml for barcode 206 and 4.38×10^13 VG/ml for barcode 210) and one with rAAV2-retro-tdTomato (2.25×10^13 VG/ml). rAAV2-retro-EGFPnls was used to avoid dense fiber staining when performing immunohistochemistry. We deposited the virus plasmid constructs at Addgene (pAAV-CAG-EGFP barcode-(0–10)-SV40 polyA, pAAV-CAG-EGFPnls barcode-(206, 210)-SV40 polyA; Addgene ID 190864–190876).

scRNA-seq sample and library preparation

For mice without FAC-sorting (mouse #1, #2, #3), three virus-injected mice were anaesthetized and then subjected to transcardial perfusion with ice-cold oxygenated self-made dissection buffer (in mM: 92 Choline chloride, 2.5 KCl, 1.2 NaH2PO4, 30 NaHCO3, 20 HEPES, 25 Glucose, 5 Sodium ascorbate, 2 Thiourea, 3 Sodium pyruvate, 10 MgSO4.7H2O, 0.5 CaCl2.2H2O, 12 N-Acetyl-L-Cysteine). The brain was removed, 300 µm vibratome sections were collected, and the PrL and IL regions were microdissected under a stereo microscope with a cooled platform. Brain slices were incubated in dissection buffer with 10 µM AMPA receptor antagonist CNQX (Abcam, ab120017) and 50 µM NMDA receptor antagonist D-AP5 (Abcam, ab120003) at 33 °C for 30 min. The pieces were first dissociated using ice-cold oxygenated dissection buffer supplemented with papain (20 units/ml, Worthington, LS003126), 0.067 mM 2-mercaptoethanol (Sigma, M6250), 1.1 mM EDTA (Invitrogen, 15575020), 5.5 mM L-Cysteine hydrochloride monohydrate (Sigma, C7880), and 100 units/ml Deoxyribonuclease I (Sigma, D4527), with 30–40 min of enzymatic digestion at 37 °C, followed by 30 min of enzymatic digestion with 1 mg/ml protease (Sigma, P5147) and 1 mg/ml dispase (Worthington, LS02106) at 25 °C. The supernatant was removed and digestion was terminated using dissection buffer containing 2% fetal bovine serum (FBS, Bioind, 04-002-1A). A single-cell suspension was generated by manual trituration using fire-polished Pasteur pipettes and filtered through a 35 µm DM-equilibrated cell strainer (Falcon, 352052). Cells were then pelleted at 400 × g for 5 min. The supernatant was carefully removed and the pellet was resuspended in 1–2 ml dissection buffer containing 2% FBS. The suspension was then subjected to debris removal using the Debris Removal Solution (Miltenyi, 130-109-398). Cell pellets were resuspended and 48,000 cells were loaded into 3 lanes to perform 10x Genomics sequencing.
For mice with FAC-sorting (mouse #4, #5, #6), PrL and IL regions were microdissected and dissociated as for mice without FAC-sorting, and cells were then sorted to enrich for EGFP-positive cells labeled with rAAV2-retro-EGFP-barcodes. About 4893 EGFP-positive cells were captured and loaded to perform 10x Genomics sequencing. Chromium Single Cell 3' Reagent Kits (v3) were used for library preparation (10x Genomics). Libraries were sequenced on an Illumina Novaseq 6000 system.

Projection barcode library preparation

Parallel PCR reactions were performed, each containing 50 ng of post-cDNA-amplification cleanup material as template. P5-Read1 (AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTC) and P7-index-Read2-EGFP (CAAGCAGAAGACGGCATACGAGATAGGATTCGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGgCATGGACGAGCTGTACAAG) (200 nM each) were used as primers with the NEBNext Ultra II Q5 Master Mix (NEB, M0544L). Amplification was performed using the following PCR protocol: (1) 33 °C for 1 min, (2) 98 °C for 10 s, then 65 °C for 75 s (20–24 cycles), (3) 75 °C for 5 min. Reactions were re-pooled during 1× SPRI selection (Beckman, B23317) to harvest the virus projection barcode library. Libraries of 431–437 bp (including 120 bp of adaptors) were sequenced using an Illumina HiSeq X Ten.

Immunohistochemistry

Mice were sacrificed 6 weeks after virus injection. Mice were transcardially perfused with phosphate-buffered saline (PBS) followed by 4% paraformaldehyde (PFA). Brain samples were extracted and cryoprotected in 20% sucrose/4% PFA, immersed sequentially in 20% sucrose (in 4% PFA) and 30% sucrose (in 0.1 M phosphate buffer, PB) until sunk, and then transferred to 30% sucrose/PB for more than 24 h. Brain samples were flash-frozen on dry ice and sectioned at 30 μm on a cryostat (Leica, SM2010R). For dual-color retrograde virus tracing, brain slices were blocked in 10% donkey serum and 0.3% Triton X-100 at 37 °C for 1 hr. Slices were then incubated with primary antibodies against green fluorescent protein (GFP, 1:500, Nacalai, 04404–84, RRID: AB_10013361) and tdTomato (1:500, OriGene, AB8181-200, RRID: AB_2722750) at room temperature for 2 hr, then at 4 °C overnight. Slices were washed three times using PBS and incubated with Hoechst 33342 (1:1000, Lifetech, H3570), as well as secondary donkey anti-rat Alexa Fluor 488 antibodies (1:800, Invitrogen, A21208) and donkey anti-goat Alexa Fluor 568 antibodies (1:800, Invitrogen, A11057) at room temperature for 1 hr. Slices were washed three times using PBS and coverslipped. Stained slices were first imaged with a 4× objective (numerical aperture 0.16) as a map, followed by z stacks at 1.5 µm increments with a 10× objective (numerical aperture 0.4) (FV3000, OLYMPUS). Composite images were automatically stitched in the X-Y plane using ImageJ/FIJI. RNA FISH experiments were performed using RNA-Scope reagents and protocols (ACD Bioscience, CA), following instructions for fixed-frozen tissue. For experiments using RNA-Scope, immunohistochemistry was performed following RNA-Scope. RNA-Scope probes used in this study were Mm-Syt6 (449641) and Mm-Pou3f1-C2 (436421-C2).

scRNA-seq data pre-processing

scRNA-seq data were aligned with the customized mouse reference genome mm10-3.0.0, to which the five projection barcodes were added as separate genes. Further projection barcode expression was obtained as described in the sections Projection barcode library preparation and Projection barcode FASTQ alignment. scRNA-seq data were demultiplexed using the default parameters of Cellranger software (10x Genomics, v3.0.2). The resulting filtered count matrix was used for downstream analysis. For unsorted samples, we used three mice with three GEM wells in one Chromium Single Cell 3' Chip (v3). Among unsorted samples, sample mouse #1 recovered 8040 cells, 447,984,945 read pairs were aligned, mean reads per cell was 55,719, and median genes per cell was 2382; sample mouse #2 recovered 7443 cells, 399,187,134 read pairs were aligned, mean reads per cell was 53,632, and median genes per cell was 2379; sample mouse #3 recovered 7243 cells, 410,627,696 read pairs were aligned, mean reads per cell was 56,693, and median genes per cell was 2385. For FAC-sorted samples, we used three mice with one GEM well in one Chromium Single Cell 3' Chip (v3). The FAC-sorted sample recovered 2075 cells, 410,434,792 read pairs were aligned, mean reads per cell was 197,799, and median genes per cell was 6533.
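The per-sample metrics above can be cross-checked with simple arithmetic. The sketch below assumes that mean reads per cell is the number of aligned read pairs divided by the number of recovered cells, rounded down; this definition is an assumption made for illustration, not a statement about how Cellranger computes the metric. The function names are hypothetical.

```python
# (cells recovered, aligned read pairs, reported mean reads per cell),
# copied from the text above.
REPORTED = {
    "mouse #1": (8040, 447_984_945, 55_719),
    "mouse #2": (7443, 399_187_134, 53_632),
    "mouse #3": (7243, 410_627_696, 56_693),
    "FAC-sorted": (2075, 410_434_792, 197_799),
}


def mean_reads_per_cell(read_pairs: int, cells: int) -> int:
    """Assumed definition: read pairs divided by cells, rounded down."""
    return read_pairs // cells


def check_reported(reported=REPORTED):
    """Return, per sample, whether the reported mean matches the assumed definition."""
    return {
        name: mean_reads_per_cell(reads, cells) == mean
        for name, (cells, reads, mean) in reported.items()
    }


if __name__ == "__main__":
    for name, ok in check_reported().items():
        print(name, "consistent" if ok else "inconsistent")
```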

Projection barcode FASTQ alignment

Demultiplexing of projection index barcodes was performed using the deMULTIplex R package (v1.0.2) (https://github.com/chris-mcginnis-ucsf/MULTI-seq, copy archived at mcGinnis, 2023) with modifications. Briefly, we revised the MULTIseq.align function to count the UMIs of each projection barcode separately. We adopted a minimal Hamming distance of 2 for the MULTIseq.align function to improve the matching accuracy between detected and designed barcodes. Tag parameters in the ‘MULTIseq.preProcess’ function were adjusted according to the length and position of our index barcode. Based on our primer design, the expected format is: cell barcode in Read 1 (bases 1–16), UMI in Read 1 (bases 17–28), and projection barcode in Read 2 (bases 31–45).
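The read layout described above can be illustrated with a minimal pure-Python sketch. It is not the deMULTIplex implementation: the function names, the `max_dist` tolerance, and the rule that an ambiguous tag is discarded are assumptions made for illustration. Only the base positions and the five barcode sequences come from the text.

```python
# Projection barcodes from the AAV vector design section.
BARCODES = {
    "AI": "CTGCACCGACGCATT",
    "DMS": "GAAGGCACAGACTTT",
    "MD": "GTTGGCTGCAATCCA",
    "BLA": "AAGACGCCGTCGCAA",
    "LH": "TATTCGGAGGACGAC",
}


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def parse_read_pair(read1: str, read2: str):
    """Slice cell barcode (R1 1-16), UMI (R1 17-28), projection tag (R2 31-45)."""
    return read1[0:16], read1[16:28], read2[30:45]


def assign_target(tag: str, max_dist: int = 1):
    """Return the target whose barcode is uniquely closest within max_dist (assumed rule)."""
    if len(tag) != 15:
        return None
    dists = sorted((hamming(tag, seq), name) for name, seq in BARCODES.items())
    best, runner_up = dists[0], dists[1]
    if best[0] <= max_dist and best[0] < runner_up[0]:
        return best[1]
    return None


def count_umis(read_pairs, max_dist: int = 1):
    """Count unique UMIs per cell and per projection target."""
    seen = set()
    counts = {}
    for read1, read2 in read_pairs:
        cell, umi, tag = parse_read_pair(read1, read2)
        target = assign_target(tag, max_dist)
        if target is None or (cell, target, umi) in seen:
            continue  # unmatched tag or duplicate UMI
        seen.add((cell, target, umi))
        per_cell = counts.setdefault(cell, {})
        per_cell[target] = per_cell.get(target, 0) + 1
    return counts
```

Counting each (cell, target, UMI) triple once mirrors the idea of counting UMIs per projection barcode separately, so PCR duplicates of the same molecule do not inflate the count.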

scRNA-seq transcriptional expression analysis

The filtered count matrix was analyzed and processed using Seurat and Scanpy, including data filtering, normalization, highly variable gene selection, scaling, dimension reduction, and clustering (Stuart et al., 2019; Wolf et al., 2018). First, scRNA-seq data from three samples of unsorted cells and one sample of sorted EGFP-positive cells were loaded as separate Seurat objects; genes with fewer than three counts and cells with fewer than 200 genes detected were removed. Second, the four Seurat objects were merged using the ‘merge’ function in Seurat. Downstream analysis of the merged Seurat object was as follows: (1) Data filtering: cells with a mitochondrial gene ratio of greater than 20% were excluded. We kept cells for which we detected between 500 and 8000 genes (cells with more than 8000 genes detected were considered potential doublets), and between 1000 and 60,000 counts (cells with more than 60,000 counts detected were considered potential doublets). (2) Data normalization: for each cell, counts were log normalized with the ‘NormalizeData’ function in Seurat; ‘scale.factor’ was set to 50,000. (3) Highly variable gene selection: 2000 highly variable genes were calculated using the ‘FindVariableFeatures’ function in Seurat. (4) Data scaling: scaling was performed using the ‘ScaleData’ function with default parameters. The number of counts, number of genes, mitochondrial gene ratio, and sorting condition were regressed out in ‘ScaleData’. (5) Principal component analysis: highly variable genes were used to calculate principal components in the ‘RunPCA’ function. A total of 100 principal components (PCs) were obtained and stored in the Seurat object for computing neighborhood graphs and uniform manifold approximation and projection (UMAP) in the following steps. (6) Leiden clustering: the Seurat object was converted into a loom file and imported by Scanpy. A neighborhood graph of observations was computed by the ‘scanpy.pp.neighbors’ function in Scanpy. Then, the Leiden algorithm was used to cluster cells by the ‘scanpy.tl.leiden’ function in Scanpy. (7) Cluster merge and trimming: the top 200 DEGs for each cluster were calculated using the ‘scanpy.tl.rank_genes_groups’ function in Scanpy with parameters method=‘wilcoxon’ and n_genes = 200. Cluster annotation was performed manually based on previously reported markers of all PFC cell types, layers, and neuron subtypes, and on the mouse brain atlas (Bhattacherjee et al., 2019; Sorensen et al., 2015). Cell clusters with similar marker genes were merged into one cluster. Complete marker lists for all cell types and all excitatory neuron subtypes, calculated using the ‘FindAllMarkers’ function in Seurat, are provided (see Supplementary files 2 and 3).
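Steps (1) and (2) above can be sketched in a few lines of plain Python. This is an illustration rather than the Seurat code used in the study: the function names are hypothetical, the threshold bounds are assumed to be inclusive, and the normalization follows the usual log-normalization formula, log(1 + count / total counts × scale factor).

```python
import math


def passes_qc(n_genes: int, n_counts: int, mito_fraction: float) -> bool:
    """Apply the cell-level thresholds from step (1); bounds assumed inclusive."""
    return (
        mito_fraction <= 0.20          # exclude cells with >20% mitochondrial reads
        and 500 <= n_genes <= 8000     # >8000 genes treated as potential doublets
        and 1000 <= n_counts <= 60000  # >60,000 counts treated as potential doublets
    )


def log_normalize(counts: dict, scale_factor: float = 50_000) -> dict:
    """Log-normalize one cell's gene counts, as in step (2)."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {
        gene: math.log1p(count / total * scale_factor)
        for gene, count in counts.items()
    }


if __name__ == "__main__":
    cell = {"Pou3f1": 3, "Syt6": 1, "Actb": 96}  # toy counts, not real data
    print(passes_qc(n_genes=2382, n_counts=5000, mito_fraction=0.05))
    print(log_normalize(cell))
```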

Two rounds of clustering were performed. In the first round, we clustered all cells detected by scRNA-seq to generate a major cell type classification, that is, excitatory neurons, inhibitory neurons, astrocytes, oligodendrocytes, endothelial cells, and microglia. We then used the annotated ‘Excitatory neuron’ cluster to further cluster excitatory neuronal subtypes. In the second round of clustering, we found that several clusters had a lower number of counts per cell, a lower number of genes per cell, a higher percentage of mitochondrial genes, and ribosomal protein genes among their DEGs, which indicates low cell quality (Ilicic et al., 2016). We also found several other clusters with a small number of cells expressing typical markers of non-neuronal cells, such as microglia (C1qa, C1qb), oligodendrocytes (Olig1, Olig2), and endothelial cells (Flt1, Cldn5), which indicated ‘contamination’ by other cell types mixed into the ‘Excitatory neuron’ cluster in the initial clustering results. We then filtered out those cells from the ‘Excitatory neuron’ cluster and repeated the clustering to generate excitatory neuronal subtypes (see Supplementary file 1).

Cell type correspondence assessment

To evaluate whether the transcriptional cell types we recovered and annotated correlated with cell types from spatial transcriptomics of PFC or other scRNA-seq datasets of PFC, we used a previously reported comparison analysis method (Bhattacherjee et al., 2023). Briefly, we integrated our dataset and previously reported datasets (Bhattacherjee et al., 2023; Bhattacherjee et al., 2019; Lui et al., 2021; Yao et al., 2021) into a harmonized PCA space using the Harmony algorithm (Korsunsky et al., 2019). We then constructed a K-nearest neighbor (KNN) graph incorporating all cells from the two datasets. We used the first 30 harmonized principal components as inputs for the FindNeighbors function of Seurat to calculate the KNN graph. For each cluster of the public dataset, we found its 30 nearest neighbor cells and determined the percentages of those cells belonging to each scRNA-seq cluster of our dataset. This created a correspondence matrix showing the transcriptional similarity of each public dataset cluster to each cluster of our dataset.

In this matrix, rows represent our scRNA-seq clusters, columns represent public dataset clusters, and the matrix values reflect the degree of similarity between the clusters. This process was reciprocally conducted for clusters of our dataset, comparing them to public dataset clusters to form a secondary correspondence matrix. The mean of these two matrices provided a quantifiable measure of the similarity between cell clusters identified by our annotation and public dataset annotation.
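The two-way correspondence calculation can be sketched as follows, assuming the nearest neighbors have already been found in the harmonized PCA space. The function names and the input layout (one list of neighbor cluster labels per query cell) are hypothetical, and pooling neighbors over all cells of a cluster is an assumption about how the per-cluster percentages are formed.

```python
from collections import Counter, defaultdict


def correspondence(query_labels, neighbor_labels, ref_clusters):
    """Fraction of each query cluster's neighbors that fall in each reference cluster.

    query_labels[i]    : cluster of query cell i
    neighbor_labels[i] : reference-cluster labels of that cell's nearest neighbors
    """
    tallies = defaultdict(Counter)
    for cluster, neighbors in zip(query_labels, neighbor_labels):
        tallies[cluster].update(neighbors)
    matrix = {}
    for cluster, tally in tallies.items():
        total = sum(tally.values())
        matrix[cluster] = {
            ref: (tally[ref] / total if total else 0.0) for ref in ref_clusters
        }
    return matrix


def mean_correspondence(ours_to_public, public_to_ours):
    """Average the two directions; rows are our clusters, columns are public clusters."""
    return {
        ours: {
            public: 0.5 * (value + public_to_ours[public][ours])
            for public, value in row.items()
        }
        for ours, row in ours_to_public.items()
    }


if __name__ == "__main__":
    # Toy example with two clusters per dataset (not real data).
    public_to_ours = correspondence(
        ["P1", "P1", "P2"], [["A", "A"], ["A", "B"], ["B", "B"]], ["A", "B"]
    )
    ours_to_public = correspondence(
        ["A", "B"], [["P1", "P1"], ["P2", "P1"]], ["P1", "P2"]
    )
    print(mean_correspondence(ours_to_public, public_to_ours))
```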

Binary projection pattern classification

To determine validly barcoded cells, we first calculated the 95th percentile of the total number of unique molecular identifiers (nUMI) mapped to the five barcodes and removed the unusually high UMI counts, which might indicate doublets or PCR-biased amplification. Next, we used two sets of cells as negative controls, that is, cells that are not expected to contain projection barcodes. The first set of negative control cells was the non-neuronal cells classified by coarse clustering based on the single-cell transcriptome (Tervo et al., 2016). The second set of negative control cells was the ‘EGFP-negative’ cells in the FAC-sorted dataset. Specifically, we calculated the total counts of the five projection barcodes determined by Cellranger in the FAC-sorted dataset and designated the cells with zero projection barcode counts (nUMI of EGFP RNA = 0) as ‘EGFP-negative’ cells. For each of the two sets of negative control cells and for each projection barcode, we searched for the value in the empirical cumulative distribution function (ECDF) that is closest to the 99.9th percentile. For each barcode, we selected the higher of the two resulting UMI thresholds. A cell was determined to be validly barcoded if the number of barcode UMIs within the cell was larger than the threshold. For example, the calculated UMI threshold for barcode 0 (AI) is 28, which means that a cell containing more than 28 UMIs of barcode 0 is validly barcoded by AI. The UMI thresholds for the other targets were: DMS, 101; MD, 114; BLA, 35; LH, 103. Finally, we set the UMI counts of non-barcoded cells to zero to obtain the index barcode count matrix used for downstream analysis. Binary projection patterns were calculated as set intersections of the barcoded cells for the five projection targets. Only the 10 most frequent binary and collateral projection patterns were kept for reliable inference.
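The thresholding and pattern-calling logic described above can be sketched in plain Python. The nearest-rank percentile is used here as a stand-in for "the ECDF value closest to the 99.9th percentile", which is an assumption; the function names are hypothetical. The threshold values themselves are the ones reported in the text.

```python
import math
from collections import Counter

TARGETS = ("AI", "DMS", "MD", "BLA", "LH")

# UMI thresholds reported in the text.
REPORTED_THRESHOLDS = {"AI": 28, "DMS": 101, "MD": 114, "BLA": 35, "LH": 103}


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile of a non-empty list (assumed ECDF lookup)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def umi_threshold(negatives_a, negatives_b, pct=99.9):
    """Take the higher of the thresholds from the two negative-control sets."""
    return max(
        percentile_nearest_rank(negatives_a, pct),
        percentile_nearest_rank(negatives_b, pct),
    )


def binarize(cell_umis, thresholds=REPORTED_THRESHOLDS):
    """Targets for which the cell has strictly more UMIs than the threshold."""
    return frozenset(t for t in TARGETS if cell_umis.get(t, 0) > thresholds[t])


def top_patterns(cells, thresholds=REPORTED_THRESHOLDS, n=10):
    """Count binary projection patterns and keep the n most frequent."""
    patterns = Counter()
    for umis in cells.values():
        pattern = binarize(umis, thresholds)
        if pattern:  # skip cells with no valid barcode
            patterns[pattern] += 1
    return patterns.most_common(n)
```

For instance, a cell with 29 AI UMIs and 101 DMS UMIs is called AI-projecting only, because 101 does not exceed the DMS threshold of 101.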

Projection pattern-specific DEGs analysis

DEGs were calculated using the default parameters of the ‘FindMarkers’ function in Seurat, except that the MAST algorithm was used for DE testing. For the DEG volcano plot, the chosen cut-off for statistical significance was 10^-10 (Figure 4 and Figure 4—figure supplement 1) or 10^-5 (Figure 5 and Figure 5—figure supplement 1) and the chosen cut-off for absolute log2 fold-change was 0.5. Volcano plots were implemented using the EnhancedVolcano R package (v1.4.0).
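The two volcano-plot cut-offs amount to a simple rule per gene. The sketch below is illustrative only: the function name and the "up"/"down"/"ns" labels are hypothetical, and whether the significance cut-off is applied to raw or adjusted p-values is not stated in the text, so the argument is simply called `p_value`.

```python
def classify_deg(p_value: float, log2_fc: float,
                 p_cutoff: float = 1e-10, fc_cutoff: float = 0.5) -> str:
    """Label a gene using a significance cut-off and an absolute log2FC cut-off."""
    if p_value < p_cutoff and abs(log2_fc) > fc_cutoff:
        return "up" if log2_fc > 0 else "down"
    return "ns"  # not significant under both criteria


if __name__ == "__main__":
    # Toy values, not results from the study.
    print(classify_deg(1e-12, 0.8))                  # passes the stricter cut-off
    print(classify_deg(1e-7, -0.9, p_cutoff=1e-5))   # passes the looser cut-off
    print(classify_deg(1e-12, 0.2))                  # fold-change too small
```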

For the DEG heatmap in Figure 5A, the top 10 DEGs ordered by average log2 fold-change were chosen from each binary cluster. The heatmap was implemented using the ‘scanpy.pl.heatmap’ function in Scanpy.

Joint analysis of MERGE-seq and fMOST projection patterns

Single-neuron projectome data for five PFC target regions (AI, dorsal striatum, BLA, MD, LH) were extracted from Gao et al., 2022. Projection patterns were quantified by calculating the percentage of each pattern relative to total patterns. Patterns were categorized by number of targets (1, 2, 3, or ≥3 targets). MERGE-seq and fMOST projection pattern percentages were statistically compared within each category using two-sided Wilcoxon tests with Holm correction for multiple comparisons.
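The pattern percentages and the Holm step-down correction mentioned above can be sketched in plain Python. The function names are hypothetical, and the Wilcoxon test itself is not reimplemented here; the sketch only shows how raw p-values from such tests would be adjusted.

```python
def pattern_percentages(pattern_counts: dict) -> dict:
    """Percentage of each projection pattern relative to all patterns."""
    total = sum(pattern_counts.values())
    if total == 0:
        return {pattern: 0.0 for pattern in pattern_counts}
    return {p: 100.0 * c / total for p, c in pattern_counts.items()}


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Toy inputs, not data from the study.
    print(pattern_percentages({"DMS": 30, "DMS+LH": 10, "AI": 60}))
    print(holm_adjust([0.01, 0.04, 0.03]))
```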

Machine learning implementation on projection and transcription data

Naïve Bayes was applied to perform a machine learning classification task. We first encoded binary projection labels for each projection target (barcoded and non-barcoded), and five sets of models (AI, DMS, BLA, LH, and MD) were independently trained. We explored a range for the number of top highly variable genes (HVGs) (2, 5, 10, 20, 50, 100, 200, 300, 400, 500, 1000, 2000, 5000) used to fit the model. In each trial, a total of 1000 cells were randomly sampled from the 8210 excitatory neurons, and the top HVGs were selected in the default order returned by the ‘FindVariableFeatures’ function of Seurat. In total, 100 trials were repeated.

To interpret the contribution of important genes for each HVG-based Naïve Bayes model, the data matrix for model building was constructed as follows: for each projection target, 8210 excitatory neurons × (normalized expression of the top 50 HVGs + binary projection labels), or 8210 excitatory neurons × (normalized expression of 50 random genes + binary projection labels). Each data matrix was first shuffled and then split into training and testing data, with a training fraction of 0.7. The machine learning workflow was implemented with the ‘pycaret.classification’ module of the pycaret Python package (v2.3.4). First, for each model, we used the ‘setup’ function to initialize the training environment and create the transformation pipeline, setting the ‘target’ parameter to the column name of the input data matrix corresponding to the binary projection labels. Then we used the ‘create_model’ function to train and evaluate the performance of a given model, setting the ‘estimator’ parameter to ‘nb’ and leaving other parameters at their defaults.
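To make the shuffle, split, and classification steps concrete, here is a dependency-free sketch. It is not the pycaret workflow used in the study: it assumes a Gaussian naive Bayes model as a stand-in for the ‘nb’ estimator, and the function names, the `seed` argument, and the small variance floor are hypothetical choices made for illustration.

```python
import math
import random


def train_test_split(rows, labels, train_fraction=0.7, seed=0):
    """Shuffle cells, then split them into training and testing sets."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * len(idx)))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def fit_gaussian_nb(rows, labels, var_floor=1e-9):
    """Per class: log prior, per-gene mean, and per-gene variance."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((x - m) ** 2 for x in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[cls] = (math.log(n / len(rows)), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one cell."""
    def log_posterior(log_prior, means, variances):
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, means, variances)
        )
    return max(model, key=lambda cls: log_posterior(*model[cls]))
```

Here each row would hold the normalized expression of the selected genes for one neuron, and each label would be 1 for barcoded and 0 for non-barcoded cells of a given target.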

To validate barcode/non-barcode label integrity, we performed 100 trials of random sampling and label swapping. For each trial, 1000 cells were randomly sampled from the 8210 total cells, and barcoded/non-barcoded labels were swapped to the extent possible based on the smaller group. Models were built for each target using original or swapped labels and the top 50 HVGs. Prediction accuracy, AUC, and F1 scores were then compared between the original models with true labels and the models with swapped labels.

We used the kernel explainer of the SHAP Python package (v0.40.1) to summarize the effects of genes. The SHAP explainer was created using the ‘shap.KernelExplainer(model.predict, training data)’ function. SHAP values were calculated using the ‘explainer.shap_values(testing data)’ function and plotted with the ‘shap.summary_plot()’ function to create a SHAP beeswarm plot displaying the top 20 features. Training data and testing data for calculating SHAP values were subsampled to 1500 cells.

Statistical analysis

No statistical methods were used to predetermine sample size. The experiments were not randomized and investigators were not blinded to allocation during experiments and outcome assessment. Two-sided Wilcoxon test with Holm correction for multiple comparisons was performed in Figure 3D, Figure 6B, and Figure 6—figure supplement 1B. Detailed summary statistics were provided in corresponding Source data files.

Acknowledgements

We thank Dr. Liye Zhang, Pin Wu and Hengxin Liu for generous advice on the bioinformatic analyses.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Peibo Xu, Email: michaelxupb@gmail.com.

Chengyu T Li, Email: tonylicy@lglab.ac.cn.

Zhen-Ge Luo, Email: luozhg@shanghaitech.edu.cn.

Yuejun Chen, Email: yuejunchen@ion.ac.cn.

Jeremy J Day, University of Alabama at Birmingham, United States.

Kate M Wassum, University of California, Los Angeles, United States.

Funding Information

This paper was supported by the following grants:

  • National Natural Science Foundation of China 92368204 to Yuejun Chen.

  • STI2030-Major Projects 2021ZD0200900 to Yuejun Chen.

  • Shanghai Science and Technology Development Funds 23JS1401400 to Yuejun Chen.

  • Strategic Priority Research Program of the Chinese Academy of Sciences XDB32030200 to Yuejun Chen.

  • Shanghai Municipal Science and Technology Major Project 2018SHZDZX05 to Yuejun Chen.

  • National Natural Science Foundation of China 32170806 to Yuejun Chen.

  • National Natural Science Foundation of China 32130035 to Zhen-Ge Luo.

  • Central Guidance on Local Science and Technology Development Fund YDZX20233100001002 to Zhen-Ge Luo.

  • National Key Research and Development Program of China 2021ZD0202500 to Zhen-Ge Luo.

  • National Key Research and Development Program of China 2019YFA0709504 to Chengyu T Li.

  • the Innovations of Science and Technology 2030 from the Ministry of Science and Technology of China 2021ZD0203601 to Chengyu T Li.

  • National Natural Science Foundation of China 31827803 to Chengyu T Li.

  • the Shanghai Municipal Science and Technology Major Project 2021SHZDZX to Chengyu T Li.

  • National Natural Science Foundation of China 92168107 to Zhen-Ge Luo.

  • National Natural Science Foundation of China 32161133024 to Chengyu T Li.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing.

Conceptualization, Resources, Data curation, Investigation, Methodology.

Data curation, Investigation, Visualization.

Resources, Data curation, Investigation.

Data curation, Validation, Visualization.

Data curation.

Resources.

Software, Investigation.

Software, Validation, Investigation.

Investigation.

Investigation.

Software, Investigation, Visualization.

Resources, Supervision, Funding acquisition, Project administration, Writing – review and editing.

Conceptualization, Resources, Supervision, Funding acquisition, Methodology, Project administration, Writing – review and editing.

Conceptualization, Resources, Formal analysis, Supervision, Funding acquisition, Investigation, Methodology, Writing – original draft, Project administration, Writing – review and editing.

Ethics

All animal experiments were conducted according to a protocol approved by the IACUC at the Institute of Neuroscience, CAS Center for Excellence in Brain Science and Intelligence Technology of the Chinese Academy of Sciences (Shanghai, China; reference number for approval: NA-034-2022).

Additional files

Supplementary file 1. Marker genes of 2nd round clustering of “Excitatory neuron” cluster, related to Figure 2.

The inserted umap shows the number of UMI counts (nCount_RNA) per cluster, number of genes (nFeature_RNA) per cluster, percentage of mitochondria genes (percent_mt) per cluster.

elife-85419-supp1.xlsx (426.6KB, xlsx)
Supplementary file 2. Marker genes of 7 scRNA-seq clusters from all excitatory neurons, related to Figure 2.

Table with marker genes for each cluster calculated using Seurat package using Wilcoxon test.

elife-85419-supp2.csv (962KB, csv)
Supplementary file 3. Marker genes of 8 clusters from all cells, related to Figure 1.

Table with marker genes for each cluster calculated using Seurat package using Wilcoxon test.

elife-85419-supp3.csv (677.6KB, csv)
Supplementary file 4. Median gene detection metrics for different major cell types, related to Figure 1.
elife-85419-supp4.csv (225B, csv)
MDAR checklist

Data availability

Raw gene expression, barcode count matrices and metadata are available from the Gene Expression Omnibus (GSE210174). The computational code used in the study is available at GitHub (https://github.com/MichaelPeibo/MERGE-seq-analysis; copy archived at Peibo, 2024). The data needed to evaluate the conclusions in the paper can be downloaded at https://figshare.com/projects/High-throughput_mapping_of_single-neuron_projection_and_molecular_features_by_retrograde_barcoded_labeling/150207. All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials and source data files.

The following datasets were generated:

Xu P, Peng J, Yuan T, Chen Z, Wu Z, Luo ZG, Chen Y, Li CT. 2024. High-throughput mapping of single-neuron projection and molecular features by retrograde barcoded labeling. NCBI Gene Expression Omnibus. GSE210174

Peibo X. 2022. figure1&S1. figshare.

Peibo X. 2022. figure5&S5. figshare.

Peibo X. 2022. figure4&S4. figshare.

Peibo X. 2022. figure6&S6. figshare.

Peibo X. 2022. figure2&S2. figshare.

Peibo X. 2022. figure3&S3. figshare.

Peibo X. 2022. Fig4_CDEFGH_confocal. figshare.

Peibo X. 2022. Fig4E&H_Fig5F_FigS4D_IHC_quant_data. figshare.

The following previously published datasets were used:

Bhattacherjee A, Djekidel MN, Chen R, Chen W, Tuesta LM, Zhang Y. 2019. Cell type-specific transcriptional programs in mouse prefrontal cortex during adolescence and addiction. NCBI Gene Expression Omnibus. GSE124952

Lui JH, Luo L. 2020. Single cell RNAseq of Rbp4cre+ neurons from prefrontal cortex. NCBI Gene Expression Omnibus. GSE161936

References

  1. Allen WE, Blosser TR, Sullivan ZA, Dulac C, Zhuang X. Molecular and spatial signatures of mouse brain aging at single-cell resolution. Cell. 2023;186:194–208. doi: 10.1016/j.cell.2022.12.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Bhattacherjee A, Djekidel MN, Chen R, Chen W, Tuesta LM, Zhang Y. Cell type-specific transcriptional programs in mouse prefrontal cortex during adolescence and addiction. Nature Communications. 2019;10:4169. doi: 10.1038/s41467-019-12054-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Bhattacherjee A, Zhang C, Watson BR, Djekidel MN, Moffitt JR, Zhang Y. Spatial transcriptomics reveals the distinct organization of mouse prefrontal cortex and neuronal subtypes regulating chronic pain. Nature Neuroscience. 2023;26:1880–1893. doi: 10.1038/s41593-023-01455-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Chen X, Sun YC, Zhan H, Kebschull JM, Fischer S, Matho K, Huang ZJ, Gillis J, Zador AM. High-throughput mapping of long-range neuronal projection using in situ sequencing. Cell. 2019;179:772–786. doi: 10.1016/j.cell.2019.09.023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Chen X, Fischer S, Rue MC, Zhang A, Mukherjee D, Kanold PO, Gillis J, Zador AM. Whole-Cortex in situ sequencing reveals peripheral input-dependent cell type-defined area identity. bioRxiv. 2023 doi: 10.1101/2022.11.06.515380. [DOI] [PMC free article] [PubMed]
  6. Cheung V, Chung P, Bjorni M, Shvareva VA, Lopez YC, Feinberg EH. Virally encoded connectivity transgenic overlay RNA sequencing (VECTORseq) defines projection neurons involved in sensorimotor integration. Cell Reports. 2021;37:110131. doi: 10.1016/j.celrep.2021.110131. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Cornwall J, Phillipson OT. Mediodorsal and reticular thalamic nuclei receive collateral axons from prefrontal cortex and laterodorsal tegmental nucleus in the rat. Neuroscience Letters. 1988;88:121–126. doi: 10.1016/0304-3940(88)90111-5. [DOI] [PubMed] [Google Scholar]
  8. Cowan WM. The emergence of modern neuroanatomy and developmental neurobiology. Neuron. 1998;20:413–426. doi: 10.1016/s0896-6273(00)80985-x. [DOI] [PubMed] [Google Scholar]
  9. Deng Y, Bai Z, Fan R. Microtechnologies for single-cell and spatial multi-omics. Nature Reviews Bioengineering. 2023;1:769–784. doi: 10.1038/s44222-023-00084-y. [DOI] [Google Scholar]
  10. Gabbott PLA, Warner TA, Jays PRL, Salway P, Busby SJ. Prefrontal cortex in the rat: projections to subcortical autonomic, motor, and limbic centers. The Journal of Comparative Neurology. 2005;492:145–177. doi: 10.1002/cne.20738. [DOI] [PubMed] [Google Scholar]
  11. Gagnon D, Parent M. Distribution of VGLUT3 in highly collateralized axons from the rat dorsal raphe nucleus as revealed by single-neuron reconstructions. PLOS ONE. 2014;9:e87709. doi: 10.1371/journal.pone.0087709. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Gao L, Liu S, Gou L, Hu Y, Liu Y, Deng L, Ma D, Wang H, Yang Q, Chen Z, Liu D, Qiu S, Wang X, Wang D, Wang X, Ren B, Liu Q, Chen T, Shi X, Yao H, Xu C, Li CT, Sun Y, Li A, Luo Q, Gong H, Xu N, Yan J. Single-neuron projectome of mouse prefrontal cortex. Nature Neuroscience. 2022;25:515–529. doi: 10.1038/s41593-022-01041-5.
  13. Ghosh S, Larson SD, Hefzi H, Marnoy Z, Cutforth T, Dokka K, Baldwin KK. Sensory maps in the olfactory cortex defined by long-range viral tracing of single neurons. Nature. 2011;472:217–220. doi: 10.1038/nature09945.
  14. Gong H, Xu D, Yuan J, Li X, Guo C, Peng J, Li Y, Schwarz LA, Li A, Hu B, Xiong B, Sun Q, Zhang Y, Liu J, Zhong Q, Xu T, Zeng S, Luo Q. High-throughput dual-colour precision imaging for brain-wide connectome with cytoarchitectonic landmarks at the cellular level. Nature Communications. 2016;7:12142. doi: 10.1038/ncomms12142.
  15. Huang L, Kebschull JM, Fürth D, Musall S, Kaufman MT, Churchland AK, Zador AM. BRICseq bridges brain-wide interregional connectivity to neural activity and gene expression in single animals. Cell. 2020;182:177–188. doi: 10.1016/j.cell.2020.05.029.
  16. Hunnicutt BJ, Jongbloets BC, Birdsong WT, Gertz KJ, Zhong H, Mao T. A comprehensive excitatory input map of the striatum reveals novel functional organization. eLife. 2016;5:e19103. doi: 10.7554/eLife.19103.
  17. Hurley KM, Herbert H, Moga MM, Saper CB. Efferent projections of the infralimbic cortex of the rat. The Journal of Comparative Neurology. 1991;308:249–276. doi: 10.1002/cne.903080210.
  18. Ilicic T, Kim JK, Kolodziejczyk AA, Bagger FO, McCarthy DJ, Marioni JC, Teichmann SA. Classification of low quality cells from single-cell RNA-seq data. Genome Biology. 2016;17:29. doi: 10.1186/s13059-016-0888-1.
  19. Kebschull JM, Garcia da Silva P, Reid AP, Peikon ID, Albeanu DF, Zador AM. High-throughput mapping of single-neuron projections by sequencing of barcoded RNA. Neuron. 2016;91:975–987. doi: 10.1016/j.neuron.2016.07.036.
  20. Kim CK, Ye L, Jennings JH, Pichamoorthy N, Tang DD, Yoo ACW, Ramakrishnan C, Deisseroth K. Molecular and circuit-dynamical identification of top-down neural mechanisms for restraint of reward seeking. Cell. 2017;170:1013–1027. doi: 10.1016/j.cell.2017.07.020.
  21. Klingler E, Tomasello U, Prados J, Kebschull JM, Contestabile A, Galiñanes GL, Fièvre S, Santinha A, Platt R, Huber D, Dayer A, Bellone C, Jabaudon D. Temporal controls over inter-areal cortical projection neuron fate diversity. Nature. 2021;599:453–457. doi: 10.1038/s41586-021-04048-3.
  22. Korsunsky I, Millard N, Fan J, Slowikowski K, Zhang F, Wei K, Baglaenko Y, Brenner M, Loh P-R, Raychaudhuri S. Fast, sensitive and accurate integration of single-cell data with Harmony. Nature Methods. 2019;16:1289–1296. doi: 10.1038/s41592-019-0619-0.
  23. Le Merre P, Ährlund-Richter S, Carlén M. The mouse prefrontal cortex: Unity in diversity. Neuron. 2021;109:1925–1944. doi: 10.1016/j.neuron.2021.03.035.
  24. Lui JH, Nguyen ND, Grutzner SM, Darmanis S, Peixoto D, Wagner MJ, Allen WE, Kebschull JM, Richman EB, Ren J, Newsome WT, Quake SR, Luo L. Differential encoding in prefrontal cortex projection neuron classes across cognitive tasks. Cell. 2021;184:489–506. doi: 10.1016/j.cell.2020.11.046.
  25. Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, Katz R, Himmelfarb J, Bansal N, Lee SI. From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence. 2020;2:56–67. doi: 10.1038/s42256-019-0138-9.
  26. McGinnis C. MULTI-Seq. swh:1:rev:ef37c449d1a660e9e638eeffbdfd09ef21dd3d15. Software Heritage. 2023. https://archive.softwareheritage.org/swh:1:dir:bb532dea570a513ddf0aabce256d34cb1ee548e0;origin=https://github.com/chris-mcginnis-ucsf/MULTI-seq;visit=swh:1:snp:6283a1e6b2ab6a02f20f8581b1e4dae9d13f31cb;anchor=swh:1:rev:ef37c449d1a660e9e638eeffbdfd09ef21dd3d15
  27. Nassi JJ, Cepko CL, Born RT, Beier KT. Neuroanatomy goes viral! Frontiers in Neuroanatomy. 2015;9:80. doi: 10.3389/fnana.2015.00080.
  28. Oh SW, Harris JA, Ng L, Winslow B, Cain N, Mihalas S, Wang Q, Lau C, Kuan L, Henry AM, Mortrud MT, Ouellette B, Nguyen TN, Sorensen SA, Slaughterbeck CR, Wakeman W, Li Y, Feng D, Ho A, Nicholas E, Hirokawa KE, Bohn P, Joines KM, Peng H, Hawrylycz MJ, Phillips JW, Hohmann JG, Wohnoutka P, Gerfen CR, Koch C, Bernard A, Dang C, Jones AR, Zeng H. A mesoscale connectome of the mouse brain. Nature. 2014;508:207–214. doi: 10.1038/nature13186.
  29. Peibo X. MERGE-Seq_Revised. swh:1:rev:4f5553161cb3a740b291c59dad4b83790cfc6663. Software Heritage. 2024. https://archive.softwareheritage.org/swh:1:dir:c51fd678ce6a046273345d166a3cce5ac5cda243;origin=https://github.com/MichaelPeibo/MERGE-seq-analysis;visit=swh:1:snp:4e1d3121d88d7d0598b92d74dfab5c64286c3b6b;anchor=swh:1:rev:4f5553161cb3a740b291c59dad4b83790cfc6663
  30. Reppucci CJ, Petrovich GD. Organization of connections between the amygdala, medial prefrontal cortex, and lateral hypothalamus: a single and double retrograde tracing study in rats. Brain Structure & Function. 2016;221:2937–2962. doi: 10.1007/s00429-015-1081-0.
  31. Rockland KS. Corticothalamic axon morphologies and network architecture. The European Journal of Neuroscience. 2019;49:969–977. doi: 10.1111/ejn.13910.
  32. Rompani SB, Müllner FE, Wanner A, Zhang C, Roth CN, Yonehara K, Roska B. Different modes of visual integration in the lateral geniculate nucleus revealed by single-cell-initiated transsynaptic tracing. Neuron. 2017;93:767–776. doi: 10.1016/j.neuron.2017.01.028.
  33. Schwarz LA, Miyamichi K, Gao XJ, Beier KT, Weissbourd B, DeLoach KE, Ren J, Ibanes S, Malenka RC, Kremer EJ, Luo L. Viral-genetic tracing of the input-output organization of a central noradrenaline circuit. Nature. 2015;524:88–92. doi: 10.1038/nature14600.
  34. Shepherd GMG. Corticostriatal connectivity and its role in disease. Nature Reviews. Neuroscience. 2013;14:278–291. doi: 10.1038/nrn3469.
  35. Sorensen SA, Bernard A, Menon V, Royall JJ, Glattfelder KJ, Desta T, Hirokawa K, Mortrud M, Miller JA, Zeng H, Hohmann JG, Jones AR, Lein ES. Correlated gene expression and target specificity demonstrate excitatory projection neuron diversity. Cerebral Cortex. 2015;25:433–449. doi: 10.1093/cercor/bht243.
  36. Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM, Hao Y, Stoeckius M, Smibert P, Satija R. Comprehensive integration of single-cell data. Cell. 2019;177:1888–1902. doi: 10.1016/j.cell.2019.05.031.
  37. Sun YC, Chen X, Fischer S, Lu S, Zhan H, Gillis J, Zador AM. Integrating barcoded neuroanatomy with spatial transcriptional profiling enables identification of gene correlates of projections. Nature Neuroscience. 2021;24:873–885. doi: 10.1038/s41593-021-00842-4.
  38. Tasic B, Menon V, Nguyen TN, Kim TK, Jarsky T, Yao Z, Levi B, Gray LT, Sorensen SA, Dolbeare T, Bertagnolli D, Goldy J, Shapovalova N, Parry S, Lee C, Smith K, Bernard A, Madisen L, Sunkin SM, Hawrylycz M, Koch C, Zeng H. Adult mouse cortical cell taxonomy revealed by single cell transcriptomics. Nature Neuroscience. 2016;19:335–346. doi: 10.1038/nn.4216.
  39. Tasic B, Yao Z, Graybuck LT, Smith KA, Nguyen TN, Bertagnolli D, Goldy J, Garren E, Economo MN, Viswanathan S, Penn O, Bakken T, Menon V, Miller J, Fong O, Hirokawa KE, Lathia K, Rimorin C, Tieu M, Larsen R, Casper T, Barkan E, Kroll M, Parry S, Shapovalova NV, Hirschstein D, Pendergraft J, Sullivan HA, Kim TK, Szafer A, Dee N, Groblewski P, Wickersham I, Cetin A, Harris JA, Levi BP, Sunkin SM, Madisen L, Daigle TL, Looger L, Bernard A, Phillips J, Lein E, Hawrylycz M, Svoboda K, Jones AR, Koch C, Zeng H. Shared and distinct transcriptomic cell types across neocortical areas. Nature. 2018;563:72–78. doi: 10.1038/s41586-018-0654-5.
  40. Tervo DGR, Hwang BY, Viswanathan S, Gaj T, Lavzin M, Ritola KD, Lindo S, Michael S, Kuleshova E, Ojala D, Huang CC, Gerfen CR, Schiller J, Dudman JT, Hantman AW, Looger LL, Schaffer DV, Karpova AY. A Designer AAV variant permits efficient retrograde access to projection neurons. Neuron. 2016;92:372–382. doi: 10.1016/j.neuron.2016.09.021.
  41. Vertes RP. Differential projections of the infralimbic and prelimbic cortex in the rat. Synapse. 2004;51:32–58. doi: 10.1002/syn.10279.
  42. Wang Z, Maunze B, Wang Y, Tsoulfas P, Blackmore MG. Global connectivity and function of descending spinal input revealed by 3d microscopy and retrograde transduction. The Journal of Neuroscience. 2018;38:10566–10581. doi: 10.1523/JNEUROSCI.1196-18.2018.
  43. Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biology. 2018;19:15. doi: 10.1186/s13059-017-1382-0.
  44. Wu M, Minkowicz S, Dumrongprechachan V, Hamilton P, Xiao L, Kozorovitskiy Y. Attenuated dopamine signaling after aversive learning is restored by ketamine to rescue escape actions. eLife. 2021;10:e64041. doi: 10.7554/eLife.64041.
  45. Xiao Y, Karnati S, Qian G, Nenicu A, Fan W, Tchatalbachev S, Höland A, Hossain H, Guillou F, Lüers GH, Baumgart-Vogt E. Cre-mediated stress affects sirtuin expression levels, peroxisome biogenesis and metabolism, antioxidant and proinflammatory signaling pathways. PLOS ONE. 2012;7:e41097. doi: 10.1371/journal.pone.0041097.
  46. Yao Z, van Velthoven CTJ, Nguyen TN, Goldy J, Sedeno-Cortes AE, Baftizadeh F, Bertagnolli D, Casper T, Chiang M, Crichton K, Ding S-L, Fong O, Garren E, Glandon A, Gouwens NW, Gray J, Graybuck LT, Hawrylycz MJ, Hirschstein D, Kroll M, Lathia K, Lee C, Levi B, McMillen D, Mok S, Pham T, Ren Q, Rimorin C, Shapovalova N, Sulc J, Sunkin SM, Tieu M, Torkelson A, Tung H, Ward K, Dee N, Smith KA, Tasic B, Zeng H. A taxonomy of transcriptomic cell types across the isocortex and hippocampal formation. Cell. 2021;184:3222–3241. doi: 10.1016/j.cell.2021.04.021.
  47. Yao Z, van Velthoven CTJ, Kunst M, Zhang M, McMillen D, Lee C, Jung W, Goldy J, Abdelhak A, Baker P, Barkan E, Bertagnolli D, Campos J, Carey D, Casper T, Chakka AB, Chakrabarty R, Chavan S, Chen M, Clark M, Close J, Crichton K, Daniel S, Dolbeare T, Ellingwood L, Gee J, Glandon A, Gloe J, Gould J, Gray J, Guilford N, Guzman J, Hirschstein D, Ho W, Jin K, Kroll M, Lathia K, Leon A, Long B, Maltzer Z, Martin N, McCue R, Meyerdierks E, Nguyen TN, Pham T, Rimorin C, Ruiz A, Shapovalova N, Slaughterbeck C, Sulc J, Tieu M, Torkelson A, Tung H, Cuevas NV, Wadhwani K, Ward K, Levi B, Farrell C, Thompson CL, Mufti S, Pagan CM, Kruse L, Dee N, Sunkin SM, Esposito L, Hawrylycz MJ, Waters J, Ng L, Smith KA, Tasic B, Zhuang X, Zeng H. A high-resolution transcriptomic and spatial Atlas of cell types in the whole mouse brain. bioRxiv. 2023 doi: 10.1101/2023.03.06.531121.
  48. Zeng H. Mesoscale connectomics. Current Opinion in Neurobiology. 2018;50:154–162. doi: 10.1016/j.conb.2018.03.003.
  49. Zeng H. What is a cell type and how to define it? Cell. 2022;185:2739–2755. doi: 10.1016/j.cell.2022.06.031.
  50. Zhang Z, Zhou J, Tan P, Pang Y, Rivkin AC, Kirchgessner MA, Williams E, Lee C-T, Liu H, Franklin AD, Miyazaki PA, Bartlett A, Aldridge AI, Vu M, Boggeman L, Fitzpatrick C, Nery JR, Castanon RG, Rashid M, Jacobs MW, Ito-Cole T, O’Connor C, Pinto-Duartec A, Dominguez B, Smith JB, Niu S-Y, Lee K-F, Jin X, Mukamel EA, Behrens MM, Ecker JR, Callaway EM. Epigenomic diversity of cortical projection neurons in the mouse brain. Nature. 2021;598:167–173. doi: 10.1038/s41586-021-03223-w.
  51. Zhang M, Pan X, Jung W, Halpern AR, Eichhorn SW, Lei Z, Cohen L, Smith KA, Tasic B, Yao Z, Zeng H, Zhuang X. Molecularly defined and spatially resolved cell atlas of the whole mouse brain. Nature. 2023;624:343–354. doi: 10.1038/s41586-023-06808-9.
  52. Zhu J, Cheng Q, Chen Y, Fan H, Han Z, Hou R, Chen Z, Li CT. Transient delay-period activity of agranular insular cortex controls working memory maintenance in learning novel tasks. Neuron. 2020;105:934–946. doi: 10.1016/j.neuron.2019.12.008.

Editor's evaluation

Jeremy J Day

This manuscript describes a valuable new circuit mapping and profiling technique called Multiplexed projEction neuRons retrograde barcodE (MERGEseq) that combines transcriptome and projectome data at a single-cell resolution. The authors provide solid evidence that MERGEseq can be used to identify projection targets and cell type/layer/transcriptome differences of projection neurons in the mouse prefrontal cortex, and validation experiments are rigorous. While this report is a proof-of-principle that MERGEseq is useful for circuit mapping and profiling and many potential details will influence conclusions, this technique could easily be adapted to other regions with known projection targets and adds to a growing arsenal of combinatorial circuit mapping and profiling tools.

Decision letter

Editor: Jeremy J Day

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "High-throughput mapping of single-neuron projection and molecular features by retrograde barcoded labelling" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Kate Wassum as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission. Please accompany your revision with a point-by-point response to each point raised by the reviewers in the public and private reviews, paying special attention to the essential revisions noted below.

Essential revisions:

1) The manuscript builds upon several prior approaches, only some of which are discussed and cited. The scholarship of the manuscript needs to be improved by incorporating a more robust review of other approaches, as suggested by reviewers to place these findings in context. The scholarship of the manuscript should also be improved by including a more transparent discussion of the limitations of MERGE-seq relative to other approaches.

2) Reviewers raised several issues related to the efficiency of retrograde labeling and barcode recovery that have the potential to affect the conclusions reached in the manuscript. These issues should be addressed in the manuscript with new data and/or analysis.

3) A revision should incorporate requested information to provide clarity about the experimental details and analysis methods.

4) Please ensure your manuscript complies with the eLife policies for statistical reporting (https://reviewer.elifesciences.org/author-guide/full): "Report summary statistics (e.g., t, F values) and degrees of freedom, exact p-values, and 95% confidence intervals wherever possible. These should be reported for all key questions and not only when the p-value is less than 0.05."

5) Please include a key resource table.

Reviewer #1 (Recommendations for the authors):

1. The introduction discusses current techniques like BARseq but makes no mention of the current retrograde tracing and sequencing techniques which is what MERGEseq is actually improving upon. While these retro techniques are discussed in detail in the discussion, it seems odd that they are left out of the introduction as MERGEseq is really an extension of these techniques rather than MAPseq/BARseq.

2. Figure 1 – Supplement 1 shows that very different UMI cutoffs were used to call a cell "positive" for each barcode index. This would presumably preclude direct comparison across regions with different barcodes, at least for purposes of determining the density of projections (since a different degree of projections will be excluded for each virus region). This should be mentioned more explicitly in the manuscript.

3. Lines 170-171 note that 637 "stressed" neurons were filtered out from the Slc17a7 population. It is not clear what "stressed" means here, and Figure 2 – Supplement 1A does not explain it. This should be clarified.

4. In the discussion, authors might comment on the added value of capturing the "full" transcriptome at the cost of spatial resolution.

5. It may be good to mention that retro-AAV2 has not been reported to infect fibers of passage.

6. Please include a color scale for Figure 5a.

7. Please include axis labels for Figure 5c.

8. Figure 5F shows the DMS+LH Pou3f1+ cells; what about the DMS-only and LH-only cells? Are there any Pou3f1+ cells that lack fluorescence?

9. Please enlarge Figure 3e.

10. For scRNA-seq, please provide detail on the depth of sequencing, the number of GEMwells used, and the number of technical or biological replicates.

11. Please include a color scale for Figure 6e-f.

12. Typo, line 536 "despite of chosen"; should be "in spite of chosen" or "despite chosen".

13. Supplemental Figure 2 panel E, missing cell type labels going across.

14. Please cite the original retro seq paper (Tasic, 2018, PMCID: PMC6456269).

Reviewer #2 (Recommendations for the authors):

Based on their detection of barcodes in single-cell RNA-sequencing, the authors conclude that "about 74% of barcoded vmPFC neurons projected to one of these five targets (dedicated projection) and 26% of barcoded vmPFC neurons sent collateral projections to multiple brain regions…" (lines 92-94 in Introduction, lines 242-244 in Results). These conclusions are contingent upon 100% of neurons that project to a specific region being labeled by retrograde barcoded viruses and barcodes are detected at 100% efficiency. The authors did not provide an estimation of either efficiency. In the penultimate paragraph of the Discussion, the authors raised this as "A potential technical concern" but conclude that "the overall dedicated and collateral projection pattern…will not be greatly affected by the labeling efficiency or recovery rate."

I disagree with the authors' conclusion. Suppose that retrograde labeling efficiency is 70%, and the barcode recovery rate is also 70% (both very optimistic estimates). Suppose further that all neurons of a particular type send collateral branches to two target regions, X and Y. The experiment will yield the results that 25% of the neurons will be labeled by barcodes injected at X only, 25% by barcodes injected at Y only, 25% labeled by both barcodes, and 25% labeled by neither. The conclusion from the above experiment would be that 2/3 of neurons are "dedicated" to either X or Y, and 1/3 of the neurons send axons to both regions. This simple back-of-the-envelope calculation reveals how much collateralization is underestimated by incomplete retrograde labeling!
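The back-of-the-envelope calculation above can be reproduced with a short script. This is a minimal sketch under the reviewer's stated assumptions (every neuron projects to both X and Y, and labeling and recovery succeed independently at each target); the function name and dictionary keys are illustrative and not part of the MERGE-seq analysis code.

```python
def observed_label_fractions(labeling_eff, recovery_eff):
    """Expected label fractions when ALL neurons project to both X and Y.

    Assumes labeling and barcode recovery are independent events and
    that the two targets are labeled independently of each other.
    """
    p = labeling_eff * recovery_eff   # chance that one projection is detected
    both = p * p
    x_only = p * (1 - p)
    y_only = (1 - p) * p
    neither = (1 - p) * (1 - p)
    detected = 1 - neither            # neurons carrying at least one barcode
    return {
        "p": p,
        "x_only": x_only,
        "y_only": y_only,
        "both": both,
        "neither": neither,
        "apparent_dedicated": (x_only + y_only) / detected,
        "apparent_collateral": both / detected,
    }


result = observed_label_fractions(0.7, 0.7)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With both efficiencies at 70%, the per-target detection probability is 0.49, each of the four outcomes is close to 25%, and roughly two thirds of the barcoded neurons appear "dedicated" even though none of them are.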

Without reading the penultimate Discussion paragraph, readers will be misled twice about the fraction of "dedicated projection." Even after reading it, the readers will still be misled by the authors' conclusions.

If the authors wish to make a quantitative conclusion about the true "dedicated" vs. "collateral" projections, they must determine the efficiency of retrograde barcoding. They can inject AAVretro carrying two different types of barcodes into the same region (via two separate injections, rather than injecting a mixture, which will artificially raise the co-transduction efficiency) and quantify individual cells that are labeled by both types of barcodes. They can then use such efficiency to calibrate their estimation of "dedicated" vs. "collateral" projections. (Note that retrograde labeling efficiency may differ for different sites.) Without such calibration, the authors should caution the readers about the (likely large) underestimation of true collateralization whenever such data are presented and discussed.
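One way such a calibration could be carried out is sketched below. It is only an illustration: it assumes the two barcoded viruses injected at the same site label the same neuron population independently (a capture-recapture style estimate), and all function names and cell counts are hypothetical rather than taken from the manuscript.

```python
def estimate_efficiencies(n_a, n_b, n_ab):
    """Estimate detection efficiency of two barcodes injected at one site.

    n_a  : cells carrying barcode A (double-labeled cells included)
    n_b  : cells carrying barcode B (double-labeled cells included)
    n_ab : cells carrying both barcodes
    Under independence, P(A | B) = efficiency of A, and vice versa.
    """
    if n_ab <= 0 or n_ab > min(n_a, n_b):
        raise ValueError("need 0 < n_ab <= min(n_a, n_b)")
    return n_ab / n_b, n_ab / n_a  # (efficiency of A, efficiency of B)


def corrected_counts(n_x, n_y, n_xy, eff_x, eff_y):
    """Back out efficiency-corrected counts for two targets X and Y.

    n_x and n_y include the n_xy double-labeled cells. Noisy inputs can
    yield small negative values, which should be read as "about zero".
    """
    collateral = n_xy / (eff_x * eff_y)
    return {
        "x_only": n_x / eff_x - collateral,
        "y_only": n_y / eff_y - collateral,
        "collateral": collateral,
    }


# Hypothetical calibration experiment: two barcodes at the same site.
eff_a, eff_b = estimate_efficiencies(n_a=500, n_b=400, n_ab=200)
print(f"efficiency A = {eff_a:.2f}, efficiency B = {eff_b:.2f}")

# Hypothetical tracing counts matching the 70% x 70% scenario above.
print(corrected_counts(n_x=4900, n_y=4900, n_xy=2401, eff_x=0.49, eff_y=0.49))
```

In the second call, the correction recovers a population in which essentially every neuron is collateralized, which is the scenario the raw counts would otherwise misreport as mostly "dedicated". As noted above, efficiency may differ between injection sites, so a single estimate would not necessarily transfer across targets.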

Another issue with "dedicated" projections: the authors only examined 5 targets. Each "dedicated" projection is dedicated only with respect to these 5 targets; the same neurons may send collateralized axons to other, unstudied targets.

Reviewer #3 (Recommendations for the authors):

1) Please provide a better context for the presented method. Clarify how the transcriptomic cell type definition in this paper corresponds to previous papers and clarify how the presented method differs from previous methods and what the advantages and disadvantages are. Please cite other papers that have employed single-cell Retro-seq: Tasic et al. 2016 (https://doi.org/10.1038/nn.4216), Tasic et al. 2018 (https://doi.org/10.1038/s41586-018-0654-5), Yao et al. 2021 (https://doi.org/10.1016/j.cell.2021.04.021), and Zhang et al. 2021 (https://doi.org/10.1038/s41586-021-03223-w).

2) Line 171: How were 'stressed' neurons defined? Please explain.

3) Please mention which 10x Genomics chemistry version (e.g., v2, v3, v3.1) you use in the main text and criteria for QC (lowest acceptable UMI or gene detection level, as well as median gene detection for different cell classes: excitatory, inhibitory and non-neuronal).

4) Figure 1E: The figure shows a correlation between two studies at a resolution that is not appropriate – too low to be informative except for QC. Please move this to supplement.

5) Cell type identity definition: I suggest performing data integration (for example using Seurat) with Bhattacherjee et al., 2019, Liu et al. 2021 and with Yao et al. 2021 (see above) to give more updated names to cell types. The paper should start with cell type definition and present consistent nomenclature from the beginning.

6) Caution should be exercised when interpreting under-represented cell types in MERGE-seq. Figure 2 shows that only 8 neurons of the L5-Htr2c subtype were validly barcoded. However, in addition to the possibility that this subtype does not project to the five targets included in this study, the small number of barcoded L5-Htr2c subtype could also be caused by the tropism of AAV2-Retro viruses, or the selective loss of L5-Htr2c neurons in tissue processing due to cell death. Comparison with the in situ hybridization patterns of marker genes in Supplementary Figure 2 and the proportions of neuronal subtypes in Figure 2b suggests bias in sampling of cell subtypes by scRNA-seq. Therefore, independent approaches should be utilized to further investigate the projection targets of L5-Htr2c neurons before reaching a conclusion.

7) We suggest replacing "unbarcoded" with "non-barcoded". "Un-barcoded" sounds like the cells were barcoded and then the barcode was removed. The more appropriate term is "non-barcoded".

8) Figure 2 (and others): Please state how many cells are represented in each panel of the figure.

9) Figure 2 E and F – Not the most straightforward and informative representation: We suggest converting these to bar plots per area and per type. It is good to see that the authors kept the colors introduced in this figure in Figure 3. We suggest wherever the color code can be kept consistent, to do so.

10) Figure 3. MERGE-seq reveals hidden projection diversity within the vmPFC – please remove 'hidden'.

11) Figure 6E/F – How many neurons and which types are shown? Every figure should state which single-cell transcriptomes were included and the labeling should be consistent with the previous figures. For example, please show the PC1/PC2 scatter plot in E and F next to the same cells labeled with their cell type assignments + colors. This allows the reader to connect the information from previous figures to these.

12) Line 490/491: Please include these references when referring to Retro-seq: Tasic et al. 2016 (https://doi.org/10.1038/nn.4216), Tasic et al. 2018 (https://doi.org/10.1038/s41586-018-0654-5), Yao et al. 2021 (https://doi.org/10.1016/j.cell.2021.04.021), and Zhang et al. 2021 (https://doi.org/10.1038/s41586-021-03223-w).

13) Completeness and sensitivity of barcode recovery of the approach: The authors recovered 1791 EGFP-positive cells undergoing fluorescence-activated cell sorting (FACS) from three mice and 19,470 single cells without sorting from the other three mice. Using thresholds calculated based on barcode counts in non-neurons, they found that the percentages of barcoded cells in the FACS-sorted and unsorted groups are 54% and 12%, respectively. Given that almost all cells sorted by FACS should have been infected with the barcoded GFP AAV viruses, the recovery rate for barcodes is low. Therefore, labeling neurons as barcoded and unbarcoded based on the detection of barcodes will create false negatives, that is, classifying many retrogradely labeled neurons as "unbarcoded" (we suggest changing this to "non-barcoded"). It seems that the projectomes of neurons cannot be simply derived from barcode detection through sequencing. Many "unbarcoded" projection neurons were in fact retrogradely labeled by AAV viruses injected into a specific target but were negative for barcodes due to technical limitations.

One immediate issue caused by the false negative rate of barcode detection is whether the machine learning-based modeling is provided with the right training data (Figure 6). For this model to predict projectomes based on transcriptomes, it first requires a good correlation between barcoding and projectome.

More discussion should be given to the recovery rate of barcodes, and its potential impact on data analysis.

14) Additional analysis and control experiments related to barcode detection:

The low barcode recovery rate could be due to the low number of copies for AAV-encoded transcripts in the transcriptome of single cells, or it could be specific to the detection of short barcode sequences. With the current scRNAseq data, one additional analysis is to measure the percentage of neurons positive for GFP transcripts from both FACS-sorted and non-sorted samples and compare the GFP+ neuron frequency to barcode+ neuron frequency.

15) To enrich cDNA fragments composed of the barcode index, unique molecular identifiers (UMIs), and the barcode, the authors prepared expressed virus barcode libraries with special primers. The authors also tried to detect barcodes directly in single-cell transcriptional libraries. What was the barcode detection frequency without this additional amplification, that is, in regular single-cell transcriptional libraries? It would be good to comment on how much this approach (we assume) improves barcode detection compared to the regular 10x single-cell libraries without additional barcode amplification.

16) The authors hypothesized that the non-neuronal cells would not be transduced by rAAV2-retro. The barcode counts in these non-neuronal cells were used to generate the thresholds for projection neurons. However, such cells, especially microglia, could be positive for AAV transcripts perhaps by phagocytosing dying infected neurons. A better control cell population could be cells negative for GFP after FACS. If sequencing data are available for such GFP-negative cells, it would be useful to examine the detection of barcodes in these cells and use them as the negative control for thresholding.

17) In the section on Projection barcode FASTQ alignment, the authors stated that deMULTIplex R package (v1.0.2) (https://github.com/chris-mcginnis-ucsf/MULTI-seq) was used to count UMIs associated with barcodes. This method was designed to detect the Sample index and needs to be further adjusted for barcode reading. The current method did not reveal the fact that a single UMI could be associated with multiple barcodes, which could raise the need for thresholding at this stage.

MULTIseq.preProcess was used to identify the barcode sequence based on its position relative to the P7 primer. The result is a readTable with each row showing the Cell ID, UMI, and barcode sequence. The same UMI appeared in multiple rows and could have different barcodes.

The MULTIseq.align function was used to match the barcode sequence in each row of readTable to the 5 barcodes, and to find the numbers of UMI associated with each barcode in each sample. This function utilizes a minimal Hamming distance of 1 to call a match between barcodes detected in the sequencing samples and the list of designed barcodes. We would suggest a minimal Hamming distance of 2. Many of the sequences with a minimal Hamming distance of 2 have a frameshift of 1 nucleotide as compared to the designed barcodes. If the parameter of MULTIseq.preProcess is adjusted to change the position of the expected barcode, we would expect to find the full-length barcode. A specific example is:

"AAGGCACAGACTTTG" has a Hamming distance of 2 as compared to barcode 2 "GAAGGCACAGACTTT", and should also be considered as a match.
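The example sequence is the designed barcode read one nucleotide out of frame, which the following sketch makes explicit. It is an illustration only, not the deMULTIplex implementation: the helper names are hypothetical, and the point is that a strict position-by-position comparison of the two 15-mers reports many mismatches, whereas an insertion/deletion-aware (edit) distance of 2, or a comparison at a one-nucleotide offset, recovers the match described above.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Edit distance allowing substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def best_shifted_match(read, barcode, max_shift=1):
    """Return (shift, mismatches) with the fewest mismatches over the
    overlapping bases when the read is offset by up to max_shift."""
    best = None
    for shift in range(-max_shift, max_shift + 1):
        if shift >= 0:
            r, b = read[: len(read) - shift], barcode[shift:]
        else:
            r, b = read[-shift:], barcode[: len(barcode) + shift]
        mismatches = hamming(r, b)
        if best is None or mismatches < best[1]:
            best = (shift, mismatches)
    return best


read = "AAGGCACAGACTTTG"        # sequence observed at the barcode position
barcode_2 = "GAAGGCACAGACTTT"   # designed barcode 2

print("position-wise Hamming:", hamming(read, barcode_2))
print("edit distance:", levenshtein(read, barcode_2))
print("best (shift, mismatches):", best_shifted_match(read, barcode_2))
```

A matcher that only counts position-wise mismatches would therefore reject this read unless the expected barcode position is adjusted, which is consistent with the suggestion to change the MULTIseq.preProcess position parameter.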

More importantly, MULTIseq.align does not consider the complication that multiple barcodes could be detected for the same UMI, and simply uses the barcode of the first duplicated UMI. Taking the FACS-sorted pfc_4 dataset as an example, the readTable generated using this dataset contains 30272582 rows. The third column represents sequences detected at the specified barcode position.

After aligning to the 5 barcodes with a maximum Hamming distance of two, 26443131 of the sequences detected at the specified barcode position can be matched, leading to 474012 unique combinations of Cell/UMI, each combination with a set of detected barcodes.

Many of the UMIs are associated with multiple barcodes. In the pfc_4 dataset, 68955 of the 474012 unique combinations of Cell/UMI are associated with multiple barcodes (14.5%).
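The bookkeeping described above can be sketched as follows. The toy read table, the function names, and the majority-vote rule with its 0.8 cutoff are all assumptions made for illustration; they are not part of deMULTIplex or of the authors' pipeline. The last lines simply recompute the quoted proportions from the counts given in the text.

```python
from collections import Counter, defaultdict


def summarize_umi_barcodes(read_rows):
    """Group (cell, umi, barcode) rows by Cell/UMI and flag combinations
    that are associated with more than one barcode."""
    per_umi = defaultdict(Counter)
    for cell, umi, barcode in read_rows:
        per_umi[(cell, umi)][barcode] += 1
    ambiguous = {key: c for key, c in per_umi.items() if len(c) > 1}
    return per_umi, ambiguous


def resolve_by_majority(barcode_counts, min_fraction=0.8):
    """Keep the dominant barcode only if it accounts for at least
    min_fraction of the reads for that Cell/UMI (arbitrary cutoff)."""
    barcode, top = barcode_counts.most_common(1)[0]
    total = sum(barcode_counts.values())
    return barcode if top / total >= min_fraction else None


# Hypothetical rows in the readTable layout: Cell ID, UMI, matched barcode.
rows = [
    ("cell1", "UMI_A", "bc1"),
    ("cell1", "UMI_A", "bc1"),
    ("cell1", "UMI_A", "bc2"),   # same Cell/UMI, different barcode
    ("cell1", "UMI_B", "bc3"),
    ("cell2", "UMI_A", "bc2"),
]
per_umi, ambiguous = summarize_umi_barcodes(rows)
print(len(per_umi), "Cell/UMI combinations;", len(ambiguous), "ambiguous")
for key, counts in ambiguous.items():
    print(key, dict(counts), "->", resolve_by_majority(counts))

# Proportions recomputed from the counts quoted in the text.
print(f"matched reads: {26443131 / 30272582:.1%}")
print(f"multi-barcode Cell/UMI combinations: {68955 / 474012:.1%}")
```

Taking the first barcode seen for a duplicated UMI, as described above, would silently assign the ambiguous combination to one barcode; an explicit rule such as the one sketched here at least makes the discarded fraction visible.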

When we used the data above to compare the density distributions of barcode counts per UMI per cell for the pfc_4 dataset and that of non-neuronal cells, we find that low barcode counts may not be specific. The best negative control here would be to use GFP-negative cells after FACS.

Depending on the threshold values to eliminate these false barcode counts, we reached an even smaller number of barcoded neurons at the end of the analysis.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "High-throughput mapping of single-neuron projection and molecular features by retrograde barcoded labeling" for further consideration by eLife. Your revised article has been evaluated by Kate Wassum (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

The reviewers have outlined a couple of additional changes that are needed to fully respond to the prior critiques. These represent small changes in the text and, thus, should not present a significant difficulty. Please address these comments in a revised manuscript.

Reviewer #2 (Recommendations for the authors):

The revised manuscript has improved. However, the authors still did not address my concern about determining "dedicated" vs. "collateralized" projections. The authors put those numbers in the Introduction and Results without doing the control experiment I suggested (to determine retrograde labeling efficiency) and without mentioning the caveats. The caveat is only mentioned in Discussion (lines 499-500). Readers who missed this sentence will be misled.

Adding a comparison between MERGE-seq and fMOST tracing (the new Figure 3C) is an improvement. As can be seen from the graph, there is a higher percentage of collateralized axons in fMOST compared to MERGE-seq in multiple categories, confirming that MERGE-seq underestimates the fraction of collateralized axons (though to my relief there is not an order-of-magnitude difference).

The authors should add the number of neurons included in the fMOST dataset in the Figure 3C legend.

Reviewer #4 (Recommendations for the authors):

This manuscript introduces MERGE-seq, a multiplexed method for profiling transcriptional features of individual neurons projecting to specific targets. The approach involves multiplexed retrograde tracing by injecting distinctly barcoded rAAV-retro viruses into different target areas, followed by scRNAseq of neurons in the source area on the 10x Genomics platform. The projection targets of barcoded neurons in the source area can be inferred by matching the detected barcodes to the barcode sequences of the rAAV-retro viruses injected into the target areas.

Validation of this approach was conducted by injecting rAAVs carrying five distinct 15-nt barcodes into five known ventromedial prefrontal cortex (vmPFC) targets. This revised version has performed integration analysis with previously existing vmPFC scRNA-seq and MERFISH datasets, and compared vmPFC scRNA clusters and the 7 excitatory neuron subtypes analyzed in this study with those in prior datasets. MERGEseq facilitated the identification of vmPFC cell types projecting to distinct areas, revealing that each of the seven identified excitatory neuron subtypes projects to multiple targets, and the five targets receive projections from multiple transcriptomic types. MERGE-seq derived projection patterns were validated through dual-color retro-AAV tracing and were correlated successfully with fMOST-based single-neuron tracing data. Additionally, marker genes for projection-specific cell subclasses were validated in retrogradely labeled vmPFC using RNA FISH for marker detection.

This revised version has effectively tackled the previously raised concerns. Significant efforts have been dedicated to performing an integrated analysis with existing datasets, enhancing the data analysis methodology, and imposing more stringent criteria for barcode determination. The revised manuscript places greater emphasis on acknowledging and incorporating several prior approaches that influenced the development of the MERGE-seq concept. While the efficiency of retrograde barcoding wasn't experimentally addressed by injecting rAAV-retro viruses with different barcodes into the same region, the limitations and potential concerns of MERGE-seq are now explicitly discussed. Additionally, the revised manuscript provides clarity on essential technical aspects, including QC criteria and parameters for evaluating scRNA data quality. In sum, this manuscript is rigorous and thorough, offering a valuable approach for the multiplexed investigation of neuronal transcriptomics and projection targets.

In addition, I suggest that QC criteria should be explicitly listed in the main text. The number of cells passing each QC step should also be listed either in the main text or in the related figures. My understanding is that there is a general QC step for scRNAseq quality based on gene count, total UMI count, and mitochondrial gene expression and that there is another step to identify low-quality cells and contaminated non-neuron cells. It would be very helpful if such information were readily available in the main text.

eLife. 2024 Feb 23;13:e85419. doi: 10.7554/eLife.85419.sa2

Author response


Essential revisions:

1) The manuscript builds upon several prior approaches, only some of which are discussed and cited. The scholarship of the manuscript needs to be improved by incorporating a more robust review of other approaches, as suggested by reviewers to place these findings in context. The scholarship of the manuscript should also be improved by including a more transparent discussion of the limitations of MERGE-seq relative to other approaches.

We have substantially revised the Introduction to include explicit comparisons of our method to current approaches in neuronal projectome mapping. Additionally, we have expanded the Discussion to transparently address limitations and potential concerns related to our approach, as well as suggested solutions. Throughout, we have ensured proper scholarly attribution by adding citations acknowledging foundational work and related methods developed by others in this emerging field.

2) Reviewers raised several issues related to the efficiency of retrograde labeling and barcode recovery that have the potential to affect the conclusions reached in the manuscript. These issues should be addressed in the manuscript with new data and/or analysis.

We recognize the inherent challenges associated with imperfect retrograde labeling efficiency in retro-seq-based methods, acknowledging that this efficiency can vary across different anatomical sites, which complicates the quantitative determination of projection patterns. In light of this, we have carefully revised the presentation of our results and discussion in the manuscript to explicitly caution readers about drawing quantitative conclusions.

We suggest that integrating single-molecule imaging techniques, such as MERFISH, with MERGE-seq could potentially provide spatially-resolved and quantitatively-enhanced projection pattern data. Regarding barcode recovery, we have incorporated new computational analyses and established more stringent recovery criteria, supplemented by negative controls, to enhance the accuracy of our barcode detection.

Furthermore, we have openly discussed the difficulties in accurately identifying valid projection barcodes, given the imperfect nature of retrograde labeling efficiency, variable recovery rates, and the distinct projection strengths associated with different target regions. Our Discussion section has been expanded to clearly communicate how these technical factors could impact the interpretation of our data, ensuring a more transparent presentation of our study's limitations and findings.

3) A revision should incorporate requested information to provide clarity about the experimental details and analysis methods.

We have added experimental and computational details to the Methods section. We have also made our computational code fully available on GitHub.

4) Please ensure your manuscript complies with the eLife policies for statistical reporting: https://reviewer.elifesciences.org/author-guide/full "Report summary statistics (e.g., t, F values) and degrees of freedom, exact p-values, and 95% confidence intervals wherever possible. These should be reported for all key questions and not only when the p-value is less than 0.05."

We have included exact p-values and 95% confidence intervals in source data files for corresponding figures where statistical analysis was performed (Figure 3D, Figure 6B, and Figure 6—figure supplement 1B), which contain the full summary statistics.

5) Please include a key resource table.

We have included a key resource table at the start of the Methods section.

Reviewer #1 (Recommendations for the authors):

1. The introduction discusses current techniques like BARseq but makes no mention of the current retrograde tracing and sequencing techniques which is what MERGEseq is actually improving upon. While these retro techniques are discussed in detail in the discussion, it seems odd that they are left out of the introduction as MERGEseq is really an extension of these techniques rather than MAPseq/BARseq.

We have rewritten the Introduction section, with an explicit description of previous methods. In the revised introduction, we reiterate the technological gap in currently existing methods of mapping neuronal projections, including MAPseq (Kebschull et al., Neuron, 2016), BARseq (Chen et al., Cell, 2019; Sun et al., Nature Neuroscience, 2021), ConnectID (Klingler et al., Nature, 2021), BRICseq (Huang et al., Cell, 2020), and VECTORseq (Retro-seq-based) (Cheung et al., Cell Rep., 2022). As recommended, we have expanded the Methods to include an in-depth comparison of our approach to other Retro-seq technologies such as Lui et al., 2021, highlighting both technological differences and distinct biological insights. Throughout the revised manuscript, we have also ensured scholarly attribution by incorporating additional references to foundational work and related methods from across this exciting field.

2. Figure 1 – Supplement 1 shows that very different UMI cutoffs were used to call a cell "positive" for each barcode index. This would presumably preclude direct comparison across regions with different barcodes, at least for purposes of determining the density of projections (since a different degree of projections will be excluded for each virus region). This should be mentioned more explicitly in the manuscript.

Downstream target regions can differentially affect projection barcode recovery due to factors including innervation density, projection range, and connection strength. We apply strict UMI thresholds per target calculated based on FAC-sorted neuron barcode quantiles to control for these effects. As suggested, we now explicitly state the distinct UMI cutoffs used for each validated projection and have expanded the Methods to provide full details on our binary projection classification approach (revised Methods, Binary projection pattern classification section).

In the main text of the revised manuscript (line 157), “It is worth mentioning that the UMI threshold differs for different targets due to different magnitude of barcode expression of each projection target (Figure 1—figure supplement 1F).”

3. Lines 170-171 note that "stressed" 637 neurons were filtered out from the Slc17a7 population. It is not clear what this means, and Figure 2 – Supplement 1A does not explain this. This should be clarified.

First, we clustered all cells detected by scRNA-seq to generate a major cell type classification, i.e., excitatory neurons, inhibitory neurons, astrocytes, oligodendrocytes, endothelial cells, and microglia. Then we used the annotated “Excitatory neuron” cluster to further cluster excitatory neuronal subtypes. In the second-round clustering result, we found several clusters with a lower number of counts per cell, a lower number of genes per cell, a higher percentage of mitochondrial genes, and ribosomal protein genes among their differentially expressed genes, which indicates clusters of low-quality cells (Ilicic et al., Genome Biology, 2016). We also found several other clusters with a small number of cells expressing typical markers of non-neuronal cells, such as microglia (C1qa, C1qb), oligodendrocytes (Olig1, Olig2), and endothelial cells (Flt1, Cldn5), which indicates “contamination” by other cell types mixed into the “Excitatory neuron” cluster in the initial clustering results. Based on these analyses, we filtered out those cells from the “Excitatory neuron” cluster and redid the clustering to generate excitatory neuronal subtypes.

We provided a supplementary table (supplementary file 1) containing the marker genes of the second-round clustering before filtering out the 637 cells in the revised manuscript. We improved the main text for clarity by changing “stressed neurons” to “low-quality cells and contaminated non-neuron cells” and included this detailed analysis description in the revised manuscript.

4. In the discussion, authors might comment on the added value of capturing the "full" transcriptome at the cost of spatial resolution.

We believe that our current methodology has the following advantages. First, we use a commercialized scRNA-seq technology (10X Genomics), which does not require a specialized technique or equipment setup and is inexpensive. Second, there are many elaborate published spatially-resolved mouse brain atlases generated by Allen Brain Institute, Xiaowei Zhuang’s lab, and many others (Zhang et al., Biorxiv, 2023; Yao et al., Biorxiv, 2023; Allen et al., Cell, 2023). Using differentially expressed genes detected by 10X Genomics scRNA-seq, one can easily annotate the cell population spatially. For example, we performed a comparison analysis of our scRNA-seq data with recently published MERFISH data of PFC (Bhattacherjee et al., Nature Neuroscience, 2023), which confidently provides us with the spatial information of neuronal subtypes (revised Figure 2D). Third, current spatially-resolved single-cell transcriptomics technologies are either imaging-based (MERSCOPE) or next-generation sequencing (NGS)-based (Deng et al., Nature Reviews Bioengineering, 2023). Imaging-based spatial transcriptomics enables single-cell or subcellular resolution but is currently expensive, limited in the number of genes detected, and biased toward a preselected gene panel. NGS-based spatial transcriptomics, on the other hand, enables unbiased gene detection, but many NGS-based spatial technologies do not support true single-cell resolution (e.g., 50 µm for 10x Genomics Visium or 10-50 µm for DBiT-seq-based methods) (Deng et al., Nature Reviews Bioengineering, 2023). With all these technological considerations, we believe our method stands out as a robust solution and offers an advantageous balance between resolution and scope.

We have included this part of the discussion in the Discussion part of the revised manuscript (line 473-482).

5. It may be good to mention that retro-AAV2 has not been reported to infect fibers of passage.

We have mentioned this point in the discussion part (line 520). “Further, it has been experimentally verified that AAV2 spread is confined to the vicinity of synaptic terminals and does not affect axon fibers in passages, especially as evidenced by retro-AAV injections in the cervical spinal cord (Wang et al., 2018). ”

6. Please include a color scale for Figure 5a.

Done.

7. Please include axis labels for Figure 5c.

Done.

8. Figure 5F shows the DMS+LH Pou3f+ cells, what about the DMS only and LH only? Are there any Pou3f+ cells that lack fluorescence?

We redid the statistical analysis to include the percentages of Pou3f1+EGFP−tdT−, Pou3f1+EGFP−tdT+, Pou3f1+EGFP+tdT−, and Pou3f1+EGFP+tdT+ cells among Pou3f1+ cells. EGFP+ cells represent neurons projecting to DMS and tdT+ cells represent neurons projecting to LH. Among Pou3f1+ cells, we found about 31.6% ± 13.1% Pou3f1+EGFP−tdT+ cells, 8.89% ± 2.38% Pou3f1+EGFP+tdT− cells, 55.7% ± 10.4% Pou3f1+EGFP+tdT+ cells, and 3.79% ± 2.91% Pou3f1+EGFP−tdT− cells. This indicates that Pou3f1-expressing vmPFC neurons preferentially project to LH alone or to both DMS and LH rather than to DMS alone. We have included this analysis in the revised manuscript (revised Figure 5D-G).

9. Please enlarge Figure 3e.

Done.

10. For scRNA-seq, please provide detail on the depth of sequencing, the number of GEMwells used, and the number of technical or biological replicates.

For unsorted samples, we used 3 mice with three GEM wells in one Chromium Single Cell 3' Chip (v3). Among unsorted samples, sample mouse 1 recovered 8040 cells, 447,984,945 read pairs were aligned, mean reads per cell is 55,719, median genes per cell is 2382; sample mouse 2 recovered 7443 cells, 399,187,134 read pairs were aligned, mean reads per cell is 53,632, median genes per cell is 2379; sample mouse 3 recovered 7243 cells, 410,627,696 read pairs were aligned, mean reads per cell is 56,693, median genes per cell is 2385. For FAC-sorted samples, we used 3 mice with one GEM well in one Chromium Single Cell 3' Chip (v3). FAC-sorted sample recovered 2075 cells, 410,434,792 read pairs were aligned, mean reads per cell is 197,799, median genes per cell is 6533. We have included these details in Methods section in the revised manuscript.

11. Please include a color scale for Figure 6e-f.

We have reorganized Figure 6 and added color scales to Figure 6D (previous Figure 6E) and 6F (previous Figure 6F).

12. Typo, line 536 "despite of chosen"; should be "in spite of chosen" or "despite chosen".

We have corrected this typo in the revised manuscript.

13. Supplemental Figure 2 panel E, missing cell type labels going across.

Done.

14. Please cite the original retro seq paper (Tasic, 2018, PMCID: PMC6456269).

We have added this citation to the revised manuscript.

Reviewer #2 (Recommendations for the authors):

Based on their detection of barcodes in single-cell RNA-sequencing, the authors conclude that "about 74% of barcoded vmPFC neurons projected to one of these five targets (dedicated projection) and 26% of barcoded vmPFC neurons sent collateral projections to multiple brain regions…" (lines 92-94 in Introduction, lines 242-244 in Results). These conclusions are contingent upon 100% of neurons that project to a specific region being labeled by retrograde barcoded viruses and barcodes are detected at 100% efficiency. The authors did not provide an estimation of either efficiency. In the penultimate paragraph of the Discussion, the authors raised this as "A potential technical concern" but conclude that "the overall dedicated and collateral projection pattern…will not be greatly affected by the labeling efficiency or recovery rate."

I disagree with the authors' conclusion. Suppose that retrograde labeling efficiency is 70%, and the barcode recovery rate is also 70% (both very optimistic estimates). Suppose further that all neurons of a particular type send collateral branches to two target regions, X and Y. The experiment will yield the results that 25% of the neurons will be labeled by barcodes injected at X only, 25% by barcodes injected at Y only, 25% labeled by both barcodes, and 25% labeled by neither. The conclusion from the above experiment would be that 2/3 of neurons are "dedicated" to either X or Y, and 1/3 of the neurons send axons to both regions. This simple back-of-the-envelope calculation reveals how much collateralization is underestimated by incomplete retrograde labeling!
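The back-of-the-envelope calculation above can be reproduced with a short sketch. It assumes, for illustration only, that labeling and barcode recovery are independent of each other and across the two targets, and that every neuron truly projects to both X and Y. The function name and output keys are hypothetical and are not taken from the manuscript.

```python
def observed_pattern_fractions(labeling_eff, recovery_rate):
    """Expected observed fractions when ALL neurons truly project to both X and Y.

    Assumption of this sketch: a barcode is detected for a given target with
    probability p = labeling_eff * recovery_rate, independently per target.
    """
    p = labeling_eff * recovery_rate
    x_only = p * (1 - p)
    y_only = (1 - p) * p
    both = p * p
    neither = (1 - p) ** 2
    detected = 1 - neither  # cells carrying at least one barcode
    return {
        "x_only": x_only,
        "y_only": y_only,
        "both": both,
        "neither": neither,
        # Fractions among barcoded cells, i.e. what the experiment would report.
        "apparent_dedicated": (x_only + y_only) / detected,
        "apparent_collateral": both / detected,
    }


fractions = observed_pattern_fractions(labeling_eff=0.7, recovery_rate=0.7)
for name, value in fractions.items():
    print(f"{name}: {value:.3f}")
```

With both efficiencies at 70%, the per-target detection probability is 0.49, close to the 50% used in the rounded figures above, so roughly two thirds of barcoded cells would appear "dedicated" even though the true collateral fraction in this scenario is 100%.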

Without reading the penultimate Discussion paragraph, readers will be misled twice about the fraction of "dedicated projection." Even after reading it, the readers will still be misled by the authors' conclusions.

If the authors wish to make a quantitative conclusion about the true "dedicated" vs. "collateral" projections, they must determine the efficiency of retrograde barcoding. They can inject AAVretro carrying two different types of barcodes into the same region (via two separate injections, rather than injecting a mixture, which will artificially raise the co-transduction efficiency) and quantify individual cells that are labeled by both types of barcodes. They can then use such efficiency to calibrate their estimation of "dedicated" vs. "collateral" projections. (Note that retrograde labeling efficiency may differ for different sites.) Without such calibration, the authors should caution the readers about the (likely large) underestimation of true collateralization whenever such data are presented and discussed.

We thank the reviewer for the constructive suggestion concerning the efficiency of retrograde barcoding in our study. We understand the critical importance of accurately determining the efficiency of retrograde barcoding and acknowledge the challenges presented by varying labeling efficiencies across different sites. Our analysis has focused on cells identified through a stringent barcoding threshold, ensuring reliable data, yet we acknowledge that this approach may not represent the full diversity of neuronal projection patterns.

In response, we have amended the manuscript to more clearly highlight the potential underestimation of collateral projections due to incomplete labeling (lines 500-501). This cautionary note emphasizes that our quantitative conclusions might not fully capture the true breadth of neuronal projections. We aim to ensure that readers are thoroughly informed about this limitation and the potential implications for interpreting our data. These revisions have been incorporated into both the Results and Discussion sections to avert any misinterpretation of our study's outcomes. We also propose to combine single-molecule imaging techniques (e.g., MERFISH) with MERGE-seq in the future to generate spatially-resolved and quantitatively enhanced projection patterns (lines 517-518). Finally, we provided additional joint analysis with fMOST-based single-neuron projectome data (Gao et al., 2022, Nature Neuroscience) to further validate the projection patterns (≥3 targets) that cannot be easily validated with dual-color retro-AAV tracing.

Another issue with "dedicated" projections: the authors only examined 5 targets. Each of the "dedicated" projections is true within these 5 targets, these neurons can send collateralized axons to other, unstudied targets.

We have added statements in the revised manuscript (lines 101 and 294) clarifying that the “dedicated” projections are defined only within the five targets examined in this study, and that we cannot rule out collateral projections of these “dedicated” neurons to other targets that we did not investigate.

Reviewer #3 (Recommendations for the authors):

1) Please provide a better context for the presented method. Clarify how the transcriptomic cell type definition in this paper corresponds to previous papers and clarify how the presented method differs from previous methods and what the advantages and disadvantages are. Please cite other papers that have employed single-cell Retro-seq: Tasic et al. 2016 (https://doi.org/10.1038/nn.4216), Tasic et al. 2018 (https://doi.org/10.1038/s41586-018-0654-5), Yao et al. 2021 (https://doi.org/10.1016/j.cell.2021.04.021), Zhang et al. 2021 (https://doi.org/10.1038/s41586-021-03223-w).

We have rewritten the discussion part, with a thorough comparison between MERGE-seq and previous methods. We have cited these papers in the revised manuscript.

2) Line 171: How were 'stressed' neurons defined? Please explain.

First, we clustered all cells detected by scRNA-seq to generate a major cell type classification, i.e., excitatory neurons, inhibitory neurons, astrocytes, oligodendrocytes, endothelial cells, and microglia. Then we used the annotated “Excitatory neuron” cluster to further cluster excitatory neuronal subtypes. In the second-round clustering result, we found several clusters with a lower number of counts per cell, a lower number of genes per cell, a higher percentage of mitochondrial genes, and ribosomal protein genes among their differentially expressed genes, which indicates clusters of low-quality cells (Ilicic et al., Genome Biology, 2016). We also found several other clusters with a small number of cells expressing typical markers of non-neuronal cells, such as microglia (C1qa, C1qb), oligodendrocytes (Olig1, Olig2), and endothelial cells (Flt1, Cldn5), which indicates “contamination” by other cell types mixed into the “Excitatory neuron” cluster in the initial clustering results. Based on these analyses, we filtered out those cells from the “Excitatory neuron” cluster and redid the clustering to generate excitatory neuronal subtypes. We provided a supplementary table (revised supplementary file 1) containing the marker genes of the second-round clustering before filtering out the 637 cells in the revised manuscript. We improved the main text for clarity by changing “stressed neurons” to “low-quality cells and contaminated non-neuron cells” and included this detailed analysis description in the revised manuscript.

3) Please mention which 10x Genomics chemistry version (e.g., v2, v3, v3.1) you use in the main text and criteria for QC (lowest acceptable UMI or gene detection level, as well as median gene detection for different cell classes: excitatory, inhibitory and non-neuronal).

We used Chromium Single Cell 3' Reagent Kit (v3) in this study (Cat#PN1000075). We retained cells with a gene count between 500 and 8000, a total UMI count between 1,000 and 60,000, and with less than 20% mitochondrial gene expression, ensuring single-cell data quality and the exclusion of potential outliers. We added the 10x Genomics chemistry version information in the revised Methods part and included median gene detection metrics for different major cell types in the revised supplementary file 4.
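The QC criteria stated above can be written down as a simple per-cell filter. The following is a minimal sketch rather than the authors' pipeline: the `CellQC` record and `passes_qc` function are hypothetical names, and treating the gene and UMI bounds as inclusive is an assumption, since the response does not specify this.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    cell_id: str     # hypothetical cell identifier
    n_genes: int     # number of genes detected in the cell
    n_umis: int      # total UMI count of the cell
    pct_mito: float  # percentage of UMIs from mitochondrial genes


def passes_qc(cell, min_genes=500, max_genes=8000,
              min_umis=1000, max_umis=60000, max_pct_mito=20.0):
    """Return True if the cell meets the thresholds quoted in the response.

    Assumption: gene and UMI bounds are inclusive; mitochondrial
    percentage must be strictly below the cutoff ("less than 20%").
    """
    return (min_genes <= cell.n_genes <= max_genes
            and min_umis <= cell.n_umis <= max_umis
            and cell.pct_mito < max_pct_mito)


# Toy cells with made-up metrics, for illustration only.
cells = [
    CellQC("cell_a", n_genes=2382, n_umis=12000, pct_mito=3.5),
    CellQC("cell_b", n_genes=450, n_umis=900, pct_mito=2.0),     # too few genes/UMIs
    CellQC("cell_c", n_genes=3000, n_umis=15000, pct_mito=25.0),  # high mitochondrial fraction
]
kept = [c.cell_id for c in cells if passes_qc(c)]
print(kept)
```

Counting `len(kept)` after each such step would give the per-step cell numbers that Reviewer #4 asks to see reported.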

4) Figure 1E: The figure shows a correlation between two studies at a resolution that is not appropriate – too low to be informative except for QC. Please move this to supplement.

We have implemented the analytical comparison approach from Bhattacherjee et al., 2023 to evaluate the replication of major cell types between our scRNA-seq data and publicly available spatial references. We replaced the prior analysis in Figure 1E with this improved comparative analysis, now shown in revised Figure 1E for consistency. We consider that it is equally important to demonstrate correspondence across both major classes and excitatory neuronal subtypes.

5) Cell type identity definition: I suggest performing data integration (for example using Seurat) with Bhattacherjee et al., 2019, Liu et al. 2021 and with Yao et al. 2021 (see above) to give more updated names to cell types. The paper should start with cell type definition and present consistent nomenclature from the beginning.

In the revised manuscript, we first performed a comparison analysis of our scRNA-seq data with recently published MERFISH data of PFC (Bhattacherjee et al., 2023, Nature Neuroscience), which confidently provides us with the spatial information of neuronal subtypes (revised Figure 2D). We then compared our dataset with the excitatory neuronal subtype datasets of Bhattacherjee et al., 2019, Lui et al., 2021 and Yao et al., 2021. It is worth mentioning that Lui et al., 2021 sequenced Rbp4cre+ neurons, which are mostly Layer 5 excitatory projection neurons. Based on the correspondence matrix generated from the MERFISH data and our scRNA-seq data, we have mentioned this spatial nomenclature in the corresponding main text. However, we kept our original annotation to preserve the authenticity of the data, since we do not have spatial transcriptomic data of our own. We have included these changes in the revised Figure 2D and revised Figure 2—figure supplement 1D.

6) Caution should be exercised when interpreting under-represented cell types in MERGE-seq. Figure 2 shows that only 8 neurons of the L5-Htr2c subtype were validly barcoded. However, in addition to the possibility that this subtype does not project to the five targets included in this study, the small number of barcoded L5-Htr2c subtype could also be caused by the tropism of AAV2-Retro viruses, or the selective loss of L5-Htr2c neurons in tissue processing due to cell death. Comparison with the in situ hybridization patterns of marker genes in Supplementary Figure 2 and the proportions of neuronal subtypes in Figure 2b suggests bias in sampling of cell subtypes by scRNA-seq. Therefore, independent approaches should be utilized to further investigate the projection targets of L5-Htr2c neurons before reaching a conclusion.

We agree that a small number of barcoded L5-Htr2c neurons could be due to cell loss during single-cell dissociation or tropism selection of AAV2-retro. We have explicitly mentioned this in the revised manuscript (line 202-204) and discussed this in the Discussion part (line 488-495). As suggested by comment 17, we refined our threshold selection criteria and set the Hamming distance threshold to 2. Under the new threshold, we found 9 L5-Htr2c neurons were determined as barcoded.

line 202-204: “Only 9 neurons of the L5-Htr2c subtype were recovered with valid barcodes, which may be attributable to technical factors including cell loss during dissociation or AAV2-retro tropism. Alternatively, this subtype may intrinsically lack projections to the selected target regions examined in this study.”

line 488-495: “For example, only 9 neurons of the L5-Htr2c subtype were recovered with valid barcodes, which may be attributable to technical factors including cell loss during dissociation or AAV2-retro tropism. Alternatively, this subtype may intrinsically lack projections to the selected target regions examined in this study. Furthermore, single-cell dissociation for scRNA-seq can result in cell loss, thereby reducing the recovery rate of barcoded neurons. All these factors could influence the extent to which the complete range of neuronal projections is captured. Consequently, the quantitative conclusions drawn here might not fully represent the true extent of neuronal projections.”
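To illustrate what a Hamming distance threshold of 2 means for barcode calling, here is a minimal sketch. The 15-nt sequences below are made-up placeholders, not the barcodes used in the study, and the function names are hypothetical. Treating "threshold of 2" as "at most 2 mismatches" and rejecting ambiguous reads are both assumptions of this sketch.

```python
# Hypothetical 15-nt barcodes, one per injected target (NOT the real sequences).
BARCODES = {
    "AI":  "ACGTACGTACGTACG",
    "DMS": "TGCATGCATGCATGC",
    "MD":  "GATCGATCGATCGAT",
    "BLA": "CTAGCTAGCTAGCTA",
    "LH":  "AACCGGTTAACCGGT",
}


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_target(read_barcode, barcodes=BARCODES, max_distance=2):
    """Return the single target whose barcode is within max_distance mismatches.

    Returns None if no barcode, or more than one barcode, is within range.
    """
    hits = [target for target, seq in barcodes.items()
            if hamming(read_barcode, seq) <= max_distance]
    return hits[0] if len(hits) == 1 else None


print(assign_target("ACGTACGTACGTATT"))  # 2 mismatches to the AI placeholder
print(assign_target("ACGTACGTACGTTTT"))  # 3 mismatches, so no assignment
```

A looser threshold recovers more reads per barcode at the cost of a higher risk of misassignment, which is why the choice interacts with the number of cells called as barcoded.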

7) We suggest replacing "unbarcoded" with "non-barcoded". "Un-barcoded" sounds like the cells were barcoded and then the barcode was removed. The more appropriate term is "non-barcoded".

We have edited “unbarcoded” into “non-barcoded” across the manuscript.

8) Figure 2 (and others): Please state how many cells are represented in each panel of the figure.

We have clarified the cell numbers in the figure legends of Figures 1-6 in the revised manuscript.

9) Figure 2 E and F – Not the most straightforward and informative representation: We suggest converting these to bar plots per area and per type. It is good to see that the authors kept the colors introduced in this figure in Figure 3. We suggest wherever the color code can be kept consistent, to do so.

We have transformed the original result into bar plots (revised Figure 2F and 2G).

10) Figure 3. MERGE-seq reveals hidden projection diversity within the vmPFC – please remove 'hidden'.

We have removed “hidden” in the revised manuscript.

11) Figure 6E/F – How many neurons and which types are shown? Every figure should state which single-cell transcriptomes were included and the labeling should be consistent with the previous figures. For example, please show the PC1/PC2 scatter plot in E and F next to the same cells labeled with their cell type assignments + colors. This allows the reader to connect the information from previous figures to these.

As suggested by comment 8, we have added the number of neurons in the respective figures or figure legends. In Figure 6E and 6F (now revised Figure 6D and 6F), 1,853 barcoded neurons (top 10 most frequent projection patterns) are represented. Our aim here is to suggest that, at the mRNA level, projection target is correlated with certain genes; thus we did not show the cell type assignments. However, readers can find the corresponding cell type assignments and labels in revised Figure 2E, where we already indicated the distribution of neuronal subtypes for barcoded neurons associated with each projection target.

12) Line 490/491 Please include these references when referring to Retro-seq: Tasic et al. 2016 (https://doi.org/10.1038/nn.4216), Tasic et al. 2018 (https://doi.org/10.1038/s41586-018-0654-5), Yao et al. 2021 (https://doi.org/10.1016/j.cell.2021.04.021), Zhang et al. 2021 (https://doi.org/10.1038/s41586-021-03223-w).

We have cited these papers in the revised manuscript.

13) Completeness and sensitivity of barcode recovery of the approach: The authors recovered 1791 EGFP-positive cells undergoing fluorescence-activated cell sorting (FACS) from three mice and 19,470 single cells without sorting from the other three mice. Using thresholds calculated based on barcode counts in non-neurons, they found that the percentages of barcoded cells in the FACS-sorted and unsorted groups are 54% and 12%, respectively. Given that almost all cells sorted by FACS should have been infected with the barcoded GFP AAV viruses, the recovery rate for barcodes is low. Therefore, labeling neurons as barcoded and unbarcoded based on the detection of barcodes will create false negatives, that is, classifying many retrogradely labeled neurons as "unbarcoded" (we suggest changing this to "non-barcoded"). It seems that the projectomes of neurons cannot be simply derived from barcode detection through sequencing. Many "unbarcoded" projection neurons were in fact retrogradely labeled by AAV viruses injected into a specific target but were negative for barcodes due to technical limitations.

One immediate issue caused by the false negative rate of barcode detection is whether the machine learning-based modeling is provided with the right training data (Figure 6). For this model to predict projectomes based on transcriptomes, it first requires a good correlation between barcoding and projectome.

More discussion should be given to the recovery rate of barcodes, and its potential impact on data analysis.

We appreciate the reviewer's comment. To achieve a stringent categorization of barcoded cells, we applied several thresholds in sequence to filter cells. First, we calculated the 95th percentile of the total number of unique molecular identifiers (nUMI) mapped to the five barcodes and removed cells with unusually high numbers of UMIs, which might indicate doublets or PCR-biased amplification. Next, we used two sets of cells as negative controls, that is, cells that are not expected to contain projection barcodes. The first set of negative control cells is non-neuronal cells classified by coarse clustering based on the single-cell transcriptome. The second set of negative control cells is “EGFP-negative” cells in the FAC-sorted dataset: we calculated the total counts of the five projection barcodes determined by cellranger for the FAC-sorted dataset, then assigned the cells with zero projection barcode counts as “EGFP-negative” cells. For each of the two sets of negative control cells, we searched for the value in the empirical cumulative distribution function (ECDF) that is closest to the 99.9th percentile for each projection barcode. We selected the higher UMI threshold from the two resulting sets of threshold values. A cell is then determined to be validly barcoded if the number of barcode UMIs within the cell is larger than the threshold. For example, the calculated UMI threshold for barcode 0 (AI) is 28, which means that if a cell contains more than 28 UMIs of barcode 0, this cell is validly barcoded by AI. The UMI thresholds for the other targets are: DMS, 101; MD, 114; BLA, 35; LH, 103. It is worth mentioning that the UMI threshold differs for different targets due to the different magnitude of barcode expression of each projection target (Figure 1—figure supplement 1I).
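The threshold logic described above can be sketched as follows. The per-target thresholds are the values quoted in the response; everything else is illustrative. The function names are hypothetical, and the percentile is taken as the smallest observed value whose ECDF reaches the requested level, which is one plausible reading of "closest to the 99.9th percentile" and not necessarily the authors' exact rule. The initial removal of cells above the 95th percentile of total barcode UMIs is not covered here.

```python
import math

# Per-target UMI thresholds quoted in the author response.
UMI_THRESHOLDS = {"AI": 28, "DMS": 101, "MD": 114, "BLA": 35, "LH": 103}


def ecdf_percentile(values, q=0.999):
    """Smallest observed value v such that the fraction of values <= v is >= q."""
    ordered = sorted(values)
    rank = max(math.ceil(q * len(ordered)), 1)  # 1-based rank
    return ordered[rank - 1]


def barcode_threshold(non_neuronal_umis, egfp_negative_umis, q=0.999):
    """Take the higher of the two negative-control percentiles for one barcode."""
    return max(ecdf_percentile(non_neuronal_umis, q),
               ecdf_percentile(egfp_negative_umis, q))


def call_targets(umi_counts, thresholds=UMI_THRESHOLDS):
    """Targets whose barcode UMI count is strictly larger than its threshold."""
    return [target for target, cutoff in thresholds.items()
            if umi_counts.get(target, 0) > cutoff]


# Made-up UMI counts for one cell, for illustration only.
example_cell = {"AI": 30, "DMS": 101, "MD": 0, "BLA": 5, "LH": 250}
print(call_targets(example_cell))
```

In this toy cell, DMS is not called because 101 UMIs equals but does not exceed the DMS threshold, which shows how the strict "larger than" rule behaves at the boundary.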

To validate barcode/non-barcode label integrity for machine learning, we performed 100 iterations randomly sampling 1000 cells and swapping labels between barcoded/non-barcoded groups. Prediction accuracy, AUC, and F1 scores of original models using the top 50 HVGs with true labels were compared to models with swapped labels. In each of the 100 trials, 1000 of the 8210 total cells were sampled and barcoded/non-barcoded labels swapped as extensively as group size allowed. We found model performance decreased after label swapping compared to original training labels (revised Figure 6—figure supplement 1B). This analysis suggests that while our stringent UMI threshold might result in some false positives, the barcoded versus non-barcoded labeling approach we adopted still offers a reliable foundation for machine learning training data. Additionally, we have clarified that we limited our analysis and conclusions to data derived from barcoded samples. Line 167-170: “ The lower fraction of barcoded versus EGFP+ cells suggests our conservative threshold increases false negatives, classifying some low UMI cells as non-barcoded. Therefore, we focused analyses on reliably barcoded cells, though conclusions may not capture the full heterogeneous projection repertoire.”
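The label-swapping step of this control can be sketched as follows. The sketch covers only the permutation of labels. Model training and scoring (accuracy, AUC, F1) are omitted, and each swapped label vector would be used to refit the classifier for comparison with the true-label model. The function name, the random seed, and the barcoded fraction of the toy labels are assumptions made for illustration.

```python
import random


def swap_labels(labels, sample_size, rng):
    """Swap barcoded (1) and non-barcoded (0) labels within a random sample.

    Labels are swapped as extensively as the smaller sampled group allows,
    so the overall number of barcoded cells stays unchanged.
    """
    sampled = rng.sample(range(len(labels)), sample_size)
    pos = [i for i in sampled if labels[i] == 1]  # sampled barcoded cells
    neg = [i for i in sampled if labels[i] == 0]  # sampled non-barcoded cells
    n_swap = min(len(pos), len(neg))
    swapped = list(labels)
    for i in pos[:n_swap]:
        swapped[i] = 0
    for i in neg[:n_swap]:
        swapped[i] = 1
    return swapped, n_swap


rng = random.Random(0)
# 8210 cells in total, as in the text; the split between groups is made up.
labels = [1] * 3000 + [0] * 5210
# 100 trials, each sampling 1000 cells, as described above.
trials = [swap_labels(labels, 1000, rng) for _ in range(100)]
```

Because equal numbers of cells move in each direction, class balance is preserved and any drop in performance reflects the corrupted labels rather than a shift in group size.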

We have included the detailed description of threshold determination in the revised manuscript. We also thoroughly discussed the current limitations, computational challenges, potential impact on data analysis, and potential solutions in the revised Discussion (line 484-517).

14) Additional analysis and control experiments related to barcode detection:

The low barcode recovery rate could be due to the low number of copies for AAV-encoded transcripts in the transcriptome of single cells, or it is specific for the detection of short barcode sequences. With the current scRNAseq data, one additional analysis is to measure the percentage of neurons positive for GFP transcripts from both FAC-sorted and non-sorted samples and compare the GFP+ neuron frequency to barcode+ neuron frequency.

We appreciate the reviewer’s suggestion. For the scRNA-seq data, we defined a cell as GFP+ if its GFP UMI count (the sum of the UMI counts of all five barcodes) is larger than 0, and as GFP- otherwise. We calculated the GFP+ ratio in the FAC-sorted and non-sorted groups (non-sorted groups 1-3: group 1, 27%; group 2, 25%; group 3, 27%; FAC-sorted group 4: 81%). In parallel, we calculated the barcoded ratio in the same groups (non-sorted groups 1-3: group 1, 17%; group 2, 20%; group 3, 19%; FAC-sorted group 4: 49%). The ratio of barcoded cells is lower than that of EGFP+ cells in every group. The lower fraction of barcoded versus EGFP+ cells suggests our threshold increases false negatives, classifying some low UMI cells as non-barcoded. Therefore, we focused analyses on reliably barcoded cells, though conclusions may not capture the full heterogeneous projection repertoire.
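The two ratios compared here follow from two different per-cell rules, which a short Python sketch makes explicit. The thresholds are the values quoted in the response to comment 13. The function names and the toy cells are hypothetical, and for simplicity a single count table is used for both calls, whereas the text derives GFP counts from the scRNA-seq data.

```python
# Per-target UMI thresholds as quoted in the response to comment 13.
THRESHOLDS = {"AI": 28, "DMS": 101, "MD": 114, "BLA": 35, "LH": 103}


def is_gfp_positive(counts):
    """GFP+ if the summed UMI count over all five barcodes is above zero."""
    return sum(counts.values()) > 0


def barcoded_targets(counts, thresholds=THRESHOLDS):
    """Targets whose barcode UMI count exceeds the target-specific threshold."""
    return [t for t, thr in thresholds.items() if counts.get(t, 0) > thr]


def gfp_and_barcoded_fractions(cells):
    """Fraction of GFP+ cells and fraction of cells with at least one valid barcode."""
    n = len(cells)
    gfp = sum(is_gfp_positive(c) for c in cells) / n
    barcoded = sum(bool(barcoded_targets(c)) for c in cells) / n
    return gfp, barcoded


# Toy cells (made-up counts).
cells = [
    {"AI": 0, "DMS": 0, "MD": 0, "BLA": 0, "LH": 0},      # GFP-, non-barcoded
    {"AI": 5, "DMS": 0, "MD": 0, "BLA": 0, "LH": 0},      # GFP+, below threshold
    {"AI": 40, "DMS": 0, "MD": 0, "BLA": 0, "LH": 0},     # GFP+, barcoded by AI
    {"AI": 0, "DMS": 150, "MD": 0, "BLA": 0, "LH": 120},  # GFP+, DMS + LH
]
gfp_fraction, barcoded_fraction = gfp_and_barcoded_fractions(cells)
```

The second toy cell illustrates the gap between the two ratios: it counts as GFP+ but falls below the AI threshold, so it is treated as non-barcoded.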

We added the GFP+ ratio and barcoded ratio across FAC-sorted and non-sorted groups to the revised Figure S1G and S1H. We also discussed this potential limitation in the Discussion section (line 496-517).

15) To enrich cDNA fragments composed of the barcode index, unique molecular identifiers (UMIs), and the barcode, the authors prepared expressed virus barcode libraries with special primers. The authors also tried to detect barcodes directly in single-cell transcriptional libraries. What was the barcode detection frequency without this additional amplification, that is, in regular single-cell transcriptional libraries? It would be good to comment on how much this approach (we assume) improves barcode detection compared to the regular 10x single-cell libraries without additional barcode amplification.

The projectome and scRNA-seq libraries have different structures due to their distinct amplification templates. The projectome library uses full-length post-amplified cDNA, while the scRNA-seq library uses fragmented cDNA. A key advantage of PCR-enriching the EGFP-barcode region from full-length cDNA is that it generates an organized projectome library structure. Based on our primer design, the expected format is: cell barcode in Read 1 (bases 1-16), UMI in Read 1 (bases 17-28), and projection barcode in Read 2 (bases 31-45). In contrast, in the scRNA-seq libraries, the randomly fragmented cDNA produces variable read lengths covering the 15 bp projection barcode, decreasing detection sensitivity and accuracy. Therefore, a direct comparison of projectome and scRNA-seq libraries for barcode detection may not be feasible.
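The fixed read layout of the projectome library means that each field can be extracted by position alone. A minimal Python sketch, assuming the 1-based coordinates given above and using made-up reads; the function name is hypothetical, and the projection barcode in the toy Read 2 is the barcode 2 sequence quoted in comment 17.

```python
def parse_read_pair(read1, read2):
    """Extract fields by position (1-based coordinates from the text)."""
    cell_barcode = read1[0:16]          # Read 1, bases 1-16
    umi = read1[16:28]                  # Read 1, bases 17-28
    projection_barcode = read2[30:45]   # Read 2, bases 31-45
    return cell_barcode, umi, projection_barcode


# Made-up reads for illustration only.
read1 = "ACGTACGTACGTACGT" + "TTTTGGGGCCCC"
read2 = "N" * 30 + "GAAGGCACAGACTTT" + "AAAA"

cell_barcode, umi, projection_barcode = parse_read_pair(read1, read2)
print(cell_barcode, umi, projection_barcode)
```

In randomly fragmented scRNA-seq libraries no such fixed offset exists, which is why the 15 bp barcode is only partially covered by many reads.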

We have calculated the reads that are mapped to the five barcodes, either based on the scRNA-seq library FASTQ files using cellranger or based on the projectome libraries using our methods adapted from MULTI-seq (McGinnis et al., 2019, Nature Methods). We found that the scRNA-seq libraries contain many zero counts for the projection barcodes (note that we used UMI+1 for plotting) (revised Figure S1D).

16) The authors hypothesized that the non-neuronal cells would not be transduced by rAAV2-retro. The barcode counts in these non-neuronal cells were used to generate the thresholds for projection neurons. However, such cells, especially microglia, could be positive for AAV transcripts perhaps by phagocytosing dying infected neurons. A better control cell population could be cells negative for GFP after FACS. If sequencing data are available for such GFP-negative cells, it would be useful to examine the detection of barcodes in these cells and use them as the negative control for thresholding.

We appreciate the reviewer's rigorous suggestions. We agree with the reviewer that some of the non-neuronal cells could be “barcoded”. In the revised Figure 1F, we show that non-neuronal populations such as endothelial cells, oligodendrocyte progenitor cells and oligodendrocytes contain several cells that were determined to be barcoded. In the revised projection barcode threshold analysis, we also used “EGFP-negative” cells as determined by scRNA-seq data (see the revised Methods, or the response to comment 13, for details). Unfortunately, we did not collect and sequence the GFP-negative cells after FACS. We agree that cells that are EGFP-negative by FACS would be a good alternative negative control. We have included this point in the Discussion of the revised manuscript (line 496-517).

17) In the section on Projection barcode FASTQ alignment, the authors stated that deMULTIplex R package (v1.0.2) (https://github.com/chris-mcginnis-ucsf/MULTI-seq) was used to count UMIs associated with barcodes. This method was designed to detect the Sample index and needs to be further adjusted for barcode reading. The current method did not reveal the fact that a single UMI could be associated with multiple barcodes, which could raise the need for thresholding at this stage.

MULTIseq.preProcess was used to identify the barcode sequence based on its position relative to the P7 primer. The result is a readTable with each row showing the Cell ID, UMI, and barcode sequence. The same UMI appeared in multiple rows and could have different barcodes.

The MULTIseq.align function was used to match the barcode sequence in each row of readTable to the 5 barcodes, and to find the number of UMIs associated with each barcode in each sample. This function utilizes a minimal Hamming distance of 1 to call a match between barcodes detected in the sequencing samples and the list of designed barcodes. We would suggest a minimal Hamming distance of 2. Many of the sequences with a minimal Hamming distance of 2 have a frameshift of 1 nucleotide as compared to the designed barcodes. If the parameter of MULTIseq.preProcess is adjusted to change the position of the expected barcode, we would expect to find the full-length barcode. A specific example is:

"AAGGCACAGACTTTG" has a Hamming distance of 2 as compared to barcode 2 "GAAGGCACAGACTTT", and should also be considered as a match.
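The example pair can be checked directly. Counted position by position, the two 15-mers differ at more than two sites. The distance of 2 is recovered as an edit (Levenshtein) distance, which fits the 1-nucleotide frameshift described above. The sketch below is a plain Python check with illustrative function names; it is not part of deMULTIplex.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Edit distance allowing substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,             # deletion
                            curr[j - 1] + 1,         # insertion
                            prev[j - 1] + (x != y))) # substitution or match
        prev = curr
    return prev[-1]


observed = "AAGGCACAGACTTTG"
designed = "GAAGGCACAGACTTT"  # barcode 2

print(hamming(observed, designed))      # positional mismatches
print(levenshtein(observed, designed))  # edits needed, frameshift included
# The first 14 bases of the observed read equal the designed barcode minus its
# first base, i.e. the read window is shifted by one nucleotide.
print(observed[:14] == designed[1:])
```

A purely positional comparison would therefore not rescue frameshifted reads such as this one. Shifting the expected barcode position, as suggested above, or using an edit-distance match would.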

More importantly, MULTIseq.align does not consider the complication that multiple barcodes could be detected for the same UMI, and simply uses the barcode of the first duplicated UMI. Taking the FAC-sorted pfc_4 dataset as an example, the readTable generated using this dataset contains 30272582 rows. The third column represents sequences detected at the specified barcode position.

After aligning to the 5 barcodes with a maximum Hamming distance of 2, 26443131 of the sequences detected at the specified barcode position can be matched, leading to 474012 unique combinations of Cell/UMI, each combination with a set of detected barcodes.

Many of the UMIs are associated with multiple barcodes. In the pfc_4 dataset, 68955 of the 474012 unique combinations of Cell/UMI are associated with multiple barcodes (14.5%).
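The quantity reported here, the share of Cell/UMI combinations linked to more than one barcode, can be computed by grouping the read table on (cell, UMI). A minimal Python sketch with a made-up read table; the function names and the row layout (cell ID, UMI, matched barcode) are assumptions modelled on the readTable description above.

```python
from collections import defaultdict


def barcodes_per_umi(read_table):
    """Map each (cell, UMI) combination to the set of barcodes seen with it."""
    seen = defaultdict(set)
    for cell, umi, barcode in read_table:
        seen[(cell, umi)].add(barcode)
    return seen


def ambiguous_fraction(read_table):
    """Count Cell/UMI combinations associated with more than one barcode."""
    seen = barcodes_per_umi(read_table)
    n_multi = sum(1 for barcodes in seen.values() if len(barcodes) > 1)
    return n_multi, len(seen), n_multi / len(seen)


# Made-up rows: (cell ID, UMI, matched barcode).
read_table = [
    ("cell1", "UMI_A", "AI"),
    ("cell1", "UMI_A", "AI"),   # duplicate read, same barcode
    ("cell1", "UMI_B", "DMS"),
    ("cell1", "UMI_B", "LH"),   # same UMI, different barcode: ambiguous
    ("cell2", "UMI_A", "MD"),
    ("cell2", "UMI_C", "BLA"),
]
n_multi, n_total, fraction = ambiguous_fraction(read_table)

# The figure quoted for pfc_4 follows from the two counts in the text.
pfc4_percent = round(68955 / 474012 * 100, 1)
```

Taking only the first row per UMI would silently assign `UMI_B` of `cell1` to DMS, which is the behaviour the reviewer flags.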

When we used the data above to compare the density distributions of barcode counts per UMI per cell for the pfc_4 dataset and that of non-neuronal cells, we find that low barcode counts may not be specific. The best negative control here would be to use GFP-negative cells after FACS.

Depending on the threshold values to eliminate these false barcode counts, we reached an even smaller number of barcoded neurons at the end of the analysis.

We thank the reviewer for the careful examination, insightful comments and suggestions regarding the alignment of projection barcode FASTQ in our methods section. We have carefully reviewed and subsequently modified our methodology to address the concerns raised.

First, we have revised the MULTIseq.align function to address the issue of multiple barcodes being detected for the same UMI. We now include a more robust analysis of duplicated UMIs to ensure that all potential barcode matches are considered, rather than defaulting to the first duplicate. Second, we have adopted a minimal Hamming distance of 2 for the MULTIseq.align function to improve the matching accuracy between detected and designed barcodes. We agree with the reviewer that GFP-negative cells after FACS could be a good alternative control (see response to comment 16). We have now updated our analysis using our improved projection barcode alignment (see Code availability in the revised manuscript) and a stringent UMI threshold (see the revised Methods or the response to comment 13) to determine barcoded/non-barcoded neurons.

[Editors’ note: what follows is the authors’ response to the second round of review.]

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

The reviewers have outlined a couple of additional changes that are needed to fully respond to the prior critiques. These represent small changes in the text and, thus, should not present a significant difficulty. Please address these comments in a revised manuscript.

Reviewer #2 (Recommendations for the authors):

The revised manuscript has improved. However, the authors still did not address my concern about determining "dedicated" vs. "collateralized" projections. The authors put those numbers in the Introduction and Results without doing the control experiment I suggested (to determine retrograde labeling efficiency) and without mentioning the caveats. The caveat is only mentioned in Discussion (lines 499-500). Readers who missed this sentence will be misled.

We appreciate your critique on the discussion of the caveats of MERGE-seq. We have now explicitly mentioned the caveats in the Introduction and Results section.

At the end of Introduction, line 104-105:

We revised “Approximately 65% of barcoded vmPFC neurons exhibited dedicated projection patterns” to “Approximately 65% of barcoded vmPFC neurons exhibited dedicated projection patterns based on MERGE-seq data”, clarifying that the conclusion is drawn within the context of MERGE-seq data.

In the Results section, line 258-262: We added “It is worth mentioning that the definition of “dedicated” and “collateral” projections relies solely on the analysis of MERGE-seq data. The quantitative resolution of dedicated and collateral projections of vmPFC neurons will depend on the comprehensiveness of retrograde labeling from all postsynaptic targets and labeling efficiency.”

Adding a comparison between MERGE-seq and fMOST tracing (the new Figure 3C) is an improvement. As can be seen from the graph, there is a higher percentage of collateralized axons in fMOST compared to MERGE-seq in multiple categories, confirming that MERGE-seq underestimates the fraction of collateralized axons (though to my relief there is not an order-of-magnitude difference).

We appreciate your recognition of our analysis of fMOST data.

The authors should add the number of neurons included in the fMOST dataset in the Figure 3C legend.

We have added the number of neurons in Figure 3C.

Reviewer #4 (Recommendations for the authors):

This manuscript introduces MERGE-seq, a multiplexed method for profiling transcriptional features of individual neurons projecting to specific targets. The approach involves multiplexed retrograde tracing by injecting distinctly barcoded rAAV-retro viruses into different target areas, followed by scRNAseq of neurons in the source area on the 10xGenomics platform. The projection targets of barcoded neurons in the source area can be inferred by matching the detected barcodes to the barcode sequences of the rAAV-retro viruses injected into the target areas.

Validation of this approach was conducted by injecting rAAVs carrying five distinct 15-nt barcodes into five known ventromedial prefrontal cortex (vmPFC) targets. The revised version includes an integration analysis with existing vmPFC scRNA-seq and MERFISH datasets, and compares the vmPFC scRNA clusters and the 7 excitatory neuron subtypes analyzed in this study with those in prior datasets. MERGE-seq facilitated the identification of vmPFC cell types projecting to distinct areas, revealing that each of the seven identified excitatory neuron subtypes projects to multiple targets, and that the five targets receive projections from multiple transcriptomic types. MERGE-seq derived projection patterns were validated through dual-color retro-AAV tracing and were correlated successfully with fMOST-based single-neuron tracing data. Additionally, marker genes for projection-specific cell subclasses were validated in retrogradely labeled vmPFC using RNA FISH for marker detection.

This revised version has effectively tackled the previously raised concerns. Significant efforts have been dedicated to performing an integrated analysis with existing datasets, enhancing the data analysis methodology, and imposing more stringent criteria for barcode determination. The revised manuscript places greater emphasis on acknowledging and incorporating several prior approaches that influenced the development of the MERGE-seq concept. While the efficiency of retrograde barcoding wasn't experimentally addressed by injecting rAAV-retro viruses with different barcodes into the same region, the limitations and potential concerns of MERGE-seq are now explicitly discussed. Additionally, the revised manuscript provides clarity on essential technical aspects, including QC criteria and parameters for evaluating scRNA data quality. In sum, this manuscript is rigorous and thorough, offering a valuable approach for the multiplexed investigation of neuronal transcriptomics and projection targets.

We appreciate your positive assessment of MERGE-seq and recognition of our revised manuscript.

In addition, I suggest that QC criteria should be explicitly listed in the main text. The number of cells passing each QC step should also be listed either in the main text or in the related figures. My understanding is that there is a general QC step for scRNAseq quality based on gene count, total UMI count, and mitochondrial gene expression and that there is another step to identify low-quality cells and contaminated non-neuron cells. It would be very helpful that such information is readily available in the main text.

We appreciate your suggestions on specifically indicating QC criteria in the main text. In the revised manuscript, we have made the following changes:

For the general QC, in line 137-142:

We added “We detected 24,788 cells in the raw data matrix. Following initial quality control, which ensured the number of detected RNA in each cell ranged between 500 and 8000, RNA UMI counts in each cell were within 1000 to 60000, and the percentage of mitochondrial genes remained below 20%, we recovered 1791 cells undergoing fluorescence-activated cell sorting (FACS) from three mice and 19,470 single cells without sorting from the other three mice, a total of 21,261 cells.”
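The general QC criteria quoted above amount to three range checks per cell. A minimal Python sketch, assuming the stated bounds are inclusive (the text does not specify this) and using hypothetical field names and toy cells:

```python
def passes_general_qc(n_genes, n_umi, pct_mito):
    """General scRNA-seq QC using the cut-offs quoted in the text.

    Assumption: the gene and UMI ranges are treated as inclusive.
    """
    return (500 <= n_genes <= 8000
            and 1000 <= n_umi <= 60000
            and pct_mito < 20.0)


# Toy cells: (detected genes, RNA UMI count, percent mitochondrial genes).
toy_cells = {
    "good_cell": (3200, 9500, 4.1),
    "too_few_genes": (350, 1200, 3.0),
    "too_many_umis": (7900, 75000, 2.5),
    "high_mito": (2500, 6000, 27.0),
}
kept = [name for name, metrics in toy_cells.items() if passes_general_qc(*metrics)]

# The cell numbers reported in the text are internally consistent.
total_after_qc = 1791 + 19470
```

Only `good_cell` passes all three checks, and the sorted and non-sorted cell numbers sum to the reported total of 21,261.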

For the QC of low-quality cells and contaminating non-neuronal cells among the excitatory neurons, in line 187-195:

We added “We first re-clustered excitatory projection neurons expressing Slc17a7 (also known as vesicular glutamate transporter, Vglut1). Clusters with low gene/UMI counts and high mitochondrial gene expression were filtered out as low-quality (Ilicic et al., 2016). Some clusters exhibited non-neuronal cell markers like microglial genes (C1qa, C1qb), oligodendrocyte genes (Olig1, Olig2), and endothelial cell genes (Flt1, Cldn5) despite small cluster size, indicating contamination from other cell types incorrectly grouped within excitatory neurons after initial clustering. In total, we filtered out 637 cells that were identified as either low-quality or contaminated with non-neuronal cell types and recovered 9368 excitatory neurons (see Methods, Figure 2—figure supplement 1A, Supplementary file 1).”

Associated Data

    Data Citations

    1. Xu P, Peng J, Yuan T, Chen Z, Wu Z, Luo ZG, Chen Y, Li CT. 2024. High-throughput mapping of single-neuron projection and molecular features by retrograde barcoded labeling. NCBI Gene Expression Omnibus. GSE210174
    2. Peibo X. 2022. figure1&S1. figshare.
    3. Peibo X. 2022. figure5&S5. figshare.
    4. Peibo X. 2022. figure4&S4. figshare.
    5. Peibo X. 2022. figure6&S6. figshare.
    6. Peibo X. 2022. figure2&S2. figshare.
    7. Peibo X. 2022. figure3&S3. figshare.
    8. Peibo X. 2022. Fig4_CDEFGH_confocal. figshare.
    9. Peibo X. 2022. Fig4E&H_Fig5F_FigS4D_IHC_quant_data. figshare.
    10. Bhattacherjee A, Djekidel MN, Chen R, Chen W, Tuesta LM, Zhang Y. 2019. Cell type-specific transcriptional programs in mouse prefrontal cortex during adolescence and addiction. NCBI Gene Expression Omnibus. GSE124952
    11. Lui JH, Luo L. 2020. Single cell RNAseq of Rbp4cre+ neurons from prefrontal cortex. NCBI Gene Expression Omnibus. GSE161936

    Supplementary Materials

    Figure 3—source data 1. Related to Figure 3C, D, F and G.

    Files contain raw data for Figure 3C and D, statistical summary for Figure 3D, raw data for Figure 3F and G.

    Figure 4—source data 1. Related to Figure 4A, Figure 4—figure supplement 1.

    Files contain the raw list of differentially expressed genes (DEGs) used for plotting volcano plots. For each target, DEGs were calculated using target-barcoded cells against non-target-barcoded cells (e.g., AI-barcoded versus non-AI-barcoded, DMS-barcoded versus non-DMS-barcoded, etc.). The MAST algorithm was used for DE testing.

    elife-85419-fig4-data1.xlsx (412.6KB, xlsx)
    Figure 4—source data 2. Related to Figure 4B and C.

    Files contain the raw quantitative data of immunostaining results.

    Figure 5—source data 1. Related to Figure 5B.

    Files contain the raw list of differentially expressed genes (DEGs) used for plotting volcano plots. DEGs were calculated using DMS-barcoded cells against DMS + LH-barcoded cells. The MAST algorithm was used for DE testing.

    elife-85419-fig5-data1.xlsx (215.9KB, xlsx)
    Figure 5—source data 2. Related to Figure 5D–G.

    Files contain the raw quantitative data of immunostaining results.

    Figure 6—source data 1. Related to Figure 6B, Figure 6—figure supplement 1B.

    Files contain raw data and statistical summary for Figure 6B, and raw data for Figure 6—figure supplement 1B.

    elife-85419-fig6-data1.xlsx (313.5KB, xlsx)
    Supplementary file 1. Marker genes of 2nd round clustering of “Excitatory neuron” cluster, related to Figure 2.

    The inserted UMAP shows the number of UMI counts (nCount_RNA) per cluster, the number of genes (nFeature_RNA) per cluster, and the percentage of mitochondrial genes (percent_mt) per cluster.

    elife-85419-supp1.xlsx (426.6KB, xlsx)
    Supplementary file 2. Marker genes of 7 scRNA-seq clusters from all excitatory neurons, related to Figure 2.

    Table with marker genes for each cluster calculated using Seurat package using Wilcoxon test.

    elife-85419-supp2.csv (962KB, csv)
    Supplementary file 3. Marker genes of 8 clusters from all cells, related to Figure 1.

    Table with marker genes for each cluster calculated using Seurat package using Wilcoxon test.

    elife-85419-supp3.csv (677.6KB, csv)
    Supplementary file 4. Median gene detection metrics for different major cell types, related to Figure 1.
    elife-85419-supp4.csv (225B, csv)
    MDAR checklist

    Data Availability Statement

    Raw gene expression, barcode count matrices and metadata are available from the Gene Expression Omnibus (GSE210174). The computational code used in the study is available at GitHub (https://github.com/MichaelPeibo/MERGE-seq-analysis; copy archived at Peibo, 2024). The data needed to evaluate the conclusions in the paper can be downloaded at https://figshare.com/projects/High-throughput_mapping_of_single-neuron_projection_and_molecular_features_by_retrograde_barcoded_labeling/150207. All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials and source data files.


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
