Abstract
Understanding the molecular logic of cortical cell-type diversity can illuminate cortical circuit function and evolution. Here, we performed single-nucleus transcriptome and chromatin accessibility analyses to compare neurons across three- to six-layered cortical areas of adult mice and across tetrapod species. We found that, in contrast to the six-layered neocortex, glutamatergic neurons of the three-layered mouse olfactory (piriform) cortex displayed continuous rather than discrete variation in transcriptomic profiles. Subsets of piriform and neocortical glutamatergic cells with conserved transcriptomic profiles were distinguished by distinct, area-specific epigenetic states. Furthermore, we identified a prominent population of immature neurons in piriform cortex and observed that, in contrast to the neocortex, piriform cortex exhibited divergence between glutamatergic cells in lab versus wild-derived mice. Finally, we showed that piriform neurons displayed greater transcriptomic similarity to cortical neurons of turtles, lizards, and salamanders than to those of the neocortex. In summary, despite over 200 million years of co-evolution alongside the neocortex, olfactory cortex neurons retain molecular signatures of ancestral cortical identity.
Introduction
Sensory systems evolved to inform behavioural adaptations to diverse ecological environments1–3. For example, during the transition from aquatic to terrestrial life, animals adapted their olfactory circuits to enhance their ability to find food, detect predators, and locate mates4–6. Olfactory areas are thus thought to have dominated the pallium (cortex) of early vertebrates7–9. Over time, the pallium underwent a remarkable diversification in its cellular and circuit organization to accommodate the increasing demands for processing a wide array of sensory stimuli2,7. A criterion historically used to infer the evolution of the pallium is its cytoarchitecture, defining the six-layered neocortex as a novel trait in mammals that emerged approximately 200 million years ago (MYA)10. In contrast, one- to three-layered structures are considered ancestral cortical traits, which date back at least to the last common ancestor of tetrapods, 350 MYA4. Models of brain evolution propose that the mammalian neocortex emerged through the expansion of an ancestral dorsal pallium, with cell biological innovations arising from novel combinations and modifications of a shared genetic toolkit7,11,12.
Intriguingly, the three-layered cytoarchitecture of the mammalian olfactory cortex resembles the pallium of non-mammalian vertebrates, including reptiles and amphibians. This structurally conserved trait has been referred to as paleocortex, despite major differences in sensory-motor functions and developmental origin13–15. Thus, the mammalian olfactory cortex provides a unique point of comparison, with its cytoarchitecture conserved across species while its neurons evolved in parallel with the emergence of the neocortex (Fig. 1a).
Fig. 1: Transcriptomic diversity of piriform cortex glutamatergic neurons.

(a) Two- to three-layered cytoarchitecture is observed throughout the pallium of amphibians and reptiles, and in the olfactory (piriform) cortex of mammals. Four- to six-layered cytoarchitecture is only present in mammals. (b) Overview of the workflow: microdissection of anterior and posterior piriform cortex (aPir and pPir), agranular insular cortex (AI), and primary somatosensory cortex (SSp) of adult mice; single-nucleus multiome sequencing (sn-multiome seq); unsupervised clustering of all neurons into supertypes. (c) UMAP of combined aPir and pPir neurons grouped by glutamatergic (GLUT) and GABAergic (GABA) neuron types. Inhibitory neurons from caudal and medial ganglionic eminence (IN_CGE and IN_MGE). (d) Immunohistochemistry of representative markers for piriform glutamatergic neuron types. Scale bar, 100 μm. (e) UMAP of mature glutamatergic neuron subtypes from combined aPir and pPir data sets. (f) Gene expression levels of laminar- and subtype-specific marker genes for piriform glutamatergic neuron subtypes. (g) Immunohistochemistry using Vglut1-CRE/INTACT-GFP transgenic mice of representative piriform markers (magenta arrowheads). Higher magnification inserts show co-expression of markers with VGLUT1. Scale bar, 100 μm. (h) Top: scheme of cell type gradient across cortical depth from SL (in orange) to Pyr (in blue) cells, together with UMAP of piriform glutamatergic neurons grouped by layers. Bottom: UMAPs showing gene expression of markers distributed across cortical depth, and respective in situ hybridization (ISH) images from the Allen Brain Atlas. Scale bar, 100 μm. (i) Top: scheme of cell type gradient across anterior-posterior axis, together with UMAP color-coded by aPir and pPir data sets. Middle: fraction of aPir and pPir neurons for each glutamatergic subtype. Bottom: immunohistochemistry in horizontal sections of Vglut1-CRE/INTACT-GFP transgenic mice shows expression of anterior- or posterior-specific markers (magenta). 
Inserts show markers without VGLUT1 staining. LOT: lateral olfactory tract; AON: anterior olfactory nucleus; LEC: lateral entorhinal cortex. Scale bar, 100 μm.
Here, using single-cell genomics, we test the premise that, despite millions of years of co-evolution alongside the neocortex, olfactory cortex neurons retained molecular signatures of ancestral cell type identity in the mammalian brain. Our findings reveal key ancestral features of olfactory cortex neurons compared to those of the neocortex, including pronounced graded transcriptomic profiles, a greater overlap in gene co-expression, transcriptomic divergence between lab and wild-derived mice, and greater transcriptomic similarity to cortical glutamatergic neurons of non-mammals.
Results
Transcriptomic diversity of piriform cortex neurons
We performed single-nucleus multiome sequencing (combined RNA and ATAC) of a total of 10,013 nuclei from four micro-dissected cortical areas of adult mice: the three-layered anterior and posterior olfactory (piriform, aPir and pPir) cortex, the four-layered agranular insular cortex (AI), and the six-layered primary somatosensory cortex (SSp) (Fig. 1b and Extended Data Fig. 1). After stringent quality control (Extended Data Fig. 2) and identification of neuronal and non-neuronal cell types (Extended Data Fig. 1d–e), we subclustered only the neurons, representing a core data set across the four cortical areas.
Piriform cortex (Pir) neurons were grouped into types and subtypes. Neuron types included two types of inhibitory neurons (INs), namely caudal and medial ganglionic eminence-derived INs (CGE and MGE respectively), three types of mature glutamatergic neurons, namely semilunar (SL), pyramidal (Pyr), and Vglut2-expressing (Vglut2) neurons, and one type of immature glutamatergic neurons (Immature) (Fig. 1c). Mature glutamatergic neurons were further divided into two subtypes of SL cells, 12 subtypes of Pyr cells, and one subtype of Vglut2 neurons (Fig. 1e). We identified marker genes specific to types and subtypes, whose expression patterns were validated using immunohistochemistry and RNA in situ hybridization data from the Allen Brain Atlas (Fig. 1f–g and Extended Data Fig. 3).
We also identified spatially orthogonal axes of variation of piriform Vglut1-expressing neurons (SL and Pyr cells) across cortical depth and the anterior-posterior axis (Fig. 1h–i). Vglut1 neuron subtypes were distributed along cortical depth, with marker genes spanning this laminar gradient (Fig. 1h). Along the anterior-posterior axis of piriform cortex, subtypes SL1 and Pyr2 were enriched in aPir, while subtypes SL2, Pyr8, Pyr9, and Pyr11 were enriched in pPir (Fig. 1i). Immunohistochemistry confirmed anterior-posterior gradients: SL1-specific RORB and Pyr2-specific CARTPT expression were higher in aPir, while RELN expression, enriched in SL2 cells, was higher in pPir (Fig. 1f, i).
Neuron diversity across three- to six-layered cortical continuum
Next, we compared piriform cortex neurons with neurons from other mouse cortical areas. As an initial reference, we used gene expression and canonical correlation analysis16 to integrate our data set with a single-cell reference atlas that included neocortical areas (NCx), transition areas, such as AI and lateral entorhinal cortex (LEC), and the hippocampal formation17 (Fig. 2a and Extended Data Fig. 4a–b). Piriform INs co-clustered with INs from all cortical areas (Fig. 2b). In contrast, piriform glutamatergic neurons co-clustered with 83% of neurons from transition and hippocampal areas, and with only 8% of neurons from neocortex (Fig. 2b and Methods). Specifically, piriform SL cells co-clustered with neocortex layer 4 neurons (Pir SL1) and with AI and LEC layer 2a neurons (Pir SL2) (Fig. 2b and Extended Data Fig. 4c). Piriform Pyr cells co-clustered with neocortex layer 2/3 intratelencephalic (IT) projection neurons (Pir deep layers, Pyr8 and Pyr9), with retrohippocampal, AI, and LEC layer 2/3 IT neurons (Pir all layers, Pyr1, Pyr2, Pyr3, Pyr4, Pyr6, and Pyr12), and with glutamatergic neurons of the dentate gyrus (DG) (Pir deep layers, Pyr7, Pyr8, and Pyr9) (Fig. 2b and Extended Data Fig. 4c, d). Finally, we identified a set of transcription factors (TFs) that were highly enriched in Pir, including Ebf1, Ebf2, Tfap2d, Dach1, Dach2, Glis3 and Rarb (Fig. 2c, e, and Extended Data Fig. 4e).
Fig. 2: Transcriptomic diversity across the mouse three- to six-layered cortical continuum.

(a) Integration of neurons (n=30,553) from this study and a mouse single-cell reference atlas17 using Canonical Correlation Analysis (CCA). From left to right, visualizations in the same UMAP of (i) neurons from piriform and neocortex; (ii) neurons from piriform and other cortical (transition) areas; (iii) neurons from piriform and hippocampal formation. (b) Quantification of co-clustering between piriform and neocortex neurons, and representative transition and hippocampal areas (extended quantification is in Extended Data Fig. 4c–d). Colour of the rectangles indicates the percentage of neurons (rows) co-clustering within new integrated clusters (columns). SL: semilunar cell; Pyr: Pyramidal cell; IN: inhibitory cell; L: layer; NCx: neocortex; IT: intratelencephalic; PT: pyramidal tract; NP: near projecting; CT: cortico-thalamic; DG: dentate gyrus; CA 1/2/3: hippocampal fields; ENT: entorhinal (medial and lateral). (c) UMAPs of gene expression for the integrated data sets shown in (a) of generic markers for glutamatergic (Slc17a7 or Vglut1) and inhibitory (INs) (Gad2) neurons for reference, and of representative transcription factors (TFs) highly enriched in piriform cortex compared to other cortical areas (Ebf1, Ebf2, Glis3, Tfap2d, Dach1, Dach2, others in Extended Data Fig. 4e). (d) Partition-based graph abstraction (PAGA) plot showing supertypes from piriform (aPir and pPir are combined), AI, and SSp. Each supertype (node) is represented as a pie chart showing the relative contribution of the cortical areas. Supertype labels are based on piriform taxonomy, accompanied by the respective SSp labels based on ref. 17 (see Extended Data Fig. 4f–g). (e) Immunohistochemistry for the TF EBF2 identifies supertype Pyr11 as a glutamatergic neuron subtype unique to piriform layer 3.
We next performed more detailed analyses on neurons of our aPir, pPir, AI, and SSp multiome data sets (Fig. 1b), identifying 27 transcriptomically-defined clusters across cortical areas, which we refer to as supertypes (Fig. 2d). Supertypes, which are based on piriform cortex cell-type taxonomy, provide a consistent relative reference for comparing gene expression (Fig. 2d) and chromatin accessibility states (Fig. 3a) across cortices. Furthermore, we inferred supertype projection neuron profiles by integrating SSp data sets from this study and the single-cell reference atlas17 (Fig. 2d and Extended Data Fig. 4f–g). Supertype SL1, characterized by high expression levels of the thalamorecipient-specific transcription factor (TF) Rorb, comprised (anterior) piriform SL cells together with 7% of SSp layer 4 IT neurons, potentially reflecting a shared molecular signature of sensory input neurons in aPir and SSp (Fig. 2d and Extended Data Fig. 4i). In contrast, supertype SL2, selectively expressing the transcriptional regulator Tcerg1l, was composed of only piriform SL cells (Figs. 1f and 2d). 90% of Pyr neurons co-clustered across cortical areas as IT projection neurons. Specifically, Pir neurons of upper layers 2a/2b co-clustered with SSp neurons of deep layers 4/5, Pir neurons of layers 2b/3 co-clustered with SSp neurons of upper layers 2/3, and Pir neurons of deep layer 3 co-clustered with SSp neurons of deep layer 6 (Fig. 2d). In contrast, supertype Pyr11, selectively expressing the TF Ebf2, was piriform-specific (Figs. 1f, g, 2c–e, and Extended Data Fig. 3b, c). SSp-specific supertypes included deep layers 4/5 IT neurons (Pyr 14–15), deep layer 5 extra-telencephalic (ET) projection neurons (Pyr16, Fezf2+), and deep layer 6 corticothalamic (CT) projection neurons (Pyr17, Foxp2+) (Fig. 2d and Extended Data Fig. 4i).
Fig. 3: Area-specific epigenetic states distinguish transcriptomically similar glutamatergic neurons across the adult mouse three- to six-layered cortical continuum.

(a) PAGA plot with RNA-defined supertypes projected onto ATAC-defined clusters (nodes). Dotted circles represent main neuron type. (b) To investigate the misalignment between transcriptome and epigenome in glutamatergic neurons, (i) only supertypes in which glutamatergic neurons co-cluster across cortical areas (matched supertypes) and (ii) only e-regulons shared across cortical areas are considered. Bottom: e-regulon is composed of a TF, enhancer regions the TF binds to (e), and the target genes the TF regulates (g). (c) Kernel-density estimate plots showing the fraction of target genes (top) and enhancers (bottom) of 35 shared e-regulons. Left, right: fractions specific to single area, shared across areas. (d) Upset plot of target genes (black bars) and target enhancers (gray bars) for the e-regulon Rorb(+) across aPir, pPir, AI and SSp. Vertical bars show number of target genes and enhancers in the corresponding intersection indicated with connected black dots. Horizontal bars show total number of target genes and enhancers for each cortical area. (e) Example of TF combinations for each cortical area. Kcnk2 is a predicted target gene of Rorb in all four cortical areas. (f) Quantification of cell type discreteness (cluster distance). First two panels show distances based on TFs only, between transcriptomically similar neurons across all layers (left), and in similar layers (right). Second two panels similarly show distances, but based on the entire transcriptome. (g) TF co-expression per cortical area. Top: Spearman correlations of the 3528 (0.1%) most positive and negative TF pairs. Bottom: relative TF co-expression, with correlations z-scored per TF pair. Box-and-whisker plots show center (median) as a white dot, 25th and 75th percentile box bounds, and whiskers extending to 1.5*inter-quartile range. (h) Approximate percentage of predicted repressive interactions based on bootstrapped ‘leave-10%-out’ ATAC data (n=100). Vertical dotted lines indicate percentage predicted repression of area-specific e-GRNs. (i) Predicted repressive interactions between TFs. Interactions are visualized as networks laid out along cortical depth, with TFs (nodes) placed in the center of their laminar expression domain. Node size indicates the number of repressor-specific target genes.
Together, we identify piriform cortex-specific glutamatergic neurons as well as piriform glutamatergic neurons with conserved molecular profiles across three- to six-layered cortical areas. These conserved profiles correspond to intratelencephalic projection neurons. In contrast, piriform neurons do not share molecular profiles with neocortex extra-telencephalic projection neurons.
Transcriptomically similar cortical neurons differ in their epigenome
The transcriptomic similarities of subsets of Pir and SSp glutamatergic neurons suggest that their chromatin accessibility states may similarly be conserved. However, unsupervised clustering of the ATAC sequencing data revealed an unexpected misalignment between transcriptome- and epigenome-based supertypes across cortical areas. The greatest divergence was observed between Pir and SSp glutamatergic neurons, with AI neurons exhibiting an intermediate epigenetic profile. In contrast, transcriptomically-defined INs aligned well with ATAC-based IN clusters (Fig. 3a and Extended Data Fig. 5a–d). To understand the misalignment between the transcriptome and chromatin accessibility data, we inferred enhancer-driven gene regulatory networks (e-GRNs) using SCENIC+, a prediction framework that combines RNA and ATAC data18. An e-GRN is composed of e-regulons, which in turn consist of TFs, their target enhancers, and their downstream target genes (Fig. 3b). We computed e-GRNs for aPir, pPir, AI, and SSp neurons, resulting in 161, 150, 135, and 124 high-quality e-regulons, respectively (Extended Data Fig. 5e). We then focused on the 35 conserved e-regulons, identified in each of the four cortical areas (Supplementary Fig. 2). These e-regulons exhibited specificity either for a particular supertype, such as Rorb (SL1), for a particular layer, such as Glis3 (SSp L2/3, Pir L2b/3) and Rfx3 (SSp L2/3, Pir L2b/3), or were active in all neurons, such as Mef2c, Tcf4, and Zfp148. Remarkably, in each of the 35 e-regulons, TFs had highly area-specific target enhancers and target genes (Fig. 3c and Supplementary Fig. 2). For example, the TF Rorb, specific to the supertype SL1, shared only 7% of its target enhancers and 9% of its target genes between aPir and SSp neurons (Fig. 3d and Extended Data Fig. 5f, g). Target genes identified for one cortical area were still expressed in the other areas, contributing to their overall transcriptomic similarity (Extended Data Fig. 5h–k). However, target genes in the other areas were regulated by overlapping yet distinct TF combinations (Supplementary Fig. 1c). Examples of area-specific TF combinations for the potassium channel Kcnk2, a Rorb-specific target gene shared across cortical areas, are shown in Fig. 3e. The differences for glutamatergic neurons are unlikely to be solely due to the sparse nature of ATAC data, as the transcriptomic and epigenomic profiles of INs aligned well across cortical areas.
Together, our analysis reveals that glutamatergic neurons, despite their transcriptomic similarities across three- to six-layered cortical areas, exhibit highly area-specific epigenetic states and the differential use of TF combinations.
Opposing TF co-expression trends in piriform and neocortex
A prominent feature observed in the clustering of Pir glutamatergic neurons was the graded change in transcriptomic profiles (Fig. 1h–i). We therefore computed cluster discreteness for each cortical area by measuring the minimum distance between transcriptomically-defined supertypes containing glutamatergic neurons from all areas (matched supertypes). Shorter distances reflect higher overlap in gene expression between neuronal clusters. This analysis was conducted using the entire transcriptome or only TFs, and applied to all cortical layers, or selectively to transcriptomically similar layers. We found that aPir and pPir clusters exhibited significantly greater transcriptomic overlap with one another than SSp clusters did, with AI showing an intermediate profile (Fig. 3f and Supplementary Fig. 3a–b). This result was unaffected by the inclusion of other neocortical areas from the single-cell reference atlas17 (Extended Data Fig. 6a). Interestingly, dentate gyrus (DG) showed a cell type discreteness profile more similar to AI than to aPir or pPir (Extended Data Fig. 6a).
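As a rough illustration of the discreteness measure described above, the sketch below computes, for each cluster, the distance to its nearest neighbouring cluster. The function names, the use of Euclidean distance between cluster centroids, and the toy expression values are assumptions made for illustration only; the metric actually used is specified in Methods.

```python
import math
from collections import defaultdict


def centroids(cells, labels):
    """Mean expression vector per cluster label."""
    groups = defaultdict(list)
    for vec, lab in zip(cells, labels):
        groups[lab].append(vec)
    return {lab: [sum(col) / len(vecs) for col in zip(*vecs)]
            for lab, vecs in groups.items()}


def min_cluster_distance(cells, labels):
    """For each cluster, Euclidean distance to the nearest other centroid.

    Shorter distances indicate more overlap (less discrete clusters).
    Requires at least two clusters.
    """
    cents = centroids(cells, labels)
    return {a: min(math.dist(ca, cb) for b, cb in cents.items() if b != a)
            for a, ca in cents.items()}


# Toy two-gene expression values (invented, not data from this study).
labels = ["A", "A", "B", "B"]
graded = [[0.0, 0.0], [0.2, 0.0], [1.0, 0.0], [1.2, 0.0]]    # clusters close together
discrete = [[0.0, 0.0], [0.2, 0.0], [5.0, 0.0], [5.2, 0.0]]  # clusters far apart

graded_d = min_cluster_distance(graded, labels)
discrete_d = min_cluster_distance(discrete, labels)
```

In this toy case the "graded" layout yields a smaller minimum distance than the "discrete" layout, which is the direction of the piriform versus SSp comparison reported in Fig. 3f.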
High overlap in gene expression may result from limited cross-inhibitory interactions between TFs, consistent with the observed co-expression of key TFs for cell type identity in piriform but not in neocortex19–21. We thus tested the premise that a higher degree of TF co-expression is a key feature of piriform glutamatergic neuron diversity. We first quantified co-expression of TFs across cortical areas using RNA-seq data and relative correlations. TF co-expression analysis showed higher TF co-expression in aPir and pPir compared to SSp, with AI displaying an intermediate profile (Fig. 3g, Extended Data Fig. 6b, and Methods). Additionally, SSp, VIS, MOp, and DG from the single-cell reference atlas17 exhibited a similar TF co-expression profile to our SSp data set (Extended Data Fig. 6b).
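The co-expression analysis can be sketched as follows: Spearman correlations are computed for every TF pair within an area, and each pair's correlation is then z-scored across areas to obtain a relative value (as in the caption of Fig. 3g). This is a minimal sketch under stated assumptions: the TF names `TfA` and `TfB`, the helper names, and the toy values are hypothetical, and the filtering of TF pairs described in Methods is omitted.

```python
import math
from itertools import combinations


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def tf_coexpression(expr):
    """expr: {tf: [expression per cell]} -> {(tf1, tf2): Spearman rho}."""
    return {(a, b): pearson(ranks(expr[a]), ranks(expr[b]))
            for a, b in combinations(sorted(expr), 2)}


def zscore_across_areas(rho_by_area):
    """Relative co-expression of one TF pair: z-score of rho across areas."""
    vals = list(rho_by_area.values())
    mean = sum(vals) / len(vals)
    sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return {area: (v - mean) / sd for area, v in rho_by_area.items()}


# Toy expression for two hypothetical TFs in two areas (invented values).
pir_rho = tf_coexpression({"TfA": [1, 2, 3, 4, 5], "TfB": [2, 3, 5, 7, 11]})
ssp_rho = tf_coexpression({"TfA": [1, 2, 3, 4, 5], "TfB": [5, 4, 3, 2, 1]})
pair = ("TfA", "TfB")
relative = zscore_across_areas({"Pir": pir_rho[pair], "SSp": ssp_rho[pair]})
```

Z-scoring per TF pair removes the pair's average correlation, so that areas are compared on whether a given pair is more or less co-expressed than it is elsewhere.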
As an independent method, we quantified activating and repressive TF interactions in area-specific e-GRNs by considering only matched supertypes from aPir, pPir, and SSp. Repression here is defined as anti-correlated expression between a TF and its target genes, while associated enhancers exhibit accessible chromatin states18. We found that the percentage of transcriptional repression was lower in glutamatergic neurons of aPir and pPir compared to those of SSp (aPir: 3.96%; pPir: 5.77%; SSp: 8.26%) (Fig. 3h–i and Extended Data Fig. 6c–d). Notably, aPir and pPir neurons exhibited differences in these interactions, despite similar expression of TF repressors (Fig. 3i and Extended Data Fig. 6e).
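To make the definition of repression above concrete, the sketch below counts the share of TF-to-target interactions whose expression correlation is negative. It assumes that each interaction has already been reduced to a single correlation value; the function name, the tuple layout, and the toy edges are hypothetical, and the bootstrapping over ATAC data shown in Fig. 3h is not reproduced here.

```python
def percent_repressive(edges):
    """Percentage of TF-target edges with anti-correlated expression.

    edges: iterable of (tf, target_gene, rho), where rho is the correlation
    between TF and target expression. A negative rho is counted as
    repression (an assumption that mirrors the definition in the text).
    """
    edges = list(edges)
    if not edges:
        return 0.0
    n_repressive = sum(1 for _, _, rho in edges if rho < 0)
    return 100.0 * n_repressive / len(edges)


# Toy network with hypothetical TF and gene names (invented values).
toy_edges = [
    ("TfA", "gene1", 0.62),
    ("TfA", "gene2", 0.15),
    ("TfB", "gene1", -0.40),  # anti-correlated: counted as repressive
    ("TfB", "gene3", 0.33),
]
toy_percent = percent_repressive(toy_edges)
```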
Together, these findings support a model in which pronounced graded molecular identities and a higher degree of TF co-expression are key features of piriform glutamatergic neuron diversity, while increased repression in the neocortex may be associated with the emergence of more discrete molecular identities.
Molecular profiles of immature neurons in posterior piriform cortex
Transcriptomic analysis identified a population of adult immature neurons, enriched in pPir (Fig. 4a). Immunohistochemistry for DCX, a widely used marker for immature neurons22, revealed an anterior-posterior gradient of immature cells, with the highest levels in pPir (Fig. 4b). These cells exhibited a glutamatergic phenotype (Tbr1+, predominantly Vglut2) and expressed canonical markers of immature neurons such as Sox11 and Sox4 (ref. 23), as well as specific markers such as St8sia2, Igfbpl1, and Zfp57 (Fig. 4c and Extended Data Fig. 3f). We also observed SSp neurons in the immature neuron supertype. However, we do not consider these neurons to be immature (Extended Data Fig. 4g–h).
Fig. 4: A potential link between immature neurons and the transcriptomic divergence of pyramidal cells between lab and wild-derived mice in posterior piriform cortex.

(a) Fraction of immature neuron supertype across aPir, pPir, AI, and SSp relative to all supertypes. (b) Immunohistochemistry in horizontal slices showing expression of DCX (magenta) along the anterior-posterior axis of piriform cortex. Inset: DCX expression without DAPI. Scale bar, 100 μm. (c) Left: expression of generic and specific markers for piriform immature neurons. Right: immunohistochemistry in pPir showing co-expression of DCX (magenta) and ST8SIA2 (cyan), an immature neuron-specific marker. Scale bar, 50 μm. (d) Percentage of immature neurons predicted as other neuron types (left) or subtypes (right) using RNA and ATAC data of pPir (100 classifiers predicted each of 151 immature neurons). Data are shown as mean values +/− bootstrapped 95% confidence intervals (CIs). (e) Mean accessibility of top 5000 peaks that distinguished Pyr7 from other Pyr subtypes, plotted for immature neurons, Pyr7, other Pyr subtypes and SL. ‘other Pyr’ are shown as mean +/− bootstrapped 95% CIs. (f) 3D diffusion map showing two trajectories between immature neurons and pyramidal cells computed by GeneTrajectory. Marker genes from (c) are highlighted. (g) Genes from the red rectangle in (f) are visualized in 2D diffusion maps. (h) Workflow: microdissection of aPir, pPir and SSp of adult wild-derived mice, single-nucleus RNA sequencing (sn-RNA seq), and unsupervised clustering of neurons into supertypes. Per cortical area, lab and wild neurons are integrated using optimal transport and their transcriptomes compared. (i) PAGA plot representing supertypes from aPir, pPir, and SSp, with pie charts showing relative contributions of these cortical areas. (j) Expression of laminar- and subtype-specific markers, that distinguish subtypes in lab mice, across wild piriform glutamatergic neurons. (k) Conservation scores per neuron type. Within violin plots, black circles mark medians and bars indicate 95% CIs. 
Significant p-values of two-sided Mann-Whitney U-Tests are shown in black and grey (see Supplementary Additional Methods and Source Data Table 1 for details and sample size). (l) tSNE of integrated lab and wild pPir data. Top: main neuron types. Bottom: misaligned neurons between lab and wild. (m) Percentage of immature neurons predicted as other neuron types using lab and wild RNA data of pPir (100 classifiers predicted each of 258 immature neurons). Data are shown as mean values +/− bootstrapped 95% CIs.
Previous work showed that Pir immature neurons can differentiate into mature glutamatergic neurons24. To identify the subtypes most closely related to immature neurons, we used linear classification on combined RNA and ATAC data from the pPir data set. We found that the molecular profile of Pyr cells, specifically of subtype Pyr7, most closely matched that of immature neurons, with 99.52% accuracy (Fig. 4d–e and Extended Data Fig. 7b–c). To validate this finding, we computed the top 5000 chromatin accessibility peaks that distinguish Pyr7 from other Pyr subtypes and compared these peaks to peaks across all immature and differentiated glutamatergic neurons. Chromatin accessibility profiles were most similar between immature neurons and the subtype Pyr7 (Fig. 4e). Immature neurons and Pyr7 exhibited enrichment in pPir when mapped to the whole-brain spatial (MERFISH) reference atlas25 (Extended Data Fig. 7a).
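The logic of asking many classifiers which mature type each immature neuron most resembles (Fig. 4d) can be sketched as below. Note that this sketch swaps in a nearest-centroid classifier trained on stratified bootstrap resamples as a simple stand-in; it is not the linear classifier used in this study, and the function names, feature values, and sample sizes are invented for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def bootstrap_predictions(X, y, queries, n_classifiers=100, seed=0):
    """Percentage of (classifier, query) predictions assigned to each label.

    Each classifier is a nearest-centroid model (a stand-in, see lead-in)
    fitted on a bootstrap resample drawn separately within each label.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for vec, lab in zip(X, y):
        by_label[lab].append(vec)

    votes = Counter()
    for _ in range(n_classifiers):
        cents = {}
        for lab, vecs in by_label.items():
            sample = [rng.choice(vecs) for _ in vecs]  # resample with replacement
            cents[lab] = [sum(col) / len(sample) for col in zip(*sample)]
        for q in queries:
            votes[min(cents, key=lambda lab: math.dist(cents[lab], q))] += 1

    total = n_classifiers * len(queries)
    return {lab: 100.0 * n / total for lab, n in votes.items()}


# Toy two-feature profiles (invented values, not data from this study).
mature_X = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],   # labelled "Pyr"
            [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]   # labelled "SL"
mature_y = ["Pyr", "Pyr", "Pyr", "SL", "SL", "SL"]
immature = [[1.1, 1.0], [0.9, 1.2]]

predicted = bootstrap_predictions(mature_X, mature_y, immature)
```

Reporting the share of predictions per label, rather than a single call, gives a direct sense of how consistently immature neurons are assigned to one mature type across resampled training sets.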
Finally, we applied GeneTrajectory26, an optimal transport-based analysis, to infer sequential gene dynamics between immature and mature neurons of pPir. We identified two gene trajectories: trajectory 1 contained immature neurons, while trajectory 2 contained Pyr neurons (Fig. 4f). We then examined trajectory-specific genes at the transition between the two trajectories. Genes in trajectory 1 were linked to an immature neuron profile and included Wnt7b, Ezh2, Neurod6, Slc1a3, Cntn2, Sema5b, and Tgfbr1 (refs. 27–30) (Fig. 4g). Interestingly, at the root of trajectory 2, we observed many genes that play key roles in activity-dependent neuronal differentiation of adult-born granule cells in the dentate gyrus27,31–33. Genes at the root of trajectory 2 included Tshr, Tet1, Bcl2, Nfix, Ndn, Rpl24, Tubb3, Ephb1, Stmn1, Cux2, and Rnd2 (Fig. 4g).
Together, we provide a transcriptomic characterization of adult immature neurons in the posterior piriform cortex and identify candidate molecular mechanisms underlying their differentiation into mature pyramidal cells.
Piriform pyramidal neuron divergence between lab and wild mice
Previous studies24,34 showed that adult piriform immature neurons are embryonically generated, and these neurons have been hypothesized to enhance neuronal plasticity in brain areas lacking adult neurogenesis35. We therefore tested a potential link between piriform immature neurons and increased variation in neuronal composition as a proxy for plasticity-driven cellular divergence. To address this, we compared Pir and SSp cell types between two mouse strains of the same species (Mus musculus) that have evolved in different environments: inbred lab mice (C57BL/6) and outbred wild-derived mice36,37. We performed single-nucleus multiome and RNA sequencing from micro-dissected aPir, pPir, and SSp of adult wild-derived mice (Fig. 4h and Extended Data Fig. 8a–b). After stringent quality control and identification of neuronal and non-neuronal cell types with good alignment with the lab data set (Extended Data Fig. 8c–g), we subclustered 24,901 neurons into 36 transcriptomically-defined supertypes (Fig. 4i). Piriform neuron subtype classification was largely consistent between lab and wild data sets (Fig. 4i–j).
We then asked to what extent cortical areas of lab and wild-derived mice differed in their cellular components. We quantified, for each cortical area, the correspondence probability for each pair of cells between lab and wild data sets using optimal transport (OT)38. These probabilities were used to co-embed the lab and wild data sets into a shared low-dimensional space (Fig. 4h, Methods and Supplementary Additional Methods). To measure the degree of neuron-neuron similarity, we defined a conservation score, which is a modification of the local inverse Simpson’s index (iLISI)39. A score of 0.5 indicates perfect overlap between lab and wild data sets (Fig. 4h, Methods and Supplementary Additional Methods). When computed across all mature neurons, pPir exhibited the lowest conservation score, while SSp the highest (aPir: 0.422; pPir: 0.408; SSp: 0.472) (Fig. 4k and Extended Data Fig. 7d–e). Within pPir, pyramidal cells were the neuron type to exhibit the lowest conservation score (pPir-SL: 0.433; pPir-Pyr: 0.394; pPir-Vglut2: 0.450; pPir-IN: 0.483). Across pyramidal cells of each area, those of pPir exhibited the lowest conservation score (aPir-Pyr: 0.422; pPir-Pyr: 0.394; SSp-Pyr: 0.472). All above scores were statistically significant for all pairwise comparisons, whereas scores of INs were not (Fig. 4k and Extended Data Fig. 7e). Interestingly, misaligned neurons were distributed across distinct Pyr subtypes rather than forming a transcriptomically-defined cluster (Fig. 4l). This was confirmed by integrating the lab and wild pPir data sets using Single-cell Variational Inference (scVI)40. In scVI latent space, Pyr cells showed the least similarity between data sets (Extended Data Fig. 7f), and OT-identified misaligned Pyr cells from wild mice were less likely to share neighbours with neurons from lab mice (Extended Data Fig. 7g). 
Together, these results suggest the possibility that, while Pyr7 may have been the actively maturing neuron subtype at the time of analysis, immature neurons may contribute to diverse, sparse pyramidal neuron populations over longer timescales across animals.
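The conservation score introduced above can be illustrated with a simplified neighbourhood-mixing measure. The sketch assumes one plausible form, 1 minus Simpson's index of data set labels among each cell's k nearest neighbours, which equals 0.5 when lab and wild cells are perfectly mixed and 0 when they are fully separated. The exact modification of iLISI is given in Methods and may differ; the function name, the unweighted k-nearest-neighbour definition, and the toy coordinates are assumptions.

```python
import math


def conservation_scores(coords, dataset, k=2):
    """Per-cell score: 1 - sum_b p_b^2 over data set labels of k neighbours.

    coords: cell positions in a shared embedding (here plain lists).
    dataset: data set label per cell (e.g. "lab" or "wild").
    """
    scores = []
    for i, ci in enumerate(coords):
        # k nearest neighbours by Euclidean distance, excluding the cell itself
        neighbours = sorted((j for j in range(len(coords)) if j != i),
                            key=lambda j: math.dist(ci, coords[j]))[:k]
        labs = [dataset[j] for j in neighbours]
        simpson = sum((labs.count(b) / len(labs)) ** 2 for b in set(labs))
        scores.append(1.0 - simpson)
    return scores


dataset = ["lab", "lab", "lab", "wild", "wild", "wild"]

# Toy embeddings (invented): lab and wild cells interleaved or far apart.
mixed_coords = [[0, 0], [10, 0], [20, 0], [0, 1], [10, 1], [20, 1]]
separated_coords = [[0, 0], [1, 0], [2, 0], [100, 0], [101, 0], [102, 0]]

mixed = conservation_scores(mixed_coords, dataset)
separated = conservation_scores(separated_coords, dataset)
```

Under this form, lower scores for a neuron type mean that its cells sit in neighbourhoods dominated by a single data set, which is how reduced lab-wild correspondence would appear.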
To test which glutamatergic neuron type was most closely related to immature neurons independent of the mouse strain, we performed a linear classification analysis using the combined lab and wild pPir data sets (Extended Data Fig. 7h). The classifiers confirmed Pyr cells as the closest cell type to immature neurons with 95.4% accuracy (Fig. 4m). Immunohistochemistry supported this relatedness by showing co-expression of DCX with CUX1, a marker for Pyr cells, in both lab and wild-derived mice, but not with RELN and GABA, markers for SL cells and INs, respectively (Extended Data Fig. 7i).
Finally, we tested for increased inter-individual variability in cellular composition in the human piriform cortex. Using a published human single-nucleus RNA-seq data set41 and conservation scores across three donors, we found that Pir glutamatergic neurons exhibited significantly lower conservation scores than SSp glutamatergic neurons (avg Pir-glut: 0.365, avg SSp-glut: 0.446). In contrast, Pir and SSp INs exhibited similar scores (avg Pir-INs: 0.461, avg SSp-INs: 0.472) (Extended Data Fig. 7j–l).
Together, these results reveal greater transcriptomic divergence in Pir compared to SSp. We speculate that plasticity-driven differentiation of adult piriform immature neurons contributes to molecularly diverse pyramidal cells.
Ancestral signatures of cortical identity in piriform cortex
Piriform cortex is central to the historical definition of paleocortex, as its three-layered cytoarchitecture resembles the pallium of amphibians and reptiles14,15,42. This raises the question of whether neurons from conserved cytoarchitectures exhibit similar molecular profiles (Fig. 5a). We used canonical correlation analysis16 to integrate our piriform data set with published single-cell RNA sequencing data sets of the medial, dorsal, lateral, and ventral pallium of representative tetrapod groups, namely mammals17, reptiles43–45, and amphibians42 (Fig. 5b and Supplementary Fig. 4). Data integration generates new clusters (integrated clusters), enabling comparisons of neuron types across areas and species by quantifying neuron co-clustering. Our analysis yielded 39 integrated clusters of mature neurons, whose composition was consistent with previously identified transcriptomic similarities and differences42. For example, INs co-clustered across cortical areas, mouse hippocampal neurons co-clustered with those of reptiles and salamander medial regions, while neocortex extra-telencephalic neurons did not co-cluster with any other cell type (Supplementary Fig. 4 and tables 1–3). Surprisingly, the inclusion of non-mammalian data sets in the integration across mouse cortical areas revealed that piriform glutamatergic neurons exhibited greater transcriptomic similarity to those of turtle, lizard, and salamander than to neocortex neurons (Fig. 5c and Extended Data Fig. 9a). The majority of piriform glutamatergic neurons were present in integrated clusters 1 (25% of all piriform neurons), 4 (14%), 5 (17%), and 14 (22%), with clusters 1 and 5 also composed of neurons from neocortex and non-mammalian species (Fig. 5d, top). In cluster 1, piriform neurons co-clustered with only 2% of neocortex neurons, in contrast to over 20% of reptile DCtx, LCtx, and aDVR neurons, and more than 50% of salamander DP and VP neurons. In cluster 5, piriform neurons co-clustered primarily with neurons from lateral olfactory areas of reptiles (10%) and salamander (49%) (Extended Data Fig. 9a).
Fig. 5: Piriform glutamatergic neurons display marked transcriptomic similarities to mature neurons of reptile and salamander.

(a) Top: cytoarchitecture of cortical areas in salamanders (amphibian), turtles and lizards (reptiles), and mice (mammals). Bottom: Integration of sc-RNA seq data sets of salamander, turtle, lizard and mouse cortices. (b) UMAP of integrated sc-RNA seq data sets of mouse (this study, Yao et al., 202117), turtle and lizard43–45, and salamander42 cortical mature neurons. Colors indicate integrated clusters. (c) UMAPs of the integration color-coded by glutamatergic neurons from the dorsolateral cortical continuum of each species. Note that neurons of turtle and lizard are combined as reptile. NCx: neocortex; DCtx: dorsal cortex; LCtx: lateral cortex; aDVR: anterior dorsal ventricular ridge; DP, LP and VP: dorsal, lateral and ventral pallium. (d) Top: proportion of neurons from each species in each integrated cluster. Integrated clusters with >10% piriform SL and Pyr cells are highlighted in orange and blue, respectively. Bottom: UMAPs of neurons from mouse and non-mammalian cortical areas that co-clustered with piriform SL (left) and Pyr (right) cells. (e) LISI scores of neurons across piriform and other mouse cortical areas. A score of 4 indicates that the cell type mixed well with neurons from non-mammals; a score of 0 indicates that the cell type mixed only with neurons from the same species. Box-and-whisker plots show center (median), 25th and 75th percentile box bounds, whiskers up to 1.5*inter-quartile range, and points for scores beyond whiskers (see Source Data Table 1 for sample size). (f) Kernel-density estimate plots of cluster distance per cortical area based on highly variable TFs. Integrated clusters composed of glutamatergic neurons from all cortical areas were used. A separate integration was performed for each comparison (aPir, pPir, SSP with LP; VP; and DP). (g) Top: main steps in gene module reconstruction through differentially expressed genes (DEGs) shared between cell types that co-clustered. Bottom: scatter plots of DEGs between piriform SL vs Pyr cells (left), salamander LP vs deep DP glutamatergic neurons (middle), and salamander LP vs deep VP glutamatergic neurons (right). Orange and blue dots depict DEGs shared between mouse and salamander glutamatergic neurons, while dark gray dots depict statistically significant DEGs that are not shared. Light gray dots are non-significant DEGs.
Piriform SL cells co-clustered with functionally and developmentally related neurons of lateral olfactory cortical areas. Specifically, integrated cluster 14 comprised only mouse glutamatergic neurons (92% of piriform SL1; 21% of SL2 cells; 96% of AI cells; 4% of LEC fan cells). In contrast, integrated cluster 5 comprised glutamatergic neurons from all species, including 7.5% of piriform SL1 and 76% of SL2 cells, 96% of LEC fan cells, 94% of lizard LCtx superficial cells (Reln+, putative bowl cells46), and 63% and 86% of two salamander LP clusters (Reln+, superficial cells) (Fig. 5d and Supplementary Fig. 6b). In contrast to SL cells, piriform Pyr cells co-clustered with a heterogeneous mixture of glutamatergic neurons from dorsal, lateral, and ventral cortical areas of different developmental origins, such as mouse neocortex and LEC L2/3 intratelencephalic (IT) projection neurons, reptile LCtx and aDVR neurons, and salamander DP and VP deep layer neurons (Fig. 5d and Supplementary Fig. 6a). To corroborate our results, we quantified the degree of species mixing in each single-cell neighbourhood by computing the local inverse Simpson’s index (LISI score)39. The score ranges from 0 (lowest mixing) to 4 (highest mixing). Piriform SL1 cells exhibited the lowest LISI score across piriform neurons, while piriform Pyr cells exhibited the highest (Fig. 5e, see also Extended Data Fig. 9b). This analysis did not depend on the use of one-to-one orthologs in the cross-species integration (Supplementary Fig. 5), and further supports pronounced transcriptomic similarity of piriform Pyr cells with non-mammalian glutamatergic neurons.
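The idea behind the species-mixing score can be illustrated with a simplified inverse Simpson's index computed over the species labels of a cell's nearest neighbours. This is a minimal sketch only: the function name and the toy neighbourhoods are hypothetical, neighbours are weighted equally (the published LISI method uses distance-based weights), and any rescaling applied in this study is not reproduced. In this unweighted form the value is 1 for a single-species neighbourhood and equals the number of species for a perfectly even mix.

```python
from collections import Counter


def inverse_simpson(labels):
    """Inverse Simpson's index of the category labels in one neighbourhood.

    Simplified, unweighted illustration (an assumption, not the LISI package).
    """
    n = len(labels)
    if n == 0:
        raise ValueError("neighbourhood must contain at least one cell")
    counts = Counter(labels)
    # Sum of squared label proportions; its inverse is the effective
    # number of categories represented in the neighbourhood.
    return 1.0 / sum((c / n) ** 2 for c in counts.values())


# Hypothetical neighbourhoods (species label of each neighbouring cell).
single_species = ["mouse"] * 8
even_mix = ["mouse", "turtle", "lizard", "salamander"] * 2

print(inverse_simpson(single_species))  # lowest mixing
print(inverse_simpson(even_mix))        # highest mixing for four species
```

Averaging such per-cell values within a cell type gives one summary of how well that type mixes with cells from other species.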
Further DE analysis between piriform Pyr and SL cells, and between salamander deep layer neurons from DP and VP versus LP, identified marker genes shared between co-clustering cell types across species. The intersection of DEGs between piriform Pyr, DP, and VP neurons, and between piriform SL and LP neurons, was used to define Pyr-like and SL-like gene modules (Fig. 5g, top). The SL-like module was enriched in neurons of lateral olfactory areas expressing Reln, namely piriform SL cells, LEC fan cells, and LP neurons, and in neurons of the retrosplenial cortex L4 (Extended Data Fig. 10), and included TFs such as Rorb, Satb1, and Tox (Fig. 5g, bottom). In contrast, the Pyr-like module showed broader enrichment in neurons of medial, dorsal, lateral, and ventral cortical areas (Extended Data Fig. 10), and included TFs and transcriptional regulators such as Nfix, Lbd2, and Zfpm2 (Fig. 5g, bottom).
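The module definition above amounts to intersecting DEG lists from independent comparisons. The sketch below illustrates that step under stated assumptions: the function name is hypothetical, gene names are assumed to have already been mapped to shared orthologous symbols, and the toy lists are placeholders rather than results from this study (only Rorb, Satb1, and Tox are taken from the text).

```python
def shared_module(*deg_sets):
    """Return the genes present in every supplied DEG set."""
    if not deg_sets:
        return set()
    return set.intersection(*(set(s) for s in deg_sets))


# Toy DEG lists (placeholders, not data from the study).
sl_vs_pyr_up_in_sl = {"Rorb", "Satb1", "Tox", "toy_gene_1"}
lp_vs_dp_up_in_lp = {"Rorb", "Satb1", "Tox", "toy_gene_2"}
lp_vs_vp_up_in_lp = {"Rorb", "Satb1", "Tox", "toy_gene_3"}

sl_like = shared_module(sl_vs_pyr_up_in_sl, lp_vs_dp_up_in_lp, lp_vs_vp_up_in_lp)
print(sorted(sl_like))
```

Requiring a gene to appear in every comparison is a conservative choice, since a gene that is significant in only one species is excluded from the module.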
Finally, to support a model that pronounced graded molecular identities and a high degree of TF co-expression are ancestral features of cortical glutamatergic neuron diversity, we quantified cluster distance as a proxy for cell type discreteness and TF-TF co-expressions across glutamatergic neurons of aPir, pPir, SSp, and salamander LP, VP, and DP (see Figure 3). Notably, piriform glutamatergic neurons were more similar to those of salamander than to neocortex neurons in both cluster distance and TF co-expression analyses (Extended Data Fig. 6b).
Together, cross-species comparisons suggest that piriform neurons retained molecular signatures of ancestral cortical glutamatergic cells.
Conserved molecular profile in piriform and salamander immature neurons
While adult immature neurons are abundant throughout the pallium of non-mammals42,47, in mammals they are mostly confined to ventral-lateral and medial cortical areas48. Our piriform data set provides an opportunity to explore conserved molecular profiles of adult cortical immature neurons across species. We integrated immature and mature neurons of our data set (aPir, pPir, AI, SSp) with all neurons from the salamander data set (Fig. 6a). Integrated cluster 7 was predominantly composed of piriform and salamander immature neurons (97% from aPir, 100% from pPir, 98% from salamander), with only 22% of AI neurons and no SSp neurons (Fig. 6b). Both piriform and salamander immature neurons expressed canonical markers of immature cells such as Dcx, Sox11, and Sox423, as well as specific markers such as Mex3a, Sema5b, and St8sia2 (Fig. 6c). The minimal co-clustering of salamander immature neurons with AI and SSp neurons, previously grouped into the Immature neuron supertype, suggests a specific, conserved molecular profile between piriform and salamander immature neurons.
Fig. 6: Salamander adult immature neurons display transcriptomic similarities to immature neurons of the piriform cortex.

(a) UMAP of integrated sc-RNA seq data sets of mouse (this study, aPir, pPir, AI, SSp) and salamander42 cortical immature and mature neurons color-coded by Seurat integrated clusters. (b) Quantification of co-clustering between mouse and salamander neurons. Supertypes SL, Pyr, and Vglut2 of this study are grouped as glutamatergic neurons for each mouse cortical area (GLUT). Colour indicates the percentage of neurons (rows) co-clustering within new integrated clusters (columns). (c) Gene expression levels of generic markers for glutamatergic neurons (Slc17a7, Slc17a6, Tbr1) and INs (Gad1, Gad2), and of canonical and specific marker genes for adult immature neurons. (d) UMAPs showing expression enrichment of gene modules computed for salamander immature (left) and mature (right) neurons, visualized on the mouse cortical data set.
Finally, we computed DEGs between salamander immature versus mature neurons to identify gene modules specific for these two classes of cells, and applied the expression enrichment of salamander gene modules to our data set. The gene module of salamander immature neurons showed specific enrichment in the immature neurons of piriform cortex, while the gene module of salamander mature neurons showed complementary enrichment in all mature neurons of this study (Fig. 6d).
Together, these results suggest the presence of ancestral molecular signatures of adult cortical immature neurons in the piriform cortex.
Discussion
Here, we provide a comprehensive molecular characterization of cell types in the adult mouse olfactory (piriform) cortex. We then compare the transcriptomes and epigenomes of glutamatergic neurons across three- to six-layered mouse cortices, as well as the transcriptomes of glutamatergic neurons across mouse, reptile and salamander pallia (cortices). Together, these comparisons reveal that neurons of the olfactory cortex, in comparison with those of the neocortex, retained distinctive, ancestral molecular features of cortical cell identity. These features include: i) pronounced graded transcriptomic profiles and greater overlap in gene expression, ii) enhanced cellular plasticity, and iii) pronounced transcriptomic similarity with neurons of non-mammals.
Comparative enhancer-driven gene regulatory network analysis across three- to six-layered mouse cortices revealed highly area-specific epigenetic states and the differential use of transcription factor (TF) combinations within transcriptomically similar glutamatergic neurons (Fig. 3c–e, and Extended Data Figs. 5–6). Differences in regulatory logic in the absence of major changes in global gene expression represent a striking example of evolvability through weak regulatory linkage, a concept proposed to play a key role in the development and evolution of morphological traits11,49,50. The stark difference in epigenetic states we observed may represent a solution to diversify glutamatergic neurons with a limited set of TFs. Furthermore, comparative regulatory network analysis provided evidence for an expansion of transcriptional repression in the neocortex (Fig. 3h–i). Repressive interactions between TFs represent a mechanism to establish boundaries between cell types and cortical layers20,21. The lower degree of transcriptional repression we found in piriform compared to somatosensory cortex may thus explain the observed overlap in gene expression between piriform neurons. High overlap in gene expression may result from the co-expression of key TFs for cell type identity previously observed in piriform and in the pallium of non-mammals, but not in neocortex19,51,52. Such overlap in gene expression may give rise to more graded changes in the functional properties of neurons, potentially representing a shared ancestral signature of cortices with fewer layers. Together, our study highlights potential evolutionary strategies employed by global gene regulatory mechanisms to shape glutamatergic neuronal identity and diversity.
A distinctive feature of the adult piriform cortex is the presence of embryonically-generated immature neurons, which have been speculated to enhance circuit plasticity and facilitate adaptive changes to novel environments24,34,35. We characterized their transcriptomic profiles and explored a potential relationship between piriform adult immature neurons and plasticity-driven cellular divergence. Upon comparing the transcriptomes of anterior and posterior piriform and somatosensory cortex neurons between adult inbred (lab) and outbred wild-derived mice, we found the greatest cellular divergence in posterior piriform cortex (Fig. 4k). These results suggest enhanced adaptation of olfactory circuits across mammalian species, potentially driven by the differentiation of adult immature neurons into pyramidal cells. One intriguing possibility is that these embryonically-generated piriform immature neurons represent remnants of active neural progenitors found in olfactory areas of non-mammals53.
Finally, within- and cross-species transcriptomic comparisons revealed the presence of molecular signatures of ancestral glutamatergic (projection) neurons in the mammalian olfactory cortex, specifically of intratelencephalic input and non-input neurons. Some of the transcriptomic similarities between intratelencephalic input neurons may reflect homologous relationships between sister cell types, as their alignment is consistent with a common developmental origin. These intratelencephalic input neurons, including piriform semilunar cells, receive direct sensory input from the olfactory bulb and are located in superficial layers of lateral olfactory cortical areas in mice, reptiles, and salamander (Fig. 5d–e and Supplementary Fig. 6b) (see also54). In contrast, cross-species transcriptomic similarities between intratelencephalic non-input neurons located in deeper layers of ventral, lateral, and dorsal cortical areas are not consistent with a common developmental origin. This result may reflect convergent evolution or duplication and divergence of homologous intratelencephalic non-input neurons in earlier vertebrates.
More generally, our cross-species transcriptomic analysis revealed unexpected similarities between piriform glutamatergic neurons and glutamatergic neurons of reptiles and salamander. For example, salamander dorsal pallium neurons exhibited greater co-clustering with piriform neurons than with neocortex neurons (Fig. 5c, e, and Extended Data Fig. 9a). The superimpositio lateralis model posits that an ancestral dorsal pallium co-opted developmental programs from dorsolateral areas, leading to cellular and molecular rearrangements that may have occurred in response to novel terrestrial pressures55. Interestingly, olfactory inputs also target dorsal and lateral regions in fish and amphibians, but became restricted to lateral areas in reptiles and mammals56. The transcriptomic similarities observed between intratelencephalic projection neurons from dorsal, lateral, and ventral pallial areas in extant tetrapods suggest the preservation of ancestral gene regulatory networks in these areas. These ancestral gene regulatory networks may have been exapted to support olfactory-driven behavioural adaptations. As the evolutionary process tinkers foremost with developmental spatiotemporal genetic control, it will be crucial to explore how variations across diverse cortical structures arise during development.
Methods
Experimental model and subject details
12 adult C57Bl/6 mice (RRID:IMSR_JAX:000664) (Mus musculus; 6 females and 6 males) and 3 adult Vglut1-Cre/INTACT-GFP transgenic mice (C57Bl/6 genetic background, RRID:IMSR_JAX:023527 crossed with RRID:IMSR_JAX:021039) (Mus musculus; 1 female and 2 males) were purchased from the Jackson Laboratory and used in this study. In addition, 10 adult wild-derived mice (Mus musculus; 5 females and 5 males), descendants of wild-caught mice trapped in the fields near livestock barns in Idaho, USA, and kept in the laboratory as an outbred stock, were used 36,37. All animals were between six and eight weeks of age and were kept in a temperature- and humidity-controlled environment with a light/dark cycle of 12 h and food and water available ad libitum. All animal protocols were approved by Brown University’s Institutional Animal Care and Use Committee (protocol number: 21–03-0004) and the Institutional Animal Care and Use Committee of the Weizmann Institute of Science.
Tissue microdissection and single-nuclei isolation
Tissue microdissection
Mice were deeply anesthetized with 2.5% of 250 mg/kg Avertin and transcardially perfused with 10 ml of ice-cold phosphate-buffered saline (PBS). The brains were dissected and immediately manually sliced into 500–700 μm coronal sections using the adult mouse brain slicer matrix (Zivic Instruments, BSMAS001–1). Anterior piriform cortex (aPir), posterior piriform cortex (pPir), agranular insular cortex (AI), and primary somatosensory cortex (SSp) were micro-dissected under a stereo microscope. With respect to the dorsolateral boundaries, the rhinal fissure was used as a visual landmark for the microdissections: about half a millimetre below the rhinal fissure for aPir and pPir, right below the fissure for AI, and about half a millimetre above for SSp. With respect to the anterior-posterior axis, based on the Paxinos and Franklin Mouse Brain Atlas, 1–3 coronal slices were cut within a 2.20 mm to 0.14 mm window from Bregma, and 1–2 coronal slices within a −1.06 mm to −2.06 mm window from Bregma. aPir and AI were micro-dissected from anterior slices, while pPir and SSp were micro-dissected from posterior slices. Both hemispheres were included for each cortical area. One slice in between aPir and pPir dissections was always removed to avoid piriform anterior-posterior border inconsistency across mice and strains. The remaining tissue was fixed overnight at 4°C in 4% paraformaldehyde (PFA) for post-hoc histological validation of the micro-dissected areas (see histology method section and Extended Data Figs 1 and 8b). Cells from neighbouring regions were identified in the transcriptomic analysis and removed from the data sets (see data pre-processing, QC, normalization and clustering method section).
28 individual biological replicates from lab mice and 12 individual biological replicates from wild-derived mice were sequenced: 9 (anterior piriform), 10 (posterior piriform), 5 (agranular insular) and 4 (primary somatosensory) replicates from lab mice, and 6 (anterior piriform), 4 (posterior piriform) and 2 (primary somatosensory) replicates from wild-derived mice.
Single-nuclei isolation
We isolated single-nuclei suspensions from fresh tissue by adapting previously described procedures in Zeppilli et al., 202157 for ATAC sequencing (seq) experiments. Each biological replicate was minced separately and placed into a tube containing cold Nuclei PURE Lysis Buffer and 10% Triton X-100 (Sigma, NUC201–1KT). The minced tissue was transferred into a 7 ml ice-cold tissue grinder (Sigma, D9063), homogenized up and down 20–25 times, and filtered on ice through cell strainers of 100 μm, 70 μm and 40 μm (Pluriselect, 43–10040). RNasin Plus diluted 1/200 (Promega, N2611) was added in all solutions.
Single-nuclei isolation for multiome sequencing experiments
After centrifuging at 500 × g for 5 min at 4°C, the supernatant was aspirated and the pellet was gently resuspended in 100 μl of a cold lysis solution containing Nuclei PURE Lysis Buffer, 10% Triton X-100, 0.01% digitonin (ThermoFisher, BN2006), and 1% nuclease-free UltraPure™ BSA (ThermoFisher, AM2616). After 1 minute of incubation, 200 μl of cold 1X Nuclei Buffer (10x Genomics, 2000153/2000207) was added, the suspension was filtered again with a 40 μm cell strainer, centrifuged at 500 × g for 5 min at 4°C, and gently resuspended into a final volume of 50 μl of cold 1X Nuclei Buffer.
Single-nuclei isolation for RNA sequencing experiments
After centrifuging at 500 × g for 5 min at 4°C, the supernatant was aspirated and the pellet was gently resuspended in 400 μl of a cold wash solution containing 1X Hanks’ Balanced Salt Solution (HBSS) and 1% nuclease-free UltraPure™ BSA. This wash step was performed twice in total, and nuclei were resuspended into a final volume of 50 μl of cold wash solution.
Library preparation and single-nucleus ATAC and RNA sequencing
Library preparation
Single-nuclei libraries were generated using the Single Cell Multiome ATAC + Gene Expression kit (10x Genomics, PN-1000283). Manufacturer’s instructions were followed for Tn5-based transposition, cell capture, barcoding, reverse transcription, cDNA amplification, and ATAC and RNA libraries construction. For wild-derived mice, 4 of 6 biological replicates for aPir, 1 of 4 for pPir, and 1 of 2 for SSp were processed using the Single Cell 3’ Reagent Kits v3.1 dual index (10x Genomics, PN-1000268). Final libraries (40 libraries from gene expression and 32 libraries from ATAC experiments) were evaluated for quality and quantified using the Qubit fluorometer. The fragment size distribution was evaluated by Agilent TapeStation 2200 (Agilent Technologies). Libraries were further evaluated for proper incorporation of the Illumina adaptors on the Roche LightCycler 480 using the Roche Kapa library quant assay according to manufacturer’s protocol.
Sequencing
Libraries were pooled and sequenced on an Illumina NovaSeq6000 instrument. ATAC libraries were sequenced using the read lengths of R1: 50 bp R2: 49 bp I1: 8 bp I2: 24 bp. RNA libraries were sequenced using the read lengths of R1: 28 bp R2: 90 bp I1: 10 bp I2: 10 bp. The mean raw read pairs per cell achieved across the 72 libraries were 295,832.59 reads/nucleus for the Multiome GEX libraries, 335,692 reads/nucleus for the Multiome ATAC libraries, and 54,838.75 reads/nucleus for the Single Cell 3’ v3.1 gene expression libraries.
Pre-processing, quality control, normalization and clustering of RNA and ATAC data
Genome alignment
The generated FASTQ files were processed with Cell Ranger ARC (v2.0.0) for Multiome ATAC + Gene expression experiments and with Cell Ranger (v6.0.0) for Single Cell 3’ v3.1 experiments (10x Genomics). Reads were aligned to the mouse (Mus musculus) pre-mRNA reference genome (cellranger-arc-mm10–2020-A-2.0.0) for both lab and wild-derived mice.
Transcriptome analysis
Individual biological replicates of all cortical areas were merged into a single data set for each mouse strain. The two main data sets were the lab data set (which includes only 10x Multiome experiments from all areas) and the wild data set (which includes 10x Multiome and Single Cell 3’ v3.1 experiments from all areas). Scanpy (v1.8.2) was used to perform most of the transcriptomic analysis 58. Only nuclei with between 2,000 and 10,000 genes per nucleus (average 4,874 and 2,900 genes/nucleus for lab and wild data sets, respectively) and a percentage of mitochondrial counts below 2.5% were retained (Extended Data Figs 2 and 8c–d). Note that more glial cells than neurons were filtered out due to their lower number of genes per nucleus. Genes expressed in fewer than 7 nuclei were also removed. Doublets were detected using Scrublet (v0.2.3) with default parameters 59. After quality filtering, lab and wild data sets comprised 7,840 and 26,975 high-quality nuclei, respectively. Processing for each data set consisted of normalization of the expression matrix using the R package SCRAN (v3.12) called from Python with default parameters 60, identification of the top 2,000 highly variable genes (HVG) amongst replicates, regression of the percentage of mitochondrial content and number of counts, and scaling of the expression values per replicate. Principal Component Analysis (PCA) linear dimensionality reduction was then performed on the scaled data, and the first 200 principal components (PCs) were selected for the generation of a k-nearest-neighbor (knn) graph. The knn graph served as input for unsupervised clustering using the graph-based Leiden algorithm 61, and for visualization in low dimensional spaces using Uniform Manifold Approximation and Projection (UMAP) or Partition-based graph abstraction (PAGA)62. To correct for batch effects, we used Harmony on the selected PCs (200)39.
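The nucleus- and gene-level quality filters described above can be written as simple threshold checks. The following is a minimal sketch in plain Python rather than the Scanpy calls used in the study; the record fields (`n_genes`, `pct_mito`), the function names, and the toy values are hypothetical, and treating the 2,000 and 10,000 gene bounds as inclusive is an assumption.

```python
def filter_nuclei(nuclei, min_genes=2000, max_genes=10000, max_pct_mito=2.5):
    """Keep nuclei within the gene-count window and below the mito threshold.

    Bounds on gene counts are treated as inclusive (an assumption).
    """
    return [
        n for n in nuclei
        if min_genes <= n["n_genes"] <= max_genes and n["pct_mito"] < max_pct_mito
    ]


def filter_genes(nuclei_per_gene, min_nuclei=7):
    """Keep genes detected in at least `min_nuclei` nuclei."""
    return {g for g, k in nuclei_per_gene.items() if k >= min_nuclei}


# Toy records (hypothetical values for illustration only).
toy_nuclei = [
    {"id": "n1", "n_genes": 1500, "pct_mito": 0.5},   # too few genes
    {"id": "n2", "n_genes": 4800, "pct_mito": 1.0},   # passes
    {"id": "n3", "n_genes": 5200, "pct_mito": 3.1},   # too much mito
    {"id": "n4", "n_genes": 12000, "pct_mito": 0.2},  # too many genes
]
kept = filter_nuclei(toy_nuclei)
print([n["id"] for n in kept])

toy_gene_detection = {"gene_a": 6, "gene_b": 7, "gene_c": 250}
print(sorted(filter_genes(toy_gene_detection)))
```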
For both lab and wild-derived data sets, we performed a first coarse clustering and identified potential cells from neighbouring areas using differential expression (DE) analysis and available RNA in situ hybridization (ISH) data from the Allen Brain Atlas. We located these neighbouring cells primarily to endopiriform nucleus/claustrum based on a combination of genes such as Npsr1, Rorb, Rspo2, Fezf2, Nr4a2, Slc26a4, Reln and Pou6f2. These genes were not expressed together in clusters clearly identified as piriform cells using other established markers. After the removal of ‘neighbouring cells’, we repeated all the computational steps above, excluding quality filtering. Leiden clustering was run using the following neighbours and resolution parameters: 50 – 2.5 (coarse, lab data set) and 50 – 1.5 (coarse, wild data set). In both data sets, we identified classes of neuronal and non-neuronal cells. These classes included mature neurons (Syt1, Syn1, Rbfox3, Slc17a7, Gad1, Gad2), immature neurons (higher levels of Dcx, Sox4, Sox11 compared to mature neurons), microglia (Tmem119, Siglech), oligodendrocytes (Mog, Mbp, Mobp), oligodendrocyte precursors (Pdgfra), astrocytes (Gfap, Sox9, Slc6a11) and vascular leptomeningeal cells (Vtn, Dcn, Egfl7). Mature and immature neurons were further sub-clustered by repeating all the computational steps above, excluding quality filtering. For sub-clustering, we used neighbours and resolution parameters: 50 – 3.7 (neurons, lab data set) and 50 – 2.9 (neurons, wild data set). This resulted in 5,553 high-quality nuclei grouped in 27 supertypes (lab data set), and 24,901 high-quality nuclei grouped in 36 supertypes (wild data set). Supertypes are defined as clusters derived from the unsupervised clustering of neurons from all cortical areas. Note that the number of nuclei was higher in the wild data set due to the additional use of the Single Cell 3’ v3.1 kit.
Epigenome analysis
The corresponding ATAC seq data of the lab data set (multiome sub-clustered data set, which includes only mature and immature neurons) were processed using pycisTopic (v1.0.2.dev8+g848f78b) 18. Note that the ATAC data of the wild data set (multiome replicates) were not processed within the context of this study, thus hereafter we refer only to the lab data set for the epigenomic analysis. Nuclei with > 3.5 log unique fragments per nucleus (average 53,470 fragments/nucleus), FRIP > 0.2, and TSS enrichment > 4.0 were retained, resulting in 5,190 high-quality nuclei that passed both transcriptome- and epigenome-specific quality control (QC) metrics (Extended Data Fig 2). For each cortical area, we downsampled cell numbers to the same number by maintaining the original supertype proportions to avoid interpretations based on asymmetric data. The downsampled data set comprised 3,430 nuclei from all areas. In all analyses, we used both the original and downsampled data set to verify robustness of the results. In-depth analyses were only performed on the downsampled data set. Next, we generated pseudo-bulk ATAC seq data sets by combining, for each cortical area, fragment reads for the transcriptome-based neuronal clusters. Peak calling was performed on these pseudo-bulk data using MACS2 (v2) 63 with default parameters: shift=73, ext_size=146, keep_dup=‘all’, q_value=0.05. To generate a list of consensus peaks, we used pycisTopic’s iterative peak calling algorithm, which resulted in 479,584 chromatin accessibility regions across the four cortical areas. We used Mallet (v2.0.8) for topic modeling through Latent Dirichlet Allocation (500 iterations). Models were selected based on the stabilization of Arun_2010, Cao_Juan_2009, Minmo_2011 and log likelihood quality metrics. 
A single model of 50 topics (without downsampling) and 75 topics (with downsampling) was selected for all cortical areas together, and models of 32, 32, 35, and 37 topics were selected for aPir, pPir, AI, and SSp, respectively (all with downsampling). Batch correction was applied on scaled topic distributions using the Python implementation of Harmony 39, and clustering was performed using the Leiden algorithm 61 with neighbors and resolution parameters 50 and 2.0, respectively. Differentially Accessible Regions (DARs) were calculated between cortical areas, between supertypes within and across cortical areas, and between cortical layers within each cortical area, using default parameters: adjpval_thr=0.05, log2fc_thr=1.0. Next, to enable transcription factor (TF) binding motif predictions on the ATAC seq data, custom motif rankings and scores databases were generated using the protocols provided on https://github.com/aertslab/create_cisTarget_databases. Using these databases, motif enrichment analysis was performed on the different sets of DARs and on binarized topics (Otsu thresholding) using the cistarget and DEM methods of pycisTarget (v1.0.2.dev8+g48af509.d20220905); with http://sep2019.archive.ensembl.org as biomart host, motif annotations v10nr_clust (public version), ctx_auc_threshold=0.005, ctx_nes_threshold=3.0, ctx_rank_threshold=0.05, dem_log2fc_thr=1.0, dem_motif_hit_thr=3.0, and dem_max_bg_regions=500.
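The proportion-preserving downsampling described in this section can be sketched as stratified random sampling within one cortical area. This is an illustration under stated assumptions, not the authors' code: the function name, seed, and toy labels are hypothetical, and because per-supertype sizes are rounded, the total may differ from the target by a few cells in general.

```python
import random
from collections import defaultdict


def downsample_area(cells, target, seed=0):
    """Downsample one area's cells to about `target`, keeping supertype proportions.

    `cells` is a list of (cell_id, supertype) tuples for a single cortical area.
    """
    if target >= len(cells):
        return list(cells)
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for cell_id, supertype in cells:
        by_type[supertype].append(cell_id)
    keep_fraction = target / len(cells)
    kept = []
    for supertype, ids in sorted(by_type.items()):
        # Each supertype contributes in proportion to its original share.
        k = round(len(ids) * keep_fraction)
        kept.extend((cell_id, supertype) for cell_id in rng.sample(ids, k))
    return kept


# Toy area with a 60/40 split between two hypothetical supertypes.
toy_cells = [(f"c{i}", "SL") for i in range(60)] + [(f"c{i}", "Pyr") for i in range(60, 100)]
subset = downsample_area(toy_cells, target=50)
n_sl = sum(1 for _, s in subset if s == "SL")
n_pyr = sum(1 for _, s in subset if s == "Pyr")
print(len(subset), n_sl, n_pyr)
```

Applying the same target to every area equalizes cell numbers while leaving each area's supertype composition approximately unchanged.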
Supertype annotation and quantification across cortical areas
Supertype annotation
We used a piriform-based cell type taxonomy across cortical areas. We assigned supertype labels based on the expression of well-established marker genes as well as based on genes identified through DE analysis. DE analysis on piriform lab data (raw data) was performed in scanpy 58 using the functions tl.rank_genes_groups with method t-test, penalty L2, and tl.filter_rank_genes_groups based on minimum log-fold-change of 3. In situ hybridization (ISH) images from the Allen Brain Institute and immunohistochemical experiments for the DE genes were then used to attribute piriform layer specificity.
For SSp, we further matched the piriform-based supertype labels with the standard SSp nomenclature based on the Allen Brain Institute as described in the method section below (Integration between SSp data sets).
Supertype quantification across cortical areas
For each cortical area, relative contributions to each supertype were quantified by normalizing the total number of nuclei of that area to 1. We defined enrichment of supertype s in area a with respect to area b as a ratio of normalized contributions s_a / s_b greater than 1.75. For visualization in PAGA plots with pie charts representing the (relative) contributions of each area to a supertype, we took the per-cortical-area normalizations of a supertype and re-normalized these to add up to 1 (a full pie chart). Within cortical areas, we quantified the contribution of supertypes with respect to cortical layer assignments using fractions.
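The quantification above can be expressed in a few lines. The sketch below is illustrative only: function names and toy counts are hypothetical, and the handling of a supertype that is absent from the comparison area (treated as enriched whenever it is present in the first area) is an assumption not stated in the text.

```python
def area_fractions(counts):
    """Normalize each area's nuclei to 1: {area: {supertype: fraction}}."""
    fractions = {}
    for area, per_type in counts.items():
        total = sum(per_type.values())
        fractions[area] = {s: n / total for s, n in per_type.items()}
    return fractions


def is_enriched(fractions, supertype, area_a, area_b, threshold=1.75):
    """True if the normalized share in area_a exceeds threshold times that in area_b."""
    fa = fractions[area_a].get(supertype, 0.0)
    fb = fractions[area_b].get(supertype, 0.0)
    if fb == 0.0:
        return fa > 0.0  # assumption: absent in b counts as enriched in a
    return fa / fb > threshold


def pie_shares(fractions, supertype):
    """Re-normalize per-area shares of one supertype so they sum to 1."""
    shares = {a: f.get(supertype, 0.0) for a, f in fractions.items()}
    total = sum(shares.values())
    return {a: v / total for a, v in shares.items()} if total else shares


# Toy nucleus counts (hypothetical, for illustration only).
toy_counts = {"aPir": {"SL": 40, "Pyr": 60}, "SSp": {"SL": 10, "Pyr": 90}}
fr = area_fractions(toy_counts)
print(is_enriched(fr, "SL", "aPir", "SSp"))
print(is_enriched(fr, "Pyr", "aPir", "SSp"))
print(pie_shares(fr, "SL"))
```

Normalizing within each area first means that areas with more sequenced nuclei do not dominate the pie charts.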
Integration of single-cell RNA sequencing data sets from this study and Yao 2021
Integration between SSp data sets
We integrated our SSp data set (from the lab data set) with the SSp data set from a reference single-cell atlas 17 to infer connectivity- and layer-specific information. We integrated the two SSp data sets using the R package Seurat (v4.3) 16. After subsetting to only SSp cells for each data set, the two subsets were normalized using Seurat’s SCTransform v2 function, which corrects for differences in sequencing depth. Each SSp subset was regressed by the number of counts. The two subsets were then merged together in a list class object, from which the integration features were calculated. We computed 2,000 HVGs. Pairs of mutual nearest neighbors (anchors) were identified using the FindIntegrationAnchors function with the arguments reduction=“CCA” (canonical correlation analysis) and normalization.method=“SCT”. Integration was carried out with the Seurat function IntegrateData, normalizing with SCT using the anchor sets and 200 PCs. Dimensionality reduction was performed by calculating PCA with 200 PCs. Unsupervised clustering was performed with the default method using the SLM algorithm and clustering parameters 25 (neighbors), 3 (resolution). The resulting integrated object was then visualized in a low dimensional space using the UMAP algorithm. New clusters that result from the unsupervised clustering of the two integrated data sets, also referred to as integrated clusters, were used to assess the co-clustering between the two data sets and to assign the neuron type identity to the SSp clusters of our data set.
Integration between all mouse cortical areas
We integrated our entire lab data set (aPir, pPir, AI, and SSp) with the mouse reference single-cell atlas 17, whose cell types also included connectivity- and layer-specific information. We used the R package Seurat (v4.3) 16, and the same computational pipeline and parameters used for the integration between the SSp data sets described above. The reference atlas was subsampled to 25,000 cells from the original data set and included glutamatergic and inhibitory neurons from the entire mouse cortex, excluding piriform and cortical amygdala that were not sampled in Yao et al., 2021. New clusters that result from the unsupervised clustering of the integrated data sets, also referred to as integrated clusters, were used to assess and quantify the co-clustering between piriform and neocortical glutamatergic neurons, and between piriform glutamatergic neurons and glutamatergic neurons from transition areas (ENT, ENTl, ENTm, TPE-ENT) and hippocampal formation areas (Sub, Sub-ProS, PPP, RHP, DG, CA1–2-3). The quantification was based on integrated clusters having at least 8 cells from both piriform and the other areas (i.e., minimum cluster size 16). The fraction of piriform glutamatergic neurons co-clustering with the other neurons is then the sum of piriform cells in the integrated clusters divided by the total number of piriform cells in the integrated object.
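The co-clustering fraction defined above can be computed directly from per-cluster cell counts. The sketch below assumes a simple mapping from integrated cluster to a pair of counts (piriform cells, cells from the comparison areas); the function name and the toy counts are hypothetical.

```python
def coclustering_fraction(cluster_counts, min_cells=8):
    """Fraction of piriform cells found in clusters shared with other areas.

    `cluster_counts` maps cluster id -> (n_piriform, n_other). A cluster counts
    as shared only if it holds at least `min_cells` cells from both groups.
    """
    total_piriform = sum(n_pir for n_pir, _ in cluster_counts.values())
    if total_piriform == 0:
        raise ValueError("no piriform cells in the integrated object")
    shared_piriform = sum(
        n_pir
        for n_pir, n_other in cluster_counts.values()
        if n_pir >= min_cells and n_other >= min_cells
    )
    return shared_piriform / total_piriform


# Toy counts per integrated cluster (hypothetical values).
toy_clusters = {
    1: (50, 20),   # shared
    2: (30, 3),    # too few cells from other areas
    3: (5, 100),   # too few piriform cells
    4: (15, 8),    # shared (exactly at the minimum)
}
print(coclustering_fraction(toy_clusters))
```

The minimum-cell requirement prevents a handful of stray cells from being counted as evidence of co-clustering.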
Inference of enhancer-driven Gene Regulatory Networks (e-GRNs)
e-GRNs computation
To infer e-GRNs across cortical areas, we applied SCENIC+ 18 on the lab data set (on the sub-clustered and downsampled data set as described in the method section epigenome analysis). We computed e-GRNs for each cortical area by including glutamatergic and inhibitory neurons (INs), namely SL, Pyr, Vglut2 and IN cells. Immature neurons were excluded. We used a genomic search region around genes of +/−500kb, http://sep2019.archive.ensembl.org as biomart host, and otherwise default parameters. Promoter regions were excluded from the analysis as they are ubiquitously open and tend to provide little discriminatory information 18. SCENIC+ identifies e-regulons, which consist of TFs, their target enhancers, and their downstream target genes. High-quality e-regulons were selected by keeping those with a correlation >0.4 between the areas under the curve (AUC) of target gene activity and target enhancer activity, and with ≥10 target genes. Given that inhibitory neurons were transcriptomically and epigenetically well-conserved across cortical areas, they were used as an internal control. e-Regulons specific for inhibitory neurons were determined using the Regulon Specificity Score (RSS) and removed for analyses specific to glutamatergic neurons.
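The two quality thresholds above (AUC correlation >0.4 and ≥10 target genes) amount to a simple filter, sketched below in Python. This is not the SCENIC+ API: the dictionary keys `auc_correlation` and `target_genes`, the function name, and the toy e-regulons `TfA(+)` and `TfB(+)` are hypothetical.

```python
def select_high_quality_eregulons(eregulons, min_corr=0.4, min_targets=10):
    """Keep e-regulons passing both thresholds described in the text.

    eregulons : list of dicts with (hypothetical) keys
                'auc_correlation' - correlation between gene-based and
                                    region-based AUC scores
                'target_genes'    - list of target gene names
    """
    return [
        r for r in eregulons
        if r["auc_correlation"] > min_corr and len(r["target_genes"]) >= min_targets
    ]

# Toy input (hypothetical values)
eregulons = [
    {"name": "Rorb(+)", "auc_correlation": 0.62,
     "target_genes": [f"gene{i}" for i in range(25)]},
    {"name": "TfA(+)", "auc_correlation": 0.35,   # fails the correlation cut-off
     "target_genes": [f"gene{i}" for i in range(40)]},
    {"name": "TfB(+)", "auc_correlation": 0.71,   # fails the target-gene cut-off
     "target_genes": [f"gene{i}" for i in range(6)]},
]
kept = [r["name"] for r in select_high_quality_eregulons(eregulons)]
```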
Transcription factor binding site motifs
To assess which binding motifs were identified for each TF, we ran pycisTarget on cistromes of each e-regulon 18. We then ordered the resulting binding motifs by their Normalized Enrichment Score (NES), a metric that captures how enriched a motif is in comparison to the average presence of all motifs. Only TFs (both e-regulon and non-e-regulon TFs) expressed in at least 20% of neurons belonging to a particular cortical layer were considered. We then examined the number of CREs in which the binding motif was found in relation to the total number of CREs for the TF. For the shared e-regulons, we compared across area-specific e-GRNs which binding motif was the top motif. We visualized these motifs for selected e-regulons.
Transcription factor combinations
For each area-specific e-GRN, we quantified TF combinations for each target gene. We grouped all CREs of a gene and interrogated them for the TFs that are capable of binding at these genomic sites. Next, in a pairwise fashion, TF-TF combinations were counted and visualized using clustered heatmaps. We performed this analysis both for shared e-regulons and for all e-regulons of an e-GRN. Moreover, single genes were inspected for differential accessibility of their coding sequence and their surrounding chromatin, including for differential importance of their CREs.
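The pairwise counting step can be sketched as follows. This is an illustrative sketch under stated assumptions, not the analysis code: the nested-dictionary input (gene, then CRE, then list of TFs), the function name, and the gene and CRE identifiers are hypothetical. Each unordered TF pair is counted once per target gene.

```python
from collections import Counter
from itertools import combinations

def count_tf_pairs(gene_to_cre_tfs):
    """Count how often two TFs co-occur on the CREs of the same target gene.

    gene_to_cre_tfs : {gene: {cre_id: [TF names predicted to bind that CRE]}}
    Returns a Counter keyed by alphabetically sorted (TF, TF) tuples.
    """
    pair_counts = Counter()
    for cre_tfs in gene_to_cre_tfs.values():
        # Pool the TFs across all CREs of this gene
        tfs = set()
        for tf_list in cre_tfs.values():
            tfs.update(tf_list)
        # Count every unordered TF pair once per gene
        for pair in combinations(sorted(tfs), 2):
            pair_counts[pair] += 1
    return pair_counts

# Toy e-GRN (hypothetical genes and CREs)
toy_grn = {
    "GeneA": {"cre1": ["Rorb", "Cux1"], "cre2": ["Fezf2"]},
    "GeneB": {"cre3": ["Rorb", "Cux1"]},
}
pair_counts = count_tf_pairs(toy_grn)
```

The resulting counts can be arranged into a symmetric TF-by-TF matrix for a clustered heatmap.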
Quantification of supertype (cluster) distance
To determine if cell types formed graded or discrete molecular identities, we quantified and compared cluster discreteness between cortical areas by including only supertypes composed of glutamatergic neurons from all areas. We analysed our multiome data set in four ways: on transcriptomically similar neurons from all cortical layers or selectively on transcriptomically similar neurons within a particular layer, and by using all 2000 highly variable genes (transcriptome-wide) or only highly variable transcription factors. The within-layer analysis ensures that results are not based solely on differences between layers. We also used the downsampled data sets as done in the ATAC analysis to avoid interpretations based on asymmetric data sets. We first re-processed the transcriptome of the downsampled data sets. Next, we defined the distance between clusters as the minimal pairwise Euclidean distance between supertypes for each area, using their PCA representation (number of PCs = 200). As this measure may be sensitive to outliers, we sampled 90% of the cells in each cluster 100 times and computed the distribution of cluster distances. We visualized results with kernel-density-estimate plots using kdeplot from Seaborn (v0.11.2) and we confirmed observations of one distribution being less than another using Mann-Whitney rank tests. Across all cortical layers: aPir < pPir: p = 0.999; aPir < SSp: p = 3.571e-109; pPir < SSp: p = 5.785e-206; AI < SSp: p = 1.0. Within the single layer Pir 2b/3, SSp L2/3: aPir < pPir: p = 1.0; aPir < SSp: p = 3.501e-5; pPir < SSp: p = 5.233e-124; AI < SSp: p = 0.459. Robustness of cluster distance results for all cortical layers was verified by re-clustering the downsampled data for different neighbourhood graphs (n=30, 50, 70), different Leiden resolutions (res=2.7, 3.2, 3.7, 4.2, 4.7), and re-computing minimal cluster distances as described above.
We confirmed qualitatively similar results for clusters defined with the transcriptome-wide and TF-only approach.
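The subsampled cluster-distance measure can be sketched in plain Python as below. The sketch makes an assumption that the text does not spell out: the distance between two clusters is taken as the smallest Euclidean distance between any cell of one cluster and any cell of the other. Function names and toy coordinates are hypothetical, and a real analysis would operate on the PCA embedding rather than on 2D points.

```python
import math
import random

def min_cluster_distance(cluster_a, cluster_b):
    """Smallest Euclidean distance between any cell of A and any cell of B."""
    return min(math.dist(a, b) for a in cluster_a for b in cluster_b)

def subsampled_cluster_distances(clusters, n_rounds=100, frac=0.9, seed=0):
    """Distribution of minimal inter-cluster distances under subsampling.

    clusters : {cluster name: list of coordinate tuples (e.g. PCA scores)}
    Each round draws `frac` of the cells of every cluster without replacement
    and records the minimal distance for every pair of clusters.
    """
    rng = random.Random(seed)
    names = sorted(clusters)
    distances = []
    for _ in range(n_rounds):
        sub = {
            name: rng.sample(clusters[name],
                             max(1, int(frac * len(clusters[name]))))
            for name in names
        }
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                distances.append(min_cluster_distance(sub[a], sub[b]))
    return distances

# Toy clusters (hypothetical): two well-separated groups along one axis
toy = {
    "A": [(i / 10, 0.0) for i in range(11)],       # x from 0.0 to 1.0
    "B": [(5 + i / 10, 0.0) for i in range(11)],   # x from 5.0 to 6.0
}
distances = subsampled_cluster_distances(toy)
```

The resulting list of distances per area is what would then be passed to a kernel-density plot and compared between areas with a rank test.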
We extended cluster distance analysis to selected mouse cortical areas from the reference single-cell atlas Yao et al., 2021. As described above, we quantified cluster distances by including only clusters composed of glutamatergic neurons from all areas and by using highly variable transcription factors. We integrated cortical areas SSp (as internal control to compare with our SSp data set), primary motor cortex (MOp), visual cortex (VIS), and dentate gyrus (DG) with our data using Harmony with 50 PCs. After Leiden clustering, we retained clusters with at least 10 cells from each area. We downsampled each cluster such that it contained an equal number of cells per area. Given the downsampled data, we re-integrated and re-clustered. Finally, we computed cluster distances on the PCA representation (number of PCs = 50) as described above.
We also applied cluster distance analysis to the salamander data set (Woych et al., 2022). We made two adaptations to the workflow described above. First, we integrated cortical areas aPir, pPir, SSp from our multiome data set with each salamander area separately (three integrations), namely with i) dorsal pallium (DP), ii) lateral pallium (LP), and iii) ventral pallium (VP) using highly variable transcription factors. Second, after Leiden clustering, we retained clusters with at least 20 cells from each area. After downsampling, re-integrating and re-clustering, cluster distances were computed as described above.
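The cluster retention and balancing steps used for the atlas and salamander comparisons can be sketched as follows. This is a hedged illustration only: the tuple-based input format, the function name, and the toy cell identifiers are assumptions, and "equal number of cells per area" is interpreted here as downsampling every area to the size of the smallest area within each cluster.

```python
import random
from collections import defaultdict

def balance_clusters(cells, areas, min_cells=10, seed=0):
    """Keep clusters represented in every area and equalize cells per area.

    cells     : list of (cell_id, cluster, area) tuples
    areas     : areas that must all be represented in a retained cluster
    min_cells : minimum number of cells required from each area
    Returns the list of retained cell ids.
    """
    rng = random.Random(seed)
    grouped = defaultdict(lambda: defaultdict(list))
    for cell_id, cluster, area in cells:
        grouped[cluster][area].append(cell_id)

    kept = []
    for by_area in grouped.values():
        sizes = [len(by_area.get(a, [])) for a in areas]
        if min(sizes) < min_cells:
            continue  # cluster is under-represented in at least one area
        n = min(sizes)  # assumed target: size of the smallest area
        for a in areas:
            kept.extend(rng.sample(by_area[a], n))
    return kept

# Toy data (hypothetical): cluster 0 passes, cluster 1 has too few SSp cells
cells = (
    [(f"a{i}", 0, "aPir") for i in range(12)]
    + [(f"s{i}", 0, "SSp") for i in range(15)]
    + [(f"b{i}", 1, "aPir") for i in range(20)]
    + [(f"t{i}", 1, "SSp") for i in range(3)]
)
kept = balance_clusters(cells, areas=["aPir", "SSp"])
```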
Transcription factor co-expression quantification
Transcription factor (TF) co-expression was measured by pairwise correlations. For details see Supplementary Additional Methods.
Quantification and visualization of transcriptional repression in e-GRNs
We quantified and compared transcriptional repression across cortical areas by quantifying activating and repressive interactions in each area-specific e-GRN. Area-specific supertypes were excluded to better compare predicted regulatory interactions within transcriptomically similar neurons. Using for each network its high-quality e-regulons, a percentage of repressive interactions was calculated using the formula 100 * repr / (act + repr), where act indicated activating TF-TF interactions and repr repressive ones. Only TFs (both e-regulon and non-e-regulon TFs) expressed in at least 20% of neurons belonging to a particular cortical layer were considered. For details see Supplementary Additional Methods.
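As a worked example of the formula above, the following short Python sketch computes the percentage of repressive interactions from interaction counts. The function name and the example counts are hypothetical and do not correspond to results from this study.

```python
def percent_repressive(n_activating, n_repressive):
    """Percentage of repressive interactions: 100 * repr / (act + repr)."""
    total = n_activating + n_repressive
    if total == 0:
        raise ValueError("no TF-TF interactions to quantify")
    return 100 * n_repressive / total

# Hypothetical counts: 30 activating and 10 repressive TF-TF interactions
example = percent_repressive(30, 10)
```

With these counts, 10 of 40 interactions are repressive, giving 25%.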
Molecular characterization of adult posterior piriform immature neurons
Integration of the posterior piriform data set with whole-brain single-cell and spatial (MERFISH) reference atlases
The pPir lab data set was mapped to the whole-brain single-cell64 and spatial (MERFISH)25 reference atlases from the BRAIN Initiative Cell Census Network (BICCN) using the hierarchical correlation mapping algorithm in the Allen Institute’s MapMyCells tool and the Allen Mouse Brain Common Coordinate Framework (CCFv3)65. Immature neurons aligned to the atlas cluster 0142 DG-PIR Ex IMN_3 and the Pyr7 subtype aligned to the atlas cluster 0039 L2/3 IT PIR-ENTl Glut_1 with high confidence (bootstrap probability > 70%, with a mean of 98 and 99% respectively).
Linear Support Vector Classifier
A classification pipeline was constructed using scikit-learn (v1.3.0) 66. First, to distinguish between neuron types Pyr, SL, Vglut2, IN_CGE, and IN_MGE, classification was performed on log-normalized gene expression and pycistopic-normalized chromatin accessibility data from the multiome pPir lab data set. The classification pipeline used the z-scored accessibility of the top 1000 features, 30 PCs, and a linear Support Vector Classifier (SVC) with regularization parameter C=1. Immature neurons were excluded from the model training. These models were fit with stratified k-fold (k=5) cross-validation, such that the entire pipeline was only fit using the training data. Classification performance was evaluated on held-out cells (immature neurons). The models were fit by subsampling equal numbers of cells (50) for each neuron type, and classification accuracy and generalization performance were assessed across 100 restarts. To evaluate which neuron type was most similar to immature neurons, the fitted models on each restart were then applied to the immature cells, and the percentage of immature cells predicted to be each neuron type was recorded. Note that the majority of the top features in these classification pipelines were chromatin accessibility peaks.
To distinguish between neuron subtypes, a similar pipeline was applied to the multiome pPir lab data set, but using the top 5000 features, 50 PCs, and SVC with regularization parameter C=10. These hyperparameters were found via a grid search; however, results were largely insensitive to the exact model used. Finally, to distinguish between neuron types independently of the mouse strain of origin, a similar pipeline was applied to the log-normalized gene expression data from combined pPir lab and wild data sets using the top 100 genes (SelectKBest with k=100), 25 PCs, SVC with regularization parameter C=0.1, and subsampling equal numbers of cells (250) for each neuron type.
In all conditions, classification and generalization performance were at chance levels when classifiers were trained on data with permuted neuron type or subtype labels.
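A pipeline of the kind described above could be assembled in scikit-learn roughly as follows. This is a minimal sketch on synthetic data, not the study's code. It assumes that feature selection uses `SelectKBest` with the ANOVA F-statistic (`f_classif`) in every variant, that the steps run in the order selection, z-scoring, PCA, classifier, and that `LinearSVC` is the classifier; the feature counts are scaled down so the example runs quickly.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def build_classifier(n_features=1000, n_pcs=30, C=1.0):
    """Feature selection -> z-scoring -> PCA -> linear SVC (assumed order)."""
    return Pipeline([
        ("select", SelectKBest(f_classif, k=n_features)),
        ("zscore", StandardScaler()),
        ("pca", PCA(n_components=n_pcs)),
        ("svc", LinearSVC(C=C)),
    ])

# Synthetic stand-in data: 50 cells per type, 200 features (hypothetical)
rng = np.random.default_rng(0)
cell_types = ["Pyr", "SL", "IN"]
labels = np.repeat(cell_types, 50)
X = rng.normal(size=(labels.size, 200))
for i, cell_type in enumerate(cell_types):
    # Shift a type-specific block of 10 features so the types are separable
    X[labels == cell_type, i * 10:(i + 1) * 10] += 3.0

# Wrapping all steps in one Pipeline means selection, scaling and PCA are
# re-fit on the training folds only, as required in the text
model = build_classifier(n_features=50, n_pcs=10, C=1.0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, labels, cv=cv)
mean_accuracy = scores.mean()
```

A model fitted on the mature types could then be applied to held-out immature cells with `model.fit(X, labels).predict(X_immature)`, where `X_immature` is a hypothetical matrix with the same feature columns.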
Chromatin accessibility distance
We used the F-statistic from sklearn’s f_classif function to compute the top 5000 chromatin accessibility peaks that distinguish the Pyr7 subtype (the molecularly closest subtype to immature neurons) from other Pyr subtypes. The accessibility of these peaks was then evaluated across all glutamatergic pPir subtypes using the pycistopic-generated pseudobulk bigwig files. Accessibility was normalized for each subtype to fragments per million.
Gene trajectory inference using optimal transport
To infer gene dynamics between immature and mature glutamatergic neurons, we subsetted the pPir lab data set to include only immature and pyramidal subtypes and applied GeneTrajectory26 with default parameters. GeneTrajectory uses optimal transport to infer temporal trajectories of genes. First, we selected genes that are expressed in at least 1% of the entire cell population (7,773 genes). Next, we constructed a cell-cell kNN graph with k=5 and computed cell-cell graph-based Wasserstein distances. We then used Diffusion Map to generate a low-dimensional representation of genes, which were subsequently visualized in 3D and 2D low-dimensional spaces.
Integration of single-cell RNA sequencing data using optimal transport
To compare the degree of similarity between lab and wild-derived mice across cortical areas (aPir, pPir, and SSp), we aligned gene expression data in a shared computational space and quantified the degree of overlap between lab and wild data sets. All computational steps were carried out separately for each cortical area. To perform the alignment, we used an optimal transport (OT) framework treating single-cell measurements as probability distributions 38. The method first finds cell-to-cell correspondence probabilities between lab and wild data sets, and then co-embeds the two data sets in a shared space based on these probabilities. Finally, we calculated a ‘conservation score’ between the co-embedded lab and wild data sets (see method section below). We used inhibitory neurons (INs) as an internal control because they are transcriptomically and epigenetically well-conserved across cortical areas compared to glutamatergic neurons. We also excluded immature neurons from this analysis as they are hypothesized to be a potential source of variation. For details on OT integration and definition of conservation score, see Supplementary Additional Methods38,67,68.
We applied the same computational pipeline to compare the degree of neuronal similarity between individual human donors across piriform and primary somatosensory cortex. We aligned gene expression data using the published sn-RNA seq data set of the adult human whole brain41. From the data set, only glutamatergic neurons and INs from the cortical areas piriform and primary somatosensory cortex (labelled as PIR and S1C respectively in Siletti et al., 2023) were retained and used for the integration.
ScVI integration between lab and wild pPir data sets
Gene expression data from lab and wild pPir data sets were integrated using scvi-tools (v0.17.3) and the Single-cell Variational Inference (scVI) model 40. In brief, the top 3000 highly variable genes (identified via the scvi poisson_gene_selection function) were used to train an scVI model with default parameters (n_hidden=128, n_latent=10, n_layers=1, dropout_rate=0.1) and a negative-binomial gene likelihood. Each biological replicate and mouse strain were encoded as categorical covariates and the log-total counts were used as a continuous covariate. The model was trained for 200 epochs until the ELBO converged. To evaluate whether neuronal types (SL, Pyr, INs, Vglut2) were aligned in this shared latent space, the pairwise cosine distances were calculated for each cell and then averaged depending on the neuron type and mouse strain of the cells in each pair. To evaluate whether OT-identified misaligned neurons were also less aligned in the scVI latent space, a nearest neighbour graph was constructed for n=5–100 neighbours and, at each value of n, the average percentage of neighbours that were from lab mice was calculated for wild-specific neurons and compared to that of other aligned Pyr neurons.
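The neighbourhood-composition check described above can be sketched without scvi-tools as follows. The sketch assumes Euclidean distance in the latent space and a brute-force neighbour search; the function name, the one-dimensional toy latent space, and the cell indices are hypothetical.

```python
import math

def lab_neighbour_fraction(query_idx, latent, is_lab, k):
    """Mean fraction of lab cells among the k nearest neighbours.

    query_idx : indices of the cells to evaluate (e.g. wild-specific neurons)
    latent    : list of latent coordinates (tuples), one per cell
    is_lab    : True if the cell comes from a lab mouse, else False
    k         : number of nearest neighbours (the cell itself is excluded)
    """
    fractions = []
    for q in query_idx:
        others = [j for j in range(len(latent)) if j != q]
        others.sort(key=lambda j: math.dist(latent[q], latent[j]))
        nearest = others[:k]
        fractions.append(sum(is_lab[j] for j in nearest) / len(nearest))
    return sum(fractions) / len(fractions)

# Toy latent space (hypothetical): three lab cells near 0, one wild cell
# embedded among them, and three wild cells far away
latent = [(0.0,), (0.1,), (0.2,), (0.15,), (5.0,), (5.1,), (5.2,)]
is_lab = [True, True, True, False, False, False, False]

aligned = lab_neighbour_fraction([3], latent, is_lab, k=2)        # wild cell among lab cells
wild_specific = lab_neighbour_fraction([4], latent, is_lab, k=2)  # isolated wild cell
```

A low lab fraction for a group of wild cells, relative to aligned cells, indicates that the group is poorly mixed with lab cells in the latent space.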
Integration of single-cell RNA sequencing data sets across species
Orthologous gene alignment
One-to-one orthologues were used to determine which genes were to be included for the cross-species comparison. EggNOG orthology assignments for lizard (Pogona vitticeps) and turtle (Trachemys scripta) were taken from 45, whereas salamander (Pleurodeles waltl) and mouse (Mus musculus) were taken from 42.
Cortical areas and species included in the cross-species analyses
For the analysis on mature-only neurons, we integrated single cell RNA sequencing (sc-RNA seq) data from lizard Pogona vitticeps 43,44, turtle Trachemys scripta 45, salamander Pleurodeles waltl 42 and mouse Mus musculus (this study - lab data set, and Yao et al., 202117). We subsetted the original data sets to include only cells from ontogenetically equivalent brain regions. The lizard data set included glutamatergic neurons from medial cortex (MCtx), dorsal cortex (DCtx), lateral cortex (LCtx), anterior dorsal ventricular ridge (aDVR), and inhibitory neurons (INs). For the turtle data set, we used the same approach described in 42, which, in short, keeps glutamatergic neurons and INs and excludes unidentified clusters. The salamander data set included glutamatergic neurons from medial (MP), dorsal (DP), lateral (LP) and ventral (VP) pallia, and INs. The mouse data set included glutamatergic and inhibitory neurons from the lab data set of this study (including aPir, pPir, AI and SSp), and glutamatergic and inhibitory neurons from Yao et al., 2021 17. The latter was subsampled to 25,000 cells from the original data set and included glutamatergic and inhibitory neurons from the entire mouse cortex, excluding piriform and cortical amygdala, which were not sampled in Yao et al., 2021.
For the analysis on immature neurons, we integrated sc-RNA seq data from salamander Pleurodeles waltl 42 and mouse Mus musculus (this study - lab data set). The salamander data set included immature and mature neurons from medial (MP), dorsal (DP), lateral (LP) and ventral (VP) pallia, and INs. The mouse data set included immature and mature neurons from cortices aPir, pPir, AI, and SSp, and INs.
Cross-species integration analysis
Only one-to-one orthologs in all species analysed were used for cross-species comparisons. For the analysis on mature-only neurons, we used the R package Seurat (v4.3) 16 to integrate gene expression data across species. Each data set was normalized independently using Seurat’s SCTransform v2 function, which corrects for differences in sequencing depth. Each data set was regressed by percent of mitochondrial genes and animal of origin. The data sets were then merged together in a list class object. Highly variable genes (HVGs) were computed by performing differential expression (DE) analysis on individual objects using the FindAllMarkers function in Seurat with default parameters. From the DE analysis, we kept all genes of each cluster that had at least a positive 0.2 difference in the proportion of expressing cells. Data set-specific gene lists were then combined into a single list containing 6,193 genes, which was then filtered to keep only genes present in all data sets, resulting in a final list of 3,548 genes used for integration. Pairs of mutual nearest neighbours (anchors) were identified using the FindIntegrationAnchors function with the arguments reduction=“CCA” (Canonical Correlation Analysis) and normalization method= SCT. Integration was carried out with the Seurat function IntegrateData, normalizing with SCT, using the anchor sets obtained from FindIntegrationAnchors and 120 PCs. Dimensionality reduction was performed by calculating PCA with 200 PCs. Unsupervised clustering was performed with the default method using the SLM algorithm and clustering res= 0.5. The resulting integrated object consisted of 53,823 cells from the four species, visualized in a low dimensional space using the UMAP algorithm. New clusters that result from the unsupervised clustering of the integrated data sets, also referred to as integrated clusters, were used to assess and quantify the co-clustering between the data sets.
For the analysis on immature neurons, a similar computational pipeline was used with the following modifications. We used the R package Seurat (v5.0.3). Data sets were normalized prior to the integration step with the SCTransform v2 function using 9,000 highly variable genes. The SelectIntegrationFeatures5 function was used to select genes for integration, returning 6,000 genes. Integration was carried out with the updated function IntegrateLayers with the arguments method=CCAIntegration, normalizing with SCT, and 100 PCs. The resulting integrated object consisted of 16,688 cells from the two species, visualized in a low dimensional space using the UMAP algorithm.
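For the mature-only analysis, the gene selection logic (marker filtering per data set, union across data sets, then restriction to genes present in all data sets) can be sketched in Python as below. The analysis itself was run in R with Seurat; this sketch only mirrors the set operations. The keys `pct_in` and `pct_out` stand in for the `pct.1` and `pct.2` columns returned by FindAllMarkers, and the function name and toy values are hypothetical.

```python
def integration_genes(markers_by_dataset, genes_by_dataset, min_pct_diff=0.2):
    """Select integration genes from per-data set marker tables.

    markers_by_dataset : {data set: list of dicts with keys 'gene',
                          'pct_in' (fraction of expressing cells in the
                          cluster) and 'pct_out' (fraction elsewhere)}
    genes_by_dataset   : {data set: iterable of all genes measured}
    """
    combined = set()
    for markers in markers_by_dataset.values():
        for m in markers:
            # Keep only positive differences of at least min_pct_diff
            if m["pct_in"] - m["pct_out"] >= min_pct_diff:
                combined.add(m["gene"])
    # Restrict to genes present in every data set
    shared = set.intersection(*(set(g) for g in genes_by_dataset.values()))
    return sorted(combined & shared)

# Toy marker tables (hypothetical values)
markers_by_dataset = {
    "mouse": [
        {"gene": "Rorb", "pct_in": 0.8, "pct_out": 0.2},
        {"gene": "Gad2", "pct_in": 0.3, "pct_out": 0.25},   # difference too small
        {"gene": "Satb1", "pct_in": 0.7, "pct_out": 0.1},   # absent in lizard
    ],
    "lizard": [
        {"gene": "Cux1", "pct_in": 0.6, "pct_out": 0.1},
    ],
}
genes_by_dataset = {
    "mouse": ["Rorb", "Gad2", "Satb1", "Cux1"],
    "lizard": ["Rorb", "Cux1", "Gad2"],
}
selected = integration_genes(markers_by_dataset, genes_by_dataset)
```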
Species mixing score (LISI score)
To assess molecular similarity across cell types and species independently of the resolution of the integrated clusters, we used the R package LISI with 200 PCs, which computes the local inverse Simpson’s index (LISI score)39 for individual cells. The LISI score was used as a metric to evaluate the degree of species mixing in each single-cell neighbourhood in the embedding. The score ranges from 0 to 4 (for four species), with 0 indicating the worst mixing with other species and 4 indicating the greatest mixing with other species. Individual cells, each assigned with a LISI score, can then be grouped into different categories to analyse the degree of mixing at different resolutions (cells by cortical areas, cells by layers, etc).
Influence of one-to-one ortholog on cross-species integration
To assess whether the use of one-to-one orthologs in the cross-species integration influences co-clustering results, we calculated the proportion of one-to-one orthologs and non-one-to-one orthologs for each cell type in the salamander and our multiome data set. For details see Supplementary Additional Methods.
Cross-species gene module score analysis
We analysed and compared the expression of cell-type specific gene modules to assess molecular similarity across cell types and species independently of data integration. For details see Supplementary Additional Methods.
Histology
Histological validation of the microdissected cortical areas
The remaining tissue from micro-dissections for sequencing experiments was fixed in 4% PFA at 4°C overnight. Coronal sections (200 μm thick) were prepared using a vibrating-blade Leica VT100S Vibratome and incubated in PBS, 0.1% Triton X-100 and Neurotrace counterstain diluted 1:1000 (ThermoFisher, N21483) at 4°C overnight.
Immunohistochemistry
Three adult Vglut1-Cre/INTACT-GFP transgenic mice, two adult wild-type mice, and four adult wild-derived mice were deeply anesthetized with 2.5% of 250 mg/kg Avertin and transcardially perfused with 10 ml of ice-cold PBS followed by 10 ml of 4% PFA. Brains were dissected and post-fixed in 4% PFA at 4°C for 5 h. Coronal and horizontal sections (100 μm thick) were prepared using a Vibratome, permeabilized in PBS and 0.1% Triton X-100 for 1 h, blocked in PBS, 0.1% Triton X-100 and 2% heat-inactivated horse serum at 4°C for 4 h, and incubated with primary antibodies at 4°C overnight (see the list of primary antibodies used below). Note that for horizontal sections we specifically combined, in the same section, two piriform-specific markers with opposite gradients (DCX/posterior with RORB/anterior, and RELN/posterior with CARTPT/anterior) to ensure that the tissue within the section was piriform cortex throughout the anterior-posterior axis. Sections were rinsed in PBS and 0.1% Triton X-100 three times for 15 min, blocked in PBS, 0.1% Triton X-100 and 2% heat-inactivated horse serum at 4°C for 4 h, and incubated with appropriate secondary antibodies (1/1000, Donkey IgG (H+L)) conjugated to 405, 488, Cy3 and Cy5 (Jackson Labs) together with DAPI (note that DAPI was not used in all experiments) at 4°C overnight. All histology samples were rinsed with PBS and mounted on SuperFrost Premium microscope slides (Fisher, 12–544-7) in Fluorescent Vectashield Mounting Medium (Vector, H-1900), and imaged at 10X and 20X using a Nikon A1R-HD confocal microscope.
Primary antibodies: RELN (mouse, 1/500, MAB5364 Millipore Sigma), RORB (mouse, 1/200, PP-N7–927-00 Perseus), CARTPT (rabbit, 1/2000, H-0003–62 Phoenix pharmaceutical), KCNG3 (rabbit, 1/1000, TA351316 Origene), EBF1 (rabbit, 1/800, AB10523 Millipore Sigma), EBF2 (sheep, 1/100, AF7006-SP R&D systems), PAPPA2 (goat, 1/500, AF1668-SP R&D systems), MEIS1 (rabbit, 1/1000, ab19867 Abcam), ST8SIA2 (rabbit, 1/200, 19736–1-AP Proteintech), DCX (guinea pig, 1/500, AB2253 Millipore Sigma), SATB1 (rabbit, 1/1500, ab109122 Abcam), CUX1 (rabbit, 1/500, discontinuous-Santacruz), GABA (rabbit, 1/500, A2052 Sigma), TBR1 (rabbit, 1/500, ab31940 Abcam), SOX11 (rabbit, 1/1000, ab134107 Abcam), BARHL1 (rabbit, 1/500, HPA 004809 Sigma).
Statistics and Reproducibility
Throughout the article, statistical tests are stated, along with P values and the test statistic used. Sample sizes related to figure panels are provided in figure legends or detailed in Source Data Table 1. Histological images shown in this study are from representative experiments, each of which was repeated twice. No statistical method was used to predetermine sample size. The experiments were not randomized. This study does not involve group allocation that requires blinding.
Extended Data
Extended Data Fig. 1. Relative abundance and generic markers of cell types, and histological assessment across biological replicates and cortical areas of lab mice.

(a) Relative abundance of main cell types across biological replicates (donors) integrated from single-nucleus multiome sequencing (sn-multiome seq) experiments. From left to right: anterior piriform cortex (aPir) replicates, indicated by A; posterior piriform cortex (pPir) replicates, indicated by P; agranular insular cortex (AI) replicates, indicated by T; primary somatosensory cortex (SSp) replicates, indicated by N. Numbers correspond to the ID of mice. VLMC: vascular leptomeningeal cells; Micro: microglia; OPC_diff: differentiating oligodendrocytes; OPC: oligodendrocyte precursors; Oligo: oligodendrocytes; Astro: astrocytes; IN_MGE and IN_CGE: inhibitory neurons from medial and caudal ganglionic eminence, respectively; SL: semilunar cells; Pyr: pyramidal cells. (b) Post-hoc histological assessment of aPir and AI dissections from anterior coronal sections of adult lab mice, ordered by mouse ID number. Asterisks indicate the micro-dissected area. Neurotrace counterstain in gray. (c) Same as in (b) but for pPir and SSp. (d) From left to right: UMAPs of aPir, pPir, AI, and SSp datasets color-coded by main cell types. (e) Gene expression levels of representative markers for each cell type across the four cortical areas, from left to right: aPir, pPir, AI, and SSp.
Extended Data Fig. 2. Quality control of single-nucleus multiome (RNA and ATAC) sequencing data of lab mice.

(a) Transcriptome quality control. Left: number of genes per nucleus quantified for each biological replicate. A indicates aPir replicates, P indicates pPir, T indicates AI, N indicates SSp. Numbers correspond to the ID of mice. Right: number of genes per nucleus identified for each main cell type. (b) Same as in (a) but for the fraction of mitochondrial content per nucleus. (c) Epigenome quality control for aPir biological replicates. From left to right: barcode rank plot, fragment size distribution, Transcription Start Site (TSS) enrichment and Fraction of Reads In Peaks (FRIP). (d) Same as in (c) but for pPir biological replicates. (e) Same as in (c) but for AI biological replicates. (f) Same as in (c) but for SSp biological replicates.
Extended Data Fig. 3. Piriform cortex-specific markers and histological validation.

(a) Left: gene expression levels of generic or subtype-specific markers for semilunar cells in the combined aPir and pPir datasets. Right: in situ hybridization images from the Allen Brain Atlas for some of the markers shown in the dotplot on the left, with the exception of the marker SATB1, which was validated using immunohistochemistry in Vglut1-CRE/INTACT-GFP transgenic mice. (b) Same as (a), but for generic or subtype-specific markers for pyramidal cells in the combined aPir and pPir datasets. (c) High magnifications in Pir layer 3 of immunohistochemical experiments using Vglut1-CRE/INTACT-GFP transgenic mice showing combinatorial expression of the specific marker for Pyr11, EBF2 (in magenta), with specific or generic markers for glutamatergic neurons (in cyan): EBF1 (Pyr10–11-specific); TBR1 and CUX1 (pan-excitatory/pyramidal neuron marker); MEIS1 (Vglut2-specific). Scale bar, 100 μm. (d) Same as (a), but for generic or subtype-specific markers for Vglut2 cells in the combined aPir and pPir datasets. (e) High magnifications in Pir layer 3 of immunohistochemical experiments using Vglut1-CRE/INTACT-GFP transgenic mice showing combinatorial expression of the specific marker for Vglut2 cells, PAPPA2 (in magenta), with generic markers for semilunar cells (RELN), INs (GABA), immature neurons (SOX11), and pyramidal cells (CUX1), or with markers highly specific to Vglut2 cells (MEIS1), or expressed in Pir layer 3 (BARHL1) (in cyan), to understand the identity of the uncharacterized Vglut2-expressing neuronal population. Scale bar, 100 μm. (f) Same as (a), but for generic or subtype-specific markers for immature neurons in the combined aPir and pPir datasets. (g) Same as (a), but for generic or subtype-specific markers for INs in the combined aPir and pPir datasets.
Extended Data Fig. 4. Integration of neurons from this study and a mouse single-cell reference atlas.

(a) Integration of neurons from this study and a mouse single-cell reference atlas17. UMAP of integrated neurons (n=30,553). (b) UMAPs as shown in (a) with gene expression of generic markers for glutamatergic neurons (Slc17a7 and Slc17a6), inhibitory neurons (INs) (Gad2), and other established markers for neocortical projection neurons: Rorb (layer 4), Cux1 (layer 2/3), Ctip2 (layer 5), Fezf2 (layer 5), Foxp2 (layer 6), Nfia (layer 6). (c) Quantification of co-clustering between piriform, hippocampal formation and transition areas neurons from integration shown in (a). Neurons are grouped into main types per cortical area. Rectangles indicate co-clustering of neurons in the Seurat integrated clusters. Color represents the percentage of cells in an integrated cluster. L: layer; IT: intratelencephalic; NP: near projecting; CT: cortico-thalamic; Sub: subiculum; ProS: prosubiculum; PPP: para/post/pre subiculum; RHP: retrohippocampal region; DG: dentate gyrus; CA 1/2/3: hippocampal fields; IG: indusium griseum; FC: fasciola cinerea; AI: agranular insular; ENT: entorhinal (medial and lateral); TPE: Temporal association areas, Perirhinal area, Ectorhinal area. (d) Quantification of co-clustering between piriform glutamatergic neuron subtypes and hippocampal formation glutamatergic neurons from the integration shown in (a). Rectangles indicate co-clustering of neurons in the Seurat integrated clusters. Color represents the percentage of cells in an integrated cluster. Dentate gyrus (DG) glutamatergic neurons aligned with 13%, 31%, and 7% of piriform subtypes Pyr 7–8-9, respectively. CA1 glutamatergic neurons aligned with 4% of piriform subtype Pyr7. DG: dentate gyrus; CA 1/2/3: hippocampal fields. (e) UMAPs of gene expression of transcription factors (TFs) highly enriched in piriform cortex compared to other cortical areas.
(f) UMAPs of integrated neurons from this study and the mouse single-cell reference atlas (bottom) (Yao 2021), and visualization of only SSp neurons from the two studies (top: this study, bottom: single-cell reference atlas). (g) Quantification of co-clustering between SSp neurons of this study and of the single-cell reference atlas. Only SSp datasets were included in the integration to transfer projection neuron profile nomenclature to this study. SSp_yao indicates neurons from (Yao 2021); the rest corresponds to SSp clusters of this study. Dots indicate co-clustering of neurons in the Seurat integrated clusters. Size of the dots represents the percentage of cells in the integrated cluster. (h) Gene expression levels of Car3 in the SSp dataset. Car3 is expressed in SSp neurons co-clustering with piriform neurons in the immature neuron supertype. These Car3+ SSp neurons co-cluster with L4/5/6 intratelencephalic Car3+ neurons of the single-cell reference atlas. Given also the lack of DCX expression in SSp, we do not consider SSp neurons falling in the immature type as immature neurons. (i) Gene expression levels of the TF Rorb, which is highly enriched in the SL1 supertype (left), and of established TFs present in the SSp-specific supertypes (right). Pyr 14–15-16 of layers 4 and 5 are characterized by Rorb and Fezf2; Pyr17 is characterized by Foxp2, Bcl11b (Ctip2) and Nfia, corresponding to CT neurons of layer 6.
Extended Data Fig. 5. Epigenetic divergence of transcriptome-based supertypes across mouse cortical areas.

(a) UMAP of multiome ATAC data colored by aPir, pPir, AI, and SSp datasets. Neurons are integrated using Harmony. (b) UMAP as in (a) colored by the corresponding transcriptome-based supertype. Interneurons from different areas intermix, whereas piriform glutamatergic neurons separate from SSp and AI overlaps with both (see (a)). (c) UMAP as in (a) colored by epigenome-based Leiden clusters. (d) Mapping of transcriptome-based clusters (RNA, supertypes) to epigenome-based (ATAC, Leiden) clusters. For glutamatergic neurons, multiple area-specific epigenome-based clusters correspond to a single supertype. Adjusted Rand Indices (ARIs) quantify the cluster overlap: for all neurons= 0.43; for INs= 0.88; for glutamatergic neurons= 0.37. (e) High-quality e-regulons are selected for downstream analysis based on the correlation between AUC (Area Under the Curve) scores for target genes and target CREs (Cis Regulatory Elements). A correlation cut-off of 0.4 and a minimum number of target genes of 10 are used. (f) Upset plot of the intersection of target genes for the e-regulon Rorb(+), which is shared across aPir, pPir, AI and SSp. Vertical bars show the number of target genes in the corresponding intersection of the matrix below. Horizontal bars show the total number of target genes for each cortical area. Of note, the main text states a 9% overlap between aPir and SSp target genes. That overlap is computed with a Jaccard similarity index and is equivalent to taking the (relative) size of a combination of intersections in the upset plot. (g) Upset plot of the intersection of target CREs for the e-regulon Rorb(+). See (f) for details. (h) From left to right, average log-normalized expression of target genes (TGs) of Rorb identified in aPir, pPir, AI, and SSp. TGs of a given area are shown across areas (even if they are regulated by other TFs). (i) Difference of gene expression between target genes identified in aPir and SSp for the e-regulon Rorb(+) remains within a 2-fold change.
The average log-normalized expression (y-axis) is used only to spread data points. (j) From left to right, average log-normalized expression of a random set of genes (n=500) for aPir, pPir, AI, and SSp, similar to (h). Expression patterns are qualitatively equivalent to those in (h). (k) The difference in gene expression for the set of random genes between the aPir and SSp datasets remains within a 2-fold change, indicating that the Rorb TGs in (i) behave like a set of random genes.
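Panels (d) and (f) rely on two overlap measures, the Adjusted Rand Index and the Jaccard similarity index. The following is a minimal, standard-library Python sketch of both. It is not the analysis code used for the figure; the function names, toy cluster labels, and toy gene identifiers are hypothetical and serve only as illustration.

```python
from collections import Counter
from math import comb


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; returns 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same cells")
    n = len(labels_a)
    # Pairs of cells grouped together in both labelings (contingency table).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of cells grouped together in each labeling separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total = comb(n, 2)
    expected = sum_a * sum_b / total if total else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy example (hypothetical labels): one RNA supertype split into two ATAC clusters.
rna_supertype = ["IT", "IT", "IT", "IT", "Sst", "Sst"]
atac_cluster = ["c1", "c1", "c2", "c2", "c3", "c3"]
print(adjusted_rand_index(rna_supertype, atac_cluster))

# Toy example (hypothetical gene identifiers): overlap of two target-gene sets.
print(jaccard({"g1", "g2", "g3"}, {"g2", "g3", "g4"}))
```

In this toy case, splitting a single supertype across two epigenome-based clusters lowers the ARI well below 1 even though the second supertype maps one-to-one, which is the kind of pattern described for glutamatergic neurons in (d).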
Extended Data Fig. 6. Cell type discreteness and TF co-expression and repression analyses.

(a) Kernel-density estimate plots showing cluster distance per cortical area computed on integrated clusters composed of glutamatergic neurons from all areas using highly variable transcription factors. Integration was performed between our data (aPir, pPir, AI, SSp) and cortical areas from Yao et al., 2021: SSp (as internal control to compare with our SSp dataset), primary motor cortex (MOp), visual cortex (VIS), and dentate gyrus (DG). (b) Top 500 positively (left) and negatively (right) correlated TF pairs projected from one cortical area (source, rows) to another (target, columns). Distances are normalized to be 1.0 when the source is the target. Lower distance indicates lower correlation match. TF-TF correlations were computed from log-normalized expression matrices for each area (see Supplementary Fig. 3c). (c) Repressive (top row) and activating (middle row) TF-TF interactions identified per cortical area e-GRN (aPir, pPir, SSp). AI was excluded for lack of laminar information. Repression and activation were determined using SCENIC+, which considered (anti)correlations for each e-regulon as computed between the TF, its target genes, and its target CREs. Results in the main text are based on an (anti)correlation value of 0.4 (red line) and a minimum of 10 target genes. Only TFs expressed in at least 20% of neurons in a given cortical layer are considered. Grey area indicates bounds of pattern stability. (d) Activating and repressing ATAC regions recovered from leave-10%-out datasets. Each dot represents the ratio of regions in the leave-10%-out dataset over the original e-GRN, for both activating and repressing regions. Dots are coloured by cortical area. To indicate the trend per cortical area, linear regressions (solid lines) with +/− bootstrapped 95% confidence intervals (shaded area) are shown. As a reference, the dashed black line is the equal ratio of recovery for activating and repressing regions. 
(e) Gene expression levels of predicted piriform repressors shown in Fig. 3i in aPir (top) and pPir (bottom). Neurons are labelled by cortical layer (SL and Pyr cells), neuron type (INs and Vglut2 cells), or “Unknown” if layer information was unavailable.
Extended Data Fig. 7. Adult piriform cortex immature neurons and the transcriptomic divergence of pyramidal cells between lab and wild mice and between human individuals.

(a) Left: distribution of piriform cortex mature and immature neurons in reanalyzed MERFISH data (Zhang et al., 2023). Right: cumulative distribution along the anterior-posterior axis of piriform cortex of all pyramidal cells, immature neurons, and pyramidal cells from Yao et al., 2023 aligning with subtype Pyr7 (Yao et al., 2023; Zhang et al., 2023). (b) Classification accuracy of a support vector machine (SVM) trained on combined RNA and ATAC data of the pPir dataset. Each neuron type is distinguished. (c) Same as (b) but applied to subtypes. Each neuron subtype is distinguished. (d) Conservation scores for aPir, pPir, and SSp. Scores are median probabilities of transcriptomic similarity between lab and wild datasets, quantified by considering all mature neurons, or neurons of a given cell type (SL, Pyr, Vglut2, and IN). A conservation score of 0.5 indicates perfect mixing between lab and wild datasets. 95% confidence intervals (CIs) for each neuron type are reported in brackets. (e) Statistical significance of pairwise differences in conservation score between cortices, reported per neuron type (see also (d)). P-values are calculated using two-sided Mann-Whitney U tests and Bonferroni-corrected. P-values in red are smaller than α0.01(Bonf.)=8.34e-4; p-values in yellow are smaller only than α0.05(Bonf.)=4.167e-3. (f) Similarity (as cosine distance) between lab and wild neuron types upon integrating lab and wild pPir datasets using scVI. Pyramidal cells were on average less similar to each other than the other neuron types. (g) Percentage of lab pPir neighbours in the scVI integration when computing nearest-neighbour graphs with neighbourhood sizes from 5 to 100. Data shown as mean +/− bootstrapped 95% CIs, for 7861 aligned cells (other Pyr cells) and 184 misaligned cells (wild-specific Pyr cells). Alignment of cells between lab and wild datasets is taken from the OT integration shown in (c). (h) Classification accuracy of an SVM trained on combined lab and wild pPir data. 
Neuron types are distinguished with 98.8% accuracy. (i) Immunohistochemistry in lab (top) and wild-derived mice (bottom) of markers for piriform neuronal populations (in cyan), namely RELN for SL, CUX1 for Pyr, and GABA for INs, co-stained with DCX (in magenta), a canonical marker for immature neurons. Inset white boxes at higher magnification show co-expression of CUX1 with DCX in both lab and wild-derived mice. Scale bar, 100 μm. (j) Conservation scores per neuron type for adult human piriform (PIR) and primary somatosensory cortex (S1C) computed using published adult human whole-brain snRNA-seq data (Siletti et al., 2023). Scores are computed for each pair among three human donors. Abbreviated donor IDs: h18 for H18.30.001, h19.1 for H19.30.001, and h19.2 for H19.30.002. Within violin plots, black circles mark medians and black bars indicate 95% CIs. Bonferroni-corrected significance thresholds were α0.01(Bonf.)=1.59e-4 and α0.05(Bonf.)=7.94e-4. IT: intratelencephalic; NP: near projecting. (k) Conservation scores as in (d), but for human piriform (left) and primary somatosensory cortex (right). Scores are median probabilities of transcriptomic similarity per pair of donors. (l) Statistical significance of pairwise differences in conservation score as shown in (e), but between donors. P-values are calculated using two-sided Mann-Whitney U tests and reported per neuron type.
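The Bonferroni-corrected thresholds quoted in this legend follow from dividing the nominal α by the number of comparisons. The sketch below illustrates that arithmetic in Python. The number of comparisons is not stated in the legend; the value 63 used here is an assumption, chosen only because it reproduces the thresholds reported for panel (j), and the function names and colour labels are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significance_level(p_value, n_tests):
    """Return which corrected threshold (if any) a p-value falls below."""
    if p_value < bonferroni_threshold(0.01, n_tests):
        return "below corrected 0.01"
    if p_value < bonferroni_threshold(0.05, n_tests):
        return "below corrected 0.05 only"
    return "not significant"


# ASSUMPTION: 63 comparisons; this count is inferred, not given in the legend.
N_TESTS = 63
print(f"{bonferroni_threshold(0.01, N_TESTS):.2e}")
print(f"{bonferroni_threshold(0.05, N_TESTS):.2e}")
print(significance_level(5e-4, N_TESTS))
```

A p-value of 5e-4 would therefore pass the corrected 0.05 threshold but not the corrected 0.01 threshold under this assumed number of comparisons.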
Extended Data Fig. 8. Transcriptomically-defined cell types across cortical areas of wild-derived mice.

(a) Relative abundance of main cell types across biological replicates (donors) integrated from single-nucleus RNA (sn-RNA seq) and multiome (sn-multiome seq) sequencing experiments. Replicate IDs 1 and 2 derive from sn-multiome seq experiments; replicate IDs 3, 4, 5, and 6 derive from sn-RNA seq experiments. From left to right: aPir replicates, indicated by A; pPir replicates, indicated by P; SSp replicates, indicated by N. Numbers correspond to the ID of the donor mouse. W: wild. (b) Post-hoc histological assessment of aPir, pPir, and SSp dissections from anterior and posterior coronal sections of adult wild-derived mice, ordered by mouse ID number. Asterisks indicate the micro-dissected area. Neurotrace counterstain in gray. (c) Left: number of genes per nucleus quantified for biological replicates. A indicates aPir replicates, P indicates pPir, N indicates SSp. Numbers correspond to the ID of mice. Right: number of genes per nucleus identified for main cell types. (d) Same as in (c) but for the fraction of mitochondrial content per nucleus. (e) From left to right: UMAPs of aPir, pPir, and SSp datasets color-coded by main cell types. (f) Gene expression levels of representative markers for each cell type across the three cortical areas, from left to right: aPir, pPir, and SSp. (g) Optimal transport (OT) alignment of main cell types between lab and wild datasets for each cortical area, from left to right: aPir, pPir, and SSp. Color and size of dots indicate the probability of alignment.
Extended Data Fig. 9. Quantification of co-clustering between cortical glutamatergic neurons of mice, reptiles, and salamander grouped by areas and LISI scores.

(a) Broad quantification of co-clustering in the integrated clusters between glutamatergic neurons from mouse and non-mammalian cortical areas, highlighting greater transcriptomic similarity of piriform glutamatergic neurons to those of non-mammals than to those of the neocortex. Pir: piriform; NCx: neocortex; DCtx, LCtx: dorsal, lateral cortex; aDVR: anterior dorsal ventricular ridge; dDP, LP, dVP: deep dorsal, lateral, deep ventral pallium. Rectangles indicate co-clustering of neurons (rows) in the integrated clusters (columns). Color of the rectangle represents the percentage of neurons in the integrated cluster. (b) Distribution of LISI scores across neuronal clusters of the datasets Tosches et al., 2018 (turtle and lizard), Hain et al., 2022 (lizard), and Woych et al., 2022 (salamander). A mean value close to 4 indicates a cell type that is well-mixed with neurons from other species, while a value close to 0 indicates a cell type mixed only with neurons from the same species. Box-and-whisker plots show min to max, center (median), 25th and 75th percentile box bounds, and whiskers extending to 1.5 * inter-quartile range (see Supplementary Table 1 for sample size).
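The LISI score in (b) is built on the inverse Simpson's index of dataset labels in a cell's neighbourhood. The sketch below shows only that core quantity in simplified, unweighted form; the published LISI method (Korsunsky et al., 2019) additionally weights neighbours with a distance-based kernel, which is omitted here. The function name and the toy neighbourhoods are hypothetical. In this unweighted form the value runs from 1 (all neighbours from one dataset) up to the number of datasets; any rescaling applied for the figure is not reproduced.

```python
from collections import Counter


def inverse_simpson(neighbour_labels):
    """Unweighted inverse Simpson's index of the labels in one neighbourhood."""
    n = len(neighbour_labels)
    if n == 0:
        raise ValueError("neighbourhood must contain at least one cell")
    counts = Counter(neighbour_labels)
    # Simpson's index: probability that two draws share the same label.
    simpson = sum((c / n) ** 2 for c in counts.values())
    return 1.0 / simpson


# Toy neighbourhoods (hypothetical): four datasets, evenly mixed vs. unmixed.
well_mixed = ["mouse", "turtle", "lizard", "salamander"]
unmixed = ["mouse", "mouse", "mouse", "mouse"]
print(inverse_simpson(well_mixed))
print(inverse_simpson(unmixed))
```

Averaging such per-cell values within a cluster gives a per-cluster summary comparable in spirit to the distributions shown in (b).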
Extended Data Fig. 10. Cross-species gene module score and TF co-expression analyses.

(a) Expression enrichment of gene modules Pyr-like (top, blue) and SL-like (bottom, orange) in this study (left), in the salamander dataset (middle), and in Yao et al., 2021 (right). NTS: neurotensin neurons. A positive score indicates that the set of genes in the module is expressed more highly in a particular cluster than the average expression across all clusters of the dataset. For the number of observations in each violin plot, see Supplementary Table 4. (b) UMAPs of SL-like (top) and Pyr-like (bottom) gene module scores across datasets shown in (a), from left to right: this study, Yao et al., 2021, and Woych et al., 2022. Colour bar indicates the enrichment score of a module.
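The sign convention of the module score in (a) can be illustrated with a deliberately simplified calculation: the mean expression of the module genes in a cluster minus the mean of that quantity across all clusters. This is a sketch under stated assumptions, not the scoring procedure used in the study, which may rely on a different reference gene set or per-cell scoring. The function name, cluster names, gene identifiers, and expression values below are all hypothetical.

```python
from statistics import mean


def module_enrichment(cluster_expr, module_genes):
    """Per-cluster module score relative to the average over all clusters.

    cluster_expr maps cluster name -> {gene: mean log-normalized expression}.
    """
    per_cluster = {
        cluster: mean(expr[gene] for gene in module_genes)
        for cluster, expr in cluster_expr.items()
    }
    baseline = mean(per_cluster.values())
    # Positive score: module expressed above the across-cluster average.
    return {cluster: value - baseline for cluster, value in per_cluster.items()}


# Toy data (hypothetical clusters, genes, and values).
toy_expr = {
    "cluster_A": {"g1": 2.0, "g2": 4.0},
    "cluster_B": {"g1": 0.0, "g2": 2.0},
}
scores = module_enrichment(toy_expr, ["g1", "g2"])
print(scores)
```

Here cluster_A receives a positive score and cluster_B a negative one, matching the reading of positive scores given in the legend.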
Acknowledgments
We thank Zach Herbert and Maura Berkeley from the Molecular Biology Core Facilities at the Dana-Farber Cancer Institute for sequencing services. We thank Nicole Eckart from 10x Genomics for excellent technical support. We thank Kelsey Babcock, Federica Mosti, Debra Silver, Rachel Van Drunen, Jason Ritt, Stuart Firestein, Hugues Berry, Kevin Franks, Sophie Pantalacci, Guillaume Beslon, Hynek Wichterle, Brett Mensh, and Telmo Pievani for critical comments on the manuscript. We thank Andrea Pierre’ for software support and Carmen Bravo González-Blas for analysis support with SCENIC+. We thank Noa Nisim for technical assistance with the wild-derived mice and the Brown and Weizmann animal facilities for animal care. Work in the AF lab was supported by grants from the NIH (NIDCD R01DC017437 and R01DC020478 to AF), the Robert J and Nancy D Carney Institute for Brain Science to AF, and the Carney Graduate Award in Brain Science to SZ. Carney Institute computational resources used in this work were supported by the NIH Office of the Director grant S10OD025181. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Footnotes
Competing interests
Authors declare that they have no competing interests.
Data Availability
Raw and processed single-nucleus multiome and RNA sequencing data have been deposited and are available in Gene Expression Omnibus (GEO) under the accession number GSE239477. Seurat and Scanpy integrated objects are available upon request. All other data are included in the main paper or the supplementary materials.
Code Availability
The R and Python analysis scripts used for this paper are available at the GitLab link https://gitlab.com/fleischmann-lab/papers/zeppilli-et-al-2023.
References
- 1. Bear DM, Lassance J-M, Hoekstra HE & Datta SR The Evolving Neural and Genetic Architecture of Vertebrate Olfaction. Current Biology 26, R1039–R1049 (2016).
- 2. Cisek P Evolution of behavioural control from chordates to primates. Phil. Trans. R. Soc. B 377, 20200522 (2022).
- 3. Roberts RJV, Pop S & Prieto-Godino LL Evolution of central neural circuits: state of the art and perspectives. Nat Rev Neurosci 23, 725–743 (2022).
- 4. MacIver MA & Finlay BL The neuroecology of the water-to-land transition and the evolution of the vertebrate brain. Phil. Trans. R. Soc. B 377, 20200523 (2022).
- 5. Niimura Y Evolutionary dynamics of olfactory receptor genes in chordates: interaction between environments and genomic contents. Human Genomics 4, 107 (2009).
- 6. Weiss L, Manzini I & Hassenklöver T Olfaction across the water–air interface in anuran amphibians. Cell Tissue Res 383, 301–325 (2021).
- 7. Kaas JH The evolution of brains from early mammals to humans. WIREs Cognitive Science 4, 33–45 (2013).
- 8. Rowe TB, Macrini TE & Luo Z-X Fossil Evidence on Origin of the Mammalian Brain. Science 332, 955–957 (2011).
- 9. Rowe TB & Shepherd GM Role of ortho-retronasal olfaction in mammalian cortical evolution. Journal of Comparative Neurology 524, 471–495 (2016).
- 10. Kaas JH Evolution of the neocortex. Curr Biol 16, R910–914 (2006).
- 11. Carroll SB Endless Forms: The Evolution of Gene Regulation and Morphological Diversity. Cell 101, 577–580 (2000).
- 12. Striedter GF The Telencephalon of Tetrapods in Evolution. Brain Behavior and Evolution 49, 179–194 (2008).
- 13. García-Cabezas MÁ, Zikopoulos B & Barbas H The Structural Model: a theory linking connections, plasticity, pathology, development and evolution of the cerebral cortex. Brain Struct Funct 224, 985–1008 (2019).
- 14. Kappers CA The Phylogenesis of the Palaeo-Cortex and Archi-Cortex Compared with the Evolution of the Visual Neo-Cortex. (J. Truscott and Son, 1909).
- 15. Laurent G et al. Cortical Evolution: Introduction to the Reptilian Cortex. in Micro-, Meso- and Macro-Dynamics of the Brain (eds. Buzsáki G & Christen Y) (Springer, Cham (CH), 2016).
- 16. Hao Y et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e29 (2021).
- 17. Yao Z et al. A taxonomy of transcriptomic cell types across the isocortex and hippocampal formation. Cell 184, 3222–3241.e26 (2021).
- 18. Bravo González-Blas C et al. SCENIC+: single-cell multiomic inference of enhancers and gene regulatory networks. Nat Methods 1–13 (2023) doi: 10.1038/s41592-023-01938-4.
- 19. Diodato A et al. Molecular signatures of neural connectivity in the olfactory cortex. Nat Commun 7, (2016).
- 20. Delás MJ & Briscoe J Repressive interactions in gene regulatory networks: When you have no other choice. Curr Top Dev Biol 139, 239–266 (2020).
- 21. Greig LC, Woodworth MB, Galazo MJ, Padmanabhan H & Macklis JD Molecular logic of neocortical projection neuron specification, development and diversity. Nat Rev Neurosci 14, 755–769 (2013).
- 22. Knoth R et al. Murine Features of Neurogenesis in the Human Hippocampus across the Lifespan from 0 to 100 Years. PLoS One 5, e8809 (2010).
- 23. Mu L et al. SoxC Transcription Factors Are Required for Neuronal Differentiation in Adult Hippocampal Neurogenesis. J Neurosci 32, 3067–3080 (2012).
- 24. Rotheneichner P et al. Cellular Plasticity in the Adult Murine Piriform Cortex: Continuous Maturation of Dormant Precursors Into Excitatory Neurons. Cerebral Cortex 28, 2610–2621 (2018).
- 25. Zhang M et al. Molecularly defined and spatially resolved cell atlas of the whole mouse brain. Nature 624, 343–354 (2023).
- 26. Qu R et al. Gene trajectory inference for single-cell data by optimal transport metrics. Nat Biotechnol 1–11 (2024) doi: 10.1038/s41587-024-02186-3.
- 27. Habib N et al. Div-Seq: Single nucleus RNA-Seq reveals dynamics of rare adult newborn neurons. Science 353, 925–928 (2016).
- 28. Duan Y et al. Semaphorin 5A inhibits synaptogenesis in early postnatal- and adult-born hippocampal dentate granule cells. eLife 3, e04390 (2014).
- 29. Buckwalter MS et al. Chronically Increased Transforming Growth Factor-β1 Strongly Inhibits Hippocampal Neurogenesis in Aged Mice. The American Journal of Pathology 169, 154–164 (2006).
- 30. Kastriti ME et al. Ablation of CNTN2+ Pyramidal Neurons During Development Results in Defects in Neocortical Size and Axonal Tract Formation. Front. Cell. Neurosci. 13, (2019).
- 31. Rasetto NB et al. Transcriptional dynamics orchestrating the development and integration of neurons born in the adult hippocampus. bioRxiv 2023.11.03.565477 (2024) doi: 10.1101/2023.11.03.565477.
- 32. Kerloch T et al. The atypical Rho GTPase Rnd2 is critical for dentate granule neuron development and anxiety-like behavior during adult but not neonatal neurogenesis. Mol Psychiatry 26, 7280–7295 (2021).
- 33. Chatzi C, Zhang Y, Shen R, Westbrook GL & Goodman RH Transcriptional Profiling of Newly Generated Dentate Granule Cells Using TU Tagging Reveals Pattern Shifts in Gene Expression during Circuit Integration. eNeuro 3, ENEURO.0024-16.2016 (2016).
- 34. Gómez-Climent MÁ et al. A Population of Prenatally Generated Cells in the Rat Paleocortex Maintains an Immature Neuronal Phenotype into Adulthood. Cereb Cortex 18, 2229–2240 (2008).
- 35. La Rosa C et al. Phylogenetic variation in cortical layer II immature neuron reservoir of mammals. eLife 9, e55456 (2020).
- 36. Miller RA et al. Mouse (Mus musculus) stocks derived from tropical islands: new models for genetic analysis of life-history traits. Journal of Zoology 250, 95–104 (2000).
- 37. Zilkha N et al. Sex-dependent control of pheromones on social organization within groups of wild house mice. Current Biology 33, 1407–1420.e4 (2023).
- 38. Tran QH et al. Unbalanced CO-optimal Transport. Proceedings of the AAAI Conference on Artificial Intelligence 37, 10006–10016 (2023).
- 39. Korsunsky I et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods 16, 1289–1296 (2019).
- 40. Lopez R, Regier J, Cole MB, Jordan MI & Yosef N Deep Generative Modeling for Single-cell Transcriptomics. Nat Methods 15, 1053–1058 (2018).
- 41. Siletti K et al. Transcriptomic diversity of cell types across the adult human brain. Science 382, eadd7046 (2023).
- 42. Woych J et al. Cell-type profiling in salamanders identifies innovations in vertebrate forebrain evolution. Science 377, eabp9186 (2022).
- 43. Hain D et al. Molecular diversity and evolution of neuron types in the amniote brain. Science 377, eabp8202 (2022).
- 44. Norimoto H et al. A claustrum in reptiles and its role in slow-wave sleep. Nature 578, 413–418 (2020).
- 45. Tosches MA et al. Evolution of pallium, hippocampus, and cortical cell types revealed by single-cell transcriptomics in reptiles. Science 360, 881–888 (2018).
- 46. Ulinski PS & Rainey WT Intrinsic organization of snake lateral cortex. Journal of Morphology 165, 85–116 (1980).
- 47. Lust K et al. Single-cell analyses of axolotl telencephalon organization, neurogenesis, and regeneration. Science 377, eabp9262 (2022).
- 48. Bonfanti L, La Rosa C, Ghibaudi M & Sherwood CC Adult neurogenesis and ‘immature’ neurons in mammals: an evolutionary trade-off in plasticity? Brain Struct Funct (2023) doi: 10.1007/s00429-023-02717-9.
- 49. Gerhart J & Kirschner M The theory of facilitated variation. Proceedings of the National Academy of Sciences 104, 8582–8589 (2007).
- 50. Wittkopp PJ & Kalay G Cis-regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nat Rev Genet 13, 59–69 (2012).
- 51. Dugas-Ford J, Rowell JJ & Ragsdale CW Cell-type homologies and the origins of the neocortex. Proc Natl Acad Sci U S A 109, 16974–16979 (2012).
- 52. Tosches MA & Laurent G Evolution of neuronal identity in the cerebral cortex. Curr. Opin. Neurobiol. 56, 199–208 (2019).
- 53. Kaslin J, Ganz J & Brand M Proliferation, neurogenesis and regeneration in the non-mammalian vertebrate brain. Philos Trans R Soc Lond B Biol Sci 363, 101–122 (2008).
- 54. Tosches MA From Cell Types to an Integrated Understanding of Brain Evolution: The Case of the Cerebral Cortex. Annu Rev Cell Dev Biol 37, 495–517 (2021).
- 55. Luzzati F A hypothesis for the evolution of the upper layers of the neocortex through co-option of the olfactory cortex developmental program. Front Neurosci 9, (2015).
- 56. Striedter GF & Northcutt RG The Independent Evolution of Dorsal Pallia in Multiple Vertebrate Lineages. Brain Behavior and Evolution 96, 200–211 (2021).
Methods References
- 57. Zeppilli S et al. Molecular characterization of projection neuron subtypes in the mouse olfactory bulb. eLife 10, e65445 (2021).
- 58. Wolf FA, Angerer P & Theis FJ SCANPY: large-scale single-cell gene expression data analysis. Genome Biology 19, 15 (2018).
- 59. Wolock SL, Lopez R & Klein AM Scrublet: Computational Identification of Cell Doublets in Single-Cell Transcriptomic Data. Cell Systems 8, 281–291.e9 (2019).
- 60. Lun ATL, Bach K & Marioni JC Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biology 17, 75 (2016).
- 61. Traag VA, Waltman L & van Eck NJ From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep 9, 5233 (2019).
- 62. Wolf FA et al. PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biology 20, 59 (2019).
- 63. Zhang Y et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biology 9, R137 (2008).
- 64. Yao Z et al. A high-resolution transcriptomic and spatial atlas of cell types in the whole mouse brain. Nature 624, 317–332 (2023).
- 65. Wang Q et al. The Allen Mouse Brain Common Coordinate Framework: A 3D Reference Atlas. Cell 181, 936–953.e20 (2020).
- 66. Pedregosa F et al. Scikit-learn: Machine Learning in Python. Preprint at 10.48550/arXiv.1201.0490 (2018).
- 67. Cao K, Hong Y & Wan L Manifold alignment for heterogeneous single-cell multi-omics data integration using Pamona. Bioinformatics 38, 211–219 (2021).
- 68. Demetci P, Tran QH, Redko I & Singh R Jointly aligning cells and genomic features of single-cell multi-omics data with co-optimal transport. 2022.11.09.515883 Preprint at 10.1101/2022.11.09.515883 (2022).
