Classifying Drosophila Olfactory Projection Neuron Subtypes by Single-cell RNA Sequencing

Hongjie Li; Felix Horns; Bing Wu; Qijing Xie; Jiefu Li; Tongchao Li; David J Luginbuhl; Stephen R Quake; Liqun Luo

doi:10.1016/j.cell.2017.10.019

Author manuscript; available in PMC: 2018 Nov 16.

Published in final edited form as: Cell. 2017 Nov 16;171(5):1206–1220.e22. doi: 10.1016/j.cell.2017.10.019

PMCID: PMC6095479 NIHMSID: NIHMS982283 PMID: 29149607

Summary

The definition of neuronal type and how this relates to the transcriptome are open questions. Drosophila olfactory projection neurons (PNs) are among the best-characterized neuronal types: different PN classes target dendrites to distinct olfactory glomeruli, while PNs of the same class exhibit indistinguishable anatomical and physiological properties. Using single-cell RNA-sequencing, we comprehensively characterized the transcriptomes of most PN classes and unequivocally mapped transcriptomes to specific olfactory function for 6 classes. Transcriptomes of closely related PN classes exhibit the largest differences during circuit assembly but become indistinguishable in adults, suggesting that neuronal subtype diversity peaks during development. Transcription factors and cell-surface molecules are the most differentially expressed genes between classes and are highly informative in encoding cell identity, enabling us to identify a new lineage-specific transcription factor that instructs PN dendrite targeting. These findings establish that neuronal transcriptomic identity corresponds with anatomical and physiological identity defined by connectivity and function.

Introduction

The nervous system comprises many neuronal types with varied locations, input and output connections, neurotransmitters, intrinsic properties, and physiological and behavioral functions. Recent transcriptome analyses, especially from single cells, have provided important criteria to define a cell type. Indeed, single-cell RNA-sequencing (RNA-seq) has been used to classify neurons in various parts of the mammalian nervous system (e.g., Darmanis et al., 2015; Johnson et al., 2015; Usoskin et al., 2015; Zeisel et al., 2015; Foldy et al., 2016; Fuzik et al., 2016; Gokce et al., 2016; Shekhar et al., 2016; Tasic et al., 2016), but the extent to which it is useful to define subtypes of neurons and the relationship between cell type and connectivity is unclear in most cases. Indeed, what constitutes a neuronal type in many parts of the nervous system remains an open question (Johnson and Walsh, 2017).

The Drosophila olfactory circuit offers an excellent system to investigate the relationship between transcriptomes and neuronal cell types. 50 classes of olfactory receptor neurons (ORNs) form one-to-one connections with 50 classes of second-order projection neurons (PNs) in the antennal lobe in discrete glomeruli, forming 50 parallel information processing channels (Figure 1A; Vosshall and Stocker, 2007; Wilson, 2013). Each ORN class is defined by expression of 1–2 unique olfactory receptor gene(s) and by the glomerulus to which its axons converge. Correspondingly, each PN class is defined by the glomerulus within which its dendrites elaborate, which correlates strongly with the axonal arborization patterns at a higher olfactory center (Marin et al., 2002; Jefferis et al., 2007). Furthermore, while on average ~60 ORNs and ~3 PNs form many hundreds of synapses within a single glomerulus (Mosca and Luo, 2014), every ORN forms synapses with every PN to convey the same type of olfactory information (Kazama and Wilson, 2009; Tobin et al., 2017). Indeed, PNs that project to the same glomerulus exhibit indistinguishable electrophysiological properties and olfactory responses (Kazama and Wilson, 2009). Thus, one can define each PN class as a specific neuronal type (or subtype, if all PNs are collectively considered a cell type) with confidence that each class has unique connectivity, physiological properties, and function, whereas PNs of the same class most likely do not differ. In other words, the ground truth of cell types for fly PNs is one of the best defined in the nervous system. We describe here a robust single-cell RNA-seq protocol for neurons and glia in the Drosophila brain, and its application to Drosophila PNs to establish the relationship between transcriptome, neuronal cell identity, and development.

Figure 1. — (A) Schematic of fly olfactory system organization. Olfactory receptor neurons (ORNs) expressing the same odorant receptor (same color) target their axons to the same glomerulus in the antennal lobe. Projection neuron (PN) dendrites also target single glomeruli, and their axons project to the mushroom body (MB) and lateral horn (LH).

(B) Schematic of single-cell RNA-seq protocol.

(C) Representative confocal images of *Drosophila* central brains labeled by *UAS-mCD8GFP* crossed with PN driver *GH146-GAL4* (24h APF) or astrocyte driver *alrm-GAL4* (72h APF). N-cadherin (Ncad, red) staining labels neuropil. Scale, 50 μm.

(D) Heat map showing expression levels of genes that are specific for neurons or astrocytes. Each column is an individual cell. 67 *alrm-GAL4*+ and 946 *GH146-GAL4*+ cells are shown, with driver indicated by the color above. Cell type-specific genes are enriched in astrocytes (top 9) and PNs (bottom 5). Expression levels are indicated by the color bar (CPM, counts per million). Cells and genes were ordered using hierarchical clustering.

(E) Visualization of astrocyte and PN populations using t-distributed Stochastic Neighbor Embedding (tSNE). Each dot is a cell. See also Figure S1.

Results

A Robust Single-cell RNA-seq Protocol for the Drosophila Pupal Brain

Brains containing cells labeled by mCD8GFP driven from specific GAL4 lines were manually dissected, single-cell suspensions were prepared following a method modified after Tan et al. (2015), and cDNA was sequenced using a modified SMART-seq2 protocol (Picelli et al., 2014) (Figure 1B; Figure S1A; STAR Methods). We sequenced cells from Drosophila pupal brains that were labeled by the astrocyte driver alrm-GAL4 (Doherty et al., 2009) and olfactory projection neurons (PNs) labeled by the GH146-GAL4 driver, which is expressed in 40 of 50 PN classes (Stocker et al., 1997; Jefferis et al., 2001) (Figure 1C). About 5% of GFP-labeled cells within the brain were recovered as single cells, and 90% of PNs yielded high-quality cDNA after reverse transcription (Figure S1B, C). Cells were sequenced to a depth of ~1 million reads per cell and 1000–4000 genes were detected per cell (Figure S1D). Data quality was evaluated by examining expression of 5 neuronal markers (brp, nSyb, elav, Syt1, and CadN) and 4 astrocyte markers (alrm, Eaat1, Gat, and Gs2) (Doherty et al., 2009; Sinakevitch et al., 2010; Stork et al., 2014); they were specifically expressed in the corresponding cell types (Figure 1D). We also identified 5 new genes (Msr-110, tre1, Cyp4g15, mfas, Obp44a) that were expressed in pupal astrocytes but not in PNs (Figure 1D). Unbiased clustering based on transcriptome profiles readily distinguished PNs and astrocytes (Figure 1E). Among PNs, housekeeping genes (e.g., Act5C and α-Tub84B) were reliably detected in all cells, and stress-related genes (e.g., Hsp70 family genes) were not widely induced (Figure S1E). ~50% of cells co-expressed two male-specific RNAs (Meller et al., 1997) (Figure S1F), as expected given that we did not discriminate sex. These data demonstrate the reliability of our single-cell RNA-seq protocol for analyzing cell types and transcriptomes in Drosophila pupal brain.

Clustering GH146-GAL4+ Projection Neurons (PNs) Based on Single-cell Transcriptomes

GH146-GAL4+ (GH146+ hereafter) PNs are derived from three neuroblast lineages whose cell bodies are located anterodorsal, lateral, or ventral to the antennal lobe neuropil (Figure 2A; Jefferis et al., 2001). The anterodorsal and lateral lineages produce uniglomerular, cholinergic, and excitatory PNs (adPNs and lPNs), whereas the ventral lineage produces GABAergic inhibitory PNs (vPNs), some of which target dendrites to multiple glomeruli (Jefferis et al., 2001; Liang et al., 2013). We sequenced 1046 single GH146+ cells at 24–30 hours after puparium formation (h APF). At this stage, PNs are refining their dendrite targeting in the antennal lobe; these dendrites also serve as targets for ORN axons that will invade the antennal lobe and establish one-to-one connections in the following 24 hours (Jefferis et al., 2004). We analyzed 946 cells that passed quality filtering (STAR Methods).

Figure 2. — (A) Representative confocal projection and schematic of *GH146*+ PNs, which include (per antennal lobe) 50 adPNs (*acj6*+), 35 lPNs (*vvl* expression begins to decrease from 18h APF; Komiyama et al., 2003), and 6 vPNs (*Lim1*+). The cell bodies of adPNs, lPNs, and vPNs are located respectively anterodorsal, lateral, and ventral to the antennal lobe neuropil (circled; stained by Ncad). All *GH146*+ adPNs and lPNs send dendrites to a single glomerulus. The schematic shows the stereotyped locations of a large subset of glomeruli (named according to their locations; Laissue et al., 1999), color-coded according to adPNs or lPNs. Scale, 20 μm. D, dorsal; L, lateral.

(B) Visualization of *GH146*+ PNs using dimensionality reduction by PCA followed by tSNE. Each dot is a cell. Cells are arranged according to transcriptome similarity.

(C) Schematic of Iterative Clustering for Identifying Markers (ICIM), an unsupervised machine-learning algorithm for identifying genes that distinguish cell types.

(D) Visualization of *GH146*+ PNs using tSNE based on 561 genes identified using ICIM. *GH146*+ adPNs and lPNs form 30 distinct clusters (differentially colored). Black dots are cells that could not be assigned to any cluster.

(E) Visualization of *GH146*+ PNs as in Figure 2D, colored according to *acj6* and *vvl* expression level. *acj6* and *vvl* are expressed in *GH146*+ PNs in a mutually exclusive manner.

See also Figure S2.

Conventional dimensionality reduction and clustering methods based on Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (tSNE) (van der Maaten and Hinton, 2008) identified only ~12 distinct PN clusters (Figure 2B). The inability to resolve more distinct clusters is likely due to the limited sensitivity of these methods to distinguish cell types with highly similar transcriptomes, as we expect for the PN classes. To address this challenge, we developed an unsupervised machine-learning algorithm, Iterative Clustering for Identifying Markers (ICIM), to identify genes that distinguish PN classes. ICIM searches for genes having the highest expression variability within a cell population, partitions the cells into two subpopulations using clustering based on these genes, then iteratively repeats the search on each subpopulation. Iteration continues until distinct subpopulations cannot be separated because gene expression patterns within the population are homogeneous (Figure 2C). Stopping criteria are defined in an unbiased manner without supervision. Genes identified using ICIM were then used for further dimensionality reduction using tSNE and clustering using HDBSCAN, a hierarchical density-based clustering algorithm (Campello et al., 2013), on the tSNE space. Applying ICIM to the transcriptomes of GH146+ PNs, we identified 561 genes that segregate the 946 GH146+ cells into 35 distinct clusters (Figure S2A, B).
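The iterative search described above can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the ICIM implementation used in this study: the variability measure (variance divided by mean), the deterministic 2-means bipartition, the parameters `n_top`, `min_cells`, and `min_gap`, and the stopping rule are all hypothetical choices made for brevity. The actual criteria are described in STAR Methods.

```python
from statistics import mean, pvariance

def top_variable_genes(cells, n_top):
    """Rank genes by dispersion (variance / mean); an assumed variability measure."""
    scored = []
    for g in range(len(cells[0])):
        values = [cell[g] for cell in cells]
        m = mean(values)
        scored.append((pvariance(values) / m if m > 0 else 0.0, g))
    scored.sort(reverse=True)
    return [g for score, g in scored[:n_top] if score > 0]

def sqdist(a, b, genes):
    """Squared Euclidean distance restricted to the selected genes."""
    return sum((a[g] - b[g]) ** 2 for g in genes)

def bipartition(cells, genes, n_iter=20):
    """Split cells into two groups with a deterministic 2-means (assumed method)."""
    first = cells[0]
    far = max(cells, key=lambda c: sqdist(c, first, genes))  # seed 2: farthest cell
    centers = [list(first), list(far)]
    labels = [0] * len(cells)
    for _ in range(n_iter):
        labels = [0 if sqdist(c, centers[0], genes) <= sqdist(c, centers[1], genes)
                  else 1 for c in cells]
        for k in (0, 1):
            members = [c for c, lab in zip(cells, labels) if lab == k]
            if members:
                centers[k] = [mean(col) for col in zip(*members)]
    return labels, centers

def icim_sketch(cells, ids, n_top=2, min_cells=4, min_gap=4.0, markers=None):
    """Recursively bisect cells; return (leaf groups of cell ids, marker gene indices)."""
    if markers is None:
        markers = set()
    if len(cells) < min_cells:
        return [ids], markers
    genes = top_variable_genes(cells, n_top)
    if not genes:
        return [ids], markers
    labels, centers = bipartition(cells, genes)
    gap = sqdist(centers[0], centers[1], genes) ** 0.5
    if min(labels.count(0), labels.count(1)) == 0 or gap < min_gap:
        return [ids], markers  # population treated as homogeneous: stop splitting
    markers.update(genes)
    leaves = []
    for k in (0, 1):
        sub_cells = [c for c, lab in zip(cells, labels) if lab == k]
        sub_ids = [i for i, lab in zip(ids, labels) if lab == k]
        sub_leaves, _ = icim_sketch(sub_cells, sub_ids, n_top, min_cells, min_gap, markers)
        leaves.extend(sub_leaves)
    return leaves, markers

# Toy matrix (rows = cells, columns = 3 hypothetical genes); not real data.
toy = [[10, 0, 5], [11, 0, 5], [9, 1, 5], [10, 1, 5],
       [0, 10, 5], [1, 11, 5], [0, 9, 5], [1, 10, 5]]
leaves, markers = icim_sketch(toy, list(range(len(toy))))
print(leaves, sorted(markers))
```

In this toy example the first split separates the two groups of cells using the two variable genes, and the recursion stops because the remaining within-group differences fall below the assumed `min_gap`. The marker genes collected this way would then feed the tSNE and HDBSCAN steps described above.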

Two of the 35 clusters expressed known markers for vPNs (Figure S2A): Gad1, which encodes a GABA biosynthetic enzyme, and Lim1, which encodes a transcription factor expressed in vPNs but not adPNs or lPNs (Komiyama and Luo, 2007), suggesting that they correspond to vPNs. Besides PNs, the only other cells that GH146-GAL4 consistently labeled at a high level at 24h APF were the anterior paired lateral (APL) neurons (Figure S2C). Three other clusters expressed VGlut (Figure S2B), which specifically marked GH146+ APL neurons but not PNs (Figure S2D), suggesting that this VGlut+ population consists of APL neurons. Because we were interested primarily in excitatory adPNs and lPNs, we removed inhibitory vPNs and APL neurons from subsequent analysis. Nearly all of the remaining 902 GH146+ cells should be adPNs and lPNs, which collectively target 40 glomeruli. Clustering analysis using ICIM and tSNE identified 30 distinct clusters (Figure 2D). Library complexity and sequencing depth did not drive clustering (Figure S2E). The number of cells belonging to each cluster varied from 5 to 108, likely reflecting the fact that different PN classes contain different cell numbers ranging from 1 to 7 cells per antennal lobe (Yu et al., 2010; Lin et al., 2012). It is likely that we did not sample a sufficient number of cells to detect rare PN classes.

We previously showed that two transcription factors, Abnormal chemosensory jump 6 (Acj6) and Ventral veins lacking (Vvl; also known as Drifter), are expressed in adPNs and lPNs, respectively (Figure 2A), and instruct lineage-specific dendrite targeting (Komiyama et al., 2003). Indeed, our single-cell RNA-seq analysis revealed that acj6 and vvl were expressed in a mutually exclusive manner (Figure 2E and S2F). Among the 30 clusters of adPNs and lPNs, 18 clusters (60%) expressed acj6 but not vvl, and thus represent adPNs. The remaining 12 clusters were likely lPNs.

In summary, single-cell RNA-seq analysis revealed distinct clusters of GH146+ adPNs and lPNs expressing lineage markers in a manner consistent with prior knowledge. Transcription factor transcripts, whose protein levels are generally low (Ghaemmaghami et al., 2003), were reliably detected within PNs and could be used to assign lineage identity, supporting the specificity and sensitivity of this method.

Matching Clusters to PN Classes Using Known Markers

We next attempted to map the correspondence between transcriptome-based PN clusters and glomerular-target-based PN classes by leveraging drivers that label specific PN classes. We found that 91G04-GAL4 (Jenett et al., 2012) was robustly expressed in PNs at 24h APF. To limit expression only to PNs, we utilized an intersectional strategy by combining 91G04-GAL4 with GH146-Flp (Potter et al., 2010) and UAS-FRT-STOP-FRT-mCD8GFP (Hong et al., 2009), such that only cells that express both 91G04-GAL4 and GH146-Flp would express mCD8GFP (hereafter referred to as “intersecting with GH146-Flp”). This resulted in expression of mCD8GFP in just two adPNs per hemisphere, both of which project dendrites to the DC2 glomerulus (Figure 3A). We sequenced 23 91G04+ PNs at 24–30h APF and performed clustering analysis using ICIM and tSNE together with the GH146+ cells. We found that all 91G04+ PNs mapped to one GH146+ cluster (Figure 3C; Cluster #1). All 91G04+ cells could also be unambiguously mapped to this GH146+ cluster using a random forest classifier (data not shown). Thus, Cluster #1 corresponds to DC2 PNs.

Mz19-GAL4 is expressed from 24h APF to adulthood (Figure 3B; Jefferis et al., 2004). After intersecting with GH146-Flp, Mz19-GAL4 labels three PN classes: adPNs that project to VA1d and DC3 (acj6+), and lPNs that project to DA1 (acj6−). We sequenced 123 Mz19+ cells at 24–30h APF, and mapped them to four clusters of GH146+ cells (Figure 3C). The Mz19+/acj6− cells, corresponding to DA1 PNs, mapped to two clusters (#2 and #2’), suggesting that both correspond to DA1 PNs, a notion that we explore further below. The Mz19+/acj6+ cells, corresponding to VA1d and DC3 PNs, mapped to two clusters of GH146+ cells (#3 and #4; Figure 3C). Thus, Clusters #3 and #4 correspond to VA1d and DC3 PNs. We establish a one-to-one correspondence between these clusters and PN classes below.

Matching Clusters to PN Classes Using Newly Identified Markers

To map additional PN transcriptome clusters to glomerular classes, we searched the single-cell transcriptome data for new markers. We identified terribly reduced optic lobes (trol) as predominantly expressed in a single cluster (Figure 4A). Intersecting an existing trol-GAL4 (NP5103-GAL4, inserted into an intron of trol) with GH146-Flp labeled 2–3 adPNs at both 24h and 72h APF, with dendrites projecting to the VM2 glomerulus at 72 h APF (Figure 4B). 28 sequenced trol-GAL4+ PNs (after intersecting with GH146-Flp) mapped to the original trol+ Cluster #5 (Figure 4C). These data indicate that trol-GAL4 mimics endogenous trol expression and that Cluster #5 corresponds to VM2 PNs.

Using Mz19-GAL4, we mapped Mz19+/acj6+ VA1d and DC3 PNs to Clusters #3 and #4 (Figure 3C), but could not resolve which cluster belonged to which PN class. VA1d and DC3 PNs are closely related: VA1d PNs are born immediately after DC3 PNs from the same lineage and target dendrites to neighboring glomeruli. To establish a one-to-one mapping, we identified CG31676 to have the strongest differential expression between the two clusters (Figure 4D; compare Clusters #3 and #4). We generated CG31676-GAL4 by inserting into the first intron of CG31676 a cassette containing a splice acceptor (SA) sequence followed by T2A peptide sequence and the GAL4 coding sequence (Figure 4E). After intersecting with GH146-Flp, CG31676-GAL4 labeled a similar number of PNs at 24h, 48h, and 72h APF, which targeted dendrites to VA1d but not DC3 (Figure 4F). Thus, Mz19+/acj6+/CG31676+ Cluster #3 corresponds to VA1d PNs; the Mz19+/acj6+/CG31676− Cluster #4 corresponds to DC3 PNs.

In addition to DA1 (#2 and #2’) and VA1d (#3), CG31676-GAL4 also strongly labeled DL3, which is targeted by acj6− lPNs (Figure 4F and S3A) (Jefferis et al., 2001). Among the 30 clusters, only two clusters (#6 and #6’) were CG31676+/acj6− (Figure 4D and S3B); these two clusters displayed highly similar transcriptomes as reflected in their close proximity in the tSNE plot. We therefore mapped Clusters #6 and #6’ to DL3 PNs. CG31676-GAL4 transiently labeled two other glomeruli targeted by acj6+ adPNs (Figure S3A), but we could not unambiguously assign them to corresponding clusters.

Among the 6 glomerular classes we have mapped, four corresponded to a single transcriptome cluster each, but DA1 and DL3 PNs each corresponded to two clusters (Figure 4D, 4G). All PN classes are born in a stereotyped order within a specific lineage, and most PN classes are born consecutively within a single time window (Jefferis et al., 2001; Yu et al., 2010; Lin et al., 2012). DA1 and DL3 PNs are the only two exceptions: they are born in two time windows separated by more than 24 and 12 hours, respectively (Figure S3D; Lin et al., 2012). This birth timing difference may contribute to the transcriptome heterogeneity of DA1 and DL3 PNs. For DA1 PNs, we found that fruitless (fru), encoding a transcription factor and a key regulator of male sexual behavior (Dickson, 2008), was expressed only in the large cluster (#2). This is consistent with a previous finding that NP21-GAL4 (inserted into a fru intron near the sexually dimorphic splicing site) only labels DA1 PNs after intersecting with GH146-Flp (Potter et al., 2010). On the other hand, CG45263 was only expressed in the small cluster (#2’) (Figure S3C). We also identified genes that were expressed only in one of the two DL3 clusters (Figure S3B). It remains to be determined whether the transcriptional differences between Clusters #2 and #2’ and between Clusters #6 and #6’ reflect only differences in birth timing, or potential differences in biological functions.

In summary, by using a combination of existing markers and new markers discovered using single-cell RNA-seq, we have unambiguously mapped 6 PN classes to corresponding transcriptome clusters (Figure 4G). Our results indicate that the combination of genetic drivers and single-cell RNA-seq offers a simple strategy for mapping transcriptome clusters to cell types.

A New Lineage-specific Transcription Factor Regulates Dendrite Targeting

Our single-cell transcriptome analysis identified many transcription factors (TFs) that were differentially expressed in separate clusters. For example, prospero mRNA was expressed in a majority of PNs including all Mz19+ PNs, whereas cut mRNA was expressed in a few PNs, all of which were Mz19− (Figure S4A). Indeed, antibody staining validated these observations (Figure S4B), and the expression of Cut is consistent with our previous finding (Komiyama and Luo, 2007).

Our analysis also identified new lineage-specific expression for several TFs. Specifically, C15 and knot mRNAs were observed only in adPNs and unplugged (unpg) was observed only in lPNs (Figure 5A). We confirmed these results by immunostaining using antibodies against C15 and Knot, and a lacZ reporter for unpg (Figure 5B). knot plays a critical role in controlling dendrite development of Drosophila sensory neurons (Jinushi-Nakao et al., 2007), and unpg is a marker for specific neuroblast sublineages in the Drosophila embryonic ventral nerve cord (Cui and Doe, 1995). C15, which encodes a homeobox-containing protein homologous to human Hox11, is critical in regulating a gene network in the developing Drosophila leg (Campbell, 2005), but its neural function is unknown. We tested whether C15 plays a role in PN dendrite targeting.

Figure 5. — (A) Visualization of *GH146*+ PNs using tSNE as in Figure 2E showing expression of *acj6*, *C15*, *knot*, and *unpg*. adPNs are outlined (based on *acj6* expression) and remaining cells are lPNs.

(B) Consistent with RNA-seq data in (A), 24h APF expression of C15 and Knot (antibody staining) in *GH146*+ PNs (green) is restricted to adPNs, while *unpg* (anti-β-gal staining) is restricted to lPNs.

(C) Loss-of-function analysis of *C15* using *elav-GAL4* driven *UAS-C15-RNAi* (line #2; see Figure S4C). Wild type (WT) control: *elav-GAL4* × w¹¹¹⁸. When *C15* is knocked down, the VA1d glomerulus (visualized by VA1d ORN axons labeled by *Or88a-mtdT*) displays a dorsal shift. In addition, GFP signal in VA1d PN dendrites (visualized by *Mz19-QF* driven *QUAS-mCD8GFP*) is undetectable.

(D) Quantification of position shift of the VA1d glomerulus due to *C15* knockdown in (C). θ is the angle between the dorsoventral axis and a line drawn through the centers of the VA1d and DC3 glomeruli. Error bars are SEM. ***, P < 0.001 (t test).

(E) Gain-of-function analysis of *C15* in *Mz19-GAL4*+ MARCM misexpression clones. In WT, dendrites of adPN neuroblast (adNB) clones target VA1d and DC3 and lPN neuroblast (lNB) clones target DA1. When *C15* is misexpressed, dendrite targeting of adNB clones is not affected, while dendrite targeting of lNB clones is affected with 100% penetrance. Ncad is used as a neuropil marker. Scale, 20 μm. See also Figure S4.

In a loss-of-function experiment, we used elav-GAL4 to knock down C15 in all neurons, Mz19-QF-driven QUAS-mCD8GFP to monitor dendrite targeting of VA1d and DA1 PNs [Mz19-QF labels DA1 and VA1d, but not DC3 PNs in wild type (Hong et al., 2012)], and Or88a promoter-driven myristoylated tdTomato (Or88a-mtdT) to monitor axon targeting of VA1d ORNs (Ward et al., 2015). Pan-neuronal knockdown of C15 using a strong RNAi line (Figure S4C) caused a highly penetrant dorsal shift of the VA1d glomerulus without affecting DA1 dendrite targeting (Figure 5C, 5D, and S4D), concomitant with a loss of dendrites in the VA1d glomerulus. This loss could be because: (1) C15 controls the expression of Mz19-QF in VA1d PNs, (2) VA1d neurons die or are not born, or (3) VA1d dendrites mistarget to the DA1 glomerulus.

In a gain-of-function experiment, we used the Mz19-GAL4-based MARCM system to misexpress C15. Control Mz19+ adPNs target to the VA1d and DC3 glomeruli and lPNs target to the DA1 glomerulus (Figure 5E, left panels; Figure 3B). However, when C15 was misexpressed, Mz19+ lPNs (DA1 PNs only) sent dendrites to regions outside the DA1 glomerulus, including VA1d, DC3, DA3, and DA4l that are all normally targeted by adPNs, while Mz19+ adPNs targeted dendrites correctly (Figure 5E, right panels; S4E and S4F). These data suggest that the transcription factor C15, as with Acj6 and Vvl (Komiyama et al., 2003), instructs lineage-specific PN dendrite targeting.

Transcriptomes of Closely Related PN Classes Exhibit the Largest Differences during Circuit Assembly

How do neuronal transcriptomes change as development proceeds? By mapping clusters from single-cell RNA-seq data to specific PN classes at different developmental stages, we can address this key question at the resolution of single PN classes. We focused on the three classes of Mz19+ adPNs and lPNs, which have been unequivocally mapped to specific transcriptome clusters (Figure 4G).

Following the coarse patterning of PN dendrites at 24h APF, ORN axons invade the antennal lobe beginning ~30h APF to identify their PN partners, until they match with cognate PNs and establish discrete glomerular compartments first visible ~48h APF (Jefferis et al., 2004). Following further expansion of terminal branches of ORN axons, PN dendrites, and synaptogenesis, pupae become adults at ~100h APF. Using the intersection of Mz19-GAL4 and GH146-Flp, we sequenced and analyzed 485 single cells from five time points (~100 cells each): 24–30h, 36–42h, 48–54h, 72–78h APF, and 1–2d adult (Figure 6A).

Clustering analysis using ICIM and tSNE revealed that Mz19+/acj6+ (VA1d and DC3) and Mz19+/acj6− (DA1) PNs were clearly separable at all times (Figure 6B). Interestingly, VA1d and DC3 PNs formed distinct clusters at the four pupal stages, but merged into a single cluster in the adult (Figure 6B and Figure 6C). To confirm this observation quantitatively, we calculated cell type identity scores using the 22 most differentially expressed genes (P < 10⁻⁵) between VA1d and DC3 PNs across all pupal stages, and found that the difference between the transcriptional states of these two PN classes was maintained from 24h to 48h APF, began to shrink at 72h APF, and was no longer detectable in the adult (Figure 6D). Using an alternative, unbiased genome-wide method, we calculated the Pearson correlation between the expression profiles of all pairs of cells based on 497 genes identified by ICIM. This analysis confirmed that transcriptome differences between VA1d and DC3 PNs disappeared in the adult (Figure 6E). Indeed, clustering analyses using only adult VA1d and DC3 PNs failed to find distinct populations (data not shown). Collectively, these data indicate that VA1d and DC3 PNs exhibit peak transcriptome differences during early pupal stages (24–48h APF) when PNs are refining their dendrite targeting and presenting cues for ORN axon targeting. These differences progressively diminish in late pupal and adult stages (Figure 6F).
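The pairwise correlation comparison can be sketched as follows. This is a minimal illustration, assuming that each cell is represented by a vector of expression values over a fixed gene set; the function `within_minus_between` and the toy vectors are hypothetical and are not the data or the exact statistic used in Figure 6E.

```python
from itertools import combinations, product
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mean_correlation(pairs):
    """Average Pearson correlation over an iterable of (cell, cell) pairs."""
    values = [pearson(a, b) for a, b in pairs]
    return sum(values) / len(values)

def within_minus_between(class_a, class_b):
    """Mean within-class correlation minus mean between-class correlation.

    A value near zero means the two classes are not separable by this measure.
    """
    within = mean_correlation(list(combinations(class_a, 2)) +
                              list(combinations(class_b, 2)))
    between = mean_correlation(product(class_a, class_b))
    return within - between

# Toy vectors over 4 hypothetical genes (not real data).
pupal_a = [[8, 0, 3, 1], [9, 1, 3, 0]]
pupal_b = [[0, 8, 3, 1], [1, 9, 2, 0]]
adult_a = [[5, 5, 3, 1], [4, 6, 3, 0]]
adult_b = [[5, 4, 3, 1], [6, 5, 2, 0]]

pupal_gap = within_minus_between(pupal_a, pupal_b)
adult_gap = within_minus_between(adult_a, adult_b)
print(pupal_gap, adult_gap)
```

In the toy example the gap is large when the two classes express different genes and close to zero when their profiles converge, which mirrors the qualitative pattern reported for VA1d and DC3 PNs between pupal and adult stages.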

These observations suggest that PN subtype identity genes, which distinguish VA1d and DC3 during the wiring stages, are down-regulated once wiring specificity is established. To test this, we systematically identified differentially expressed genes at different stages in all Mz19+ PNs. Gene ontology (GO) analysis indeed revealed that down-regulated genes consisted of factors associated with development and differentiation, whereas most up-regulated genes were associated with metabolic processes (Figure S5A). Clustering of genes based on their dynamic expression pattern revealed transcriptional waves consisting of genes that are coordinately turned down or up at different developmental stages (Figure S5B). Notably, many more TFs and cell-surface and secreted molecules (CSMs) were down- than up-regulated (Figure S5B); CSMs were drawn from a database curated for relevancy to cell recognition and wiring specificity but excluding ion channels, transporters, and secreted enzymes (Kurusu et al., 2008). CG31676, which was expressed in VA1d but not DC3 PNs at 24h APF (Figure 4D, G), was turned off in both PN classes in the adult while its expression in DA1 persisted (Figure S5C); this was validated with CG31676-GAL4 expression analysis across developmental stages (Figure S5D).

Next we asked if transcriptomes of PN classes from the same neuroblast lineage are more similar than those from different lineages. We found that the transcriptome differences between VA1d and DC3 PNs (both adPNs) were consistently smaller than those between VA1d and DA1 PNs (adPNs and lPNs, respectively) across developmental stages (Figure 6G). Similarly, the transcriptome differences at 24h APF between DA1 and DL3 lPNs were similar to those between VA1d and DC3 but smaller than those between VA1d and DA1 PNs. All four PN classes target adjacent glomeruli in the dorsolateral antennal lobe (Figure 6A, right). Thus, lineage origin correlates more strongly with transcriptome similarity than does dendrite targeting position, highlighting the important contribution of cellular ancestry to transcriptome state.

PN Subtype Identity Is Encoded by a Combinatorial Molecular Code

How is cell type identity encoded in the transcriptome? It is possible that: (1) each cell type expresses at least one unique gene, or (2) each cell type expresses a unique subset of a shared pool of genes. The strategy used for encoding cell type identity in the nervous system remains an unresolved question. To comprehensively address how neuronal subtype identity is encoded in 24h APF pupal PNs, we approximated the 30 GH146+ transcriptome clusters as 30 subtypes, and searched for marker genes that were uniquely expressed in a single subtype. We designed two criteria: (1) the gene must be robustly expressed within a cluster [>7 counts per million (CPM), or log₂(CPM+1) > 3, in >50% of the cells of that cluster]; (2) the gene must not be expressed in any other cluster (>7 CPM in <10% of the cells of every other cluster). Only 6 genes fulfilled these criteria (Figure S6A), sufficient to encode 5 of the 30 clusters. With relaxed criteria, we quickly entered a regime where identified genes were expressed in multiple clusters and hence not unique (Figure S6B). The inability to detect unique markers in most cell types was not due to transcript dropouts (Figure S6C). Thus, with few exceptions, GH146+ PNs lack marker genes that uniquely encode subtypes.
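The two marker criteria can be written as a short filter. The sketch below is illustrative only: the function names, the dictionary-based input layout, and the toy genes (`g_unique`, `g_shared`, `g_leaky`) are hypothetical, while the thresholds (>7 CPM, >50% of cells in one cluster, <10% of cells in every other cluster) are taken from the criteria stated above.

```python
def fraction_expressing(cpm_values, threshold=7.0):
    """Fraction of cells with expression above the CPM threshold.

    Note that CPM > 7 is the same cutoff as log2(CPM + 1) > 3.
    """
    return sum(v > threshold for v in cpm_values) / len(cpm_values)

def unique_markers(cpm, clusters, on_frac=0.5, off_frac=0.1, threshold=7.0):
    """Return {gene: cluster} for genes expressed in exactly one cluster.

    cpm      : dict mapping gene name -> list of CPM values, one per cell
    clusters : list of cluster labels, one per cell (same order as CPM values)
    """
    labels = sorted(set(clusters))
    found = {}
    for gene, values in cpm.items():
        frac = {
            lab: fraction_expressing(
                [v for v, c in zip(values, clusters) if c == lab], threshold)
            for lab in labels
        }
        on = [lab for lab in labels if frac[lab] > on_frac]       # criterion 1
        if len(on) == 1 and all(frac[lab] < off_frac              # criterion 2
                                for lab in labels if lab != on[0]):
            found[gene] = on[0]
    return found

# Toy data: 3 clusters of 4 cells each, 3 hypothetical genes (not real data).
clusters = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
cpm = {
    "g_unique": [50, 30, 0, 20,   0, 0, 0, 0,     0, 0, 0, 0],
    "g_shared": [50, 30, 40, 20,  25, 60, 30, 45, 0, 0, 0, 0],
    "g_leaky":  [50, 40, 30, 20,  0, 0, 0, 9,     0, 0, 0, 0],
}
markers_found = unique_markers(cpm, clusters)
print(markers_found)
```

Only `g_unique` passes both criteria in this toy example: `g_shared` is robustly expressed in two clusters, and `g_leaky` exceeds the 10% limit in a second cluster.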

Next, we sought to identify combinatorial molecular codes for cell type identity. We searched for a minimal set of genes that could uniquely encode PN subtypes using an information theoretic approach. We calculated the information content of each gene with respect to PN subtype identity, formally defined as the mutual information between the binarized expression state of the gene (ON/OFF) and PN cluster identity (STAR Methods). We ranked genes by their information content, and then selected a minimal set of genes by greedy search, iteratively drafting the gene carrying the most non-redundant information about identity into the set until 95% of the uncertainty of subtype identity was explained. The result of this search is a set of genes for which knowledge of their expression states (ON/OFF) alone is sufficient to classify subtype identity with high accuracy. We first applied this strategy to the three Mz19+ PN classes. Only two genes, C15 and CG31676, were sufficient to distinguish these three subtypes (Figure 7A), explaining 92% of the uncertainty of classification of individual Mz19+ PN cells into subtypes. Both C15 and CG31676 were independently identified and characterized earlier in our study (Figure 4D and Figure 5). This finding demonstrates that this approach can identify gene sets that robustly encode cell type identity in a combinatorial manner.
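A minimal sketch of the greedy, information-based selection is given below. It assumes binarized expression states and uses the joint ON/OFF pattern of the selected genes to measure how much of the entropy of cluster identity is explained; the function names, the tie-breaking by gene name, and the stopping details are assumptions rather than the exact procedure in STAR Methods. The toy table is an idealized, noise-free ON/OFF pattern consistent with the expression described in the text (C15 and acj6 in adPNs only; CG31676 in DA1 and VA1d but not DC3 at 24h APF). It is not measured data, so it explains 100% of the uncertainty rather than the 92% reported for real cells.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy (bits) of the empirical distribution of `items`."""
    n = len(items)
    return -sum((k / n) * log2(k / n) for k in Counter(items).values())

def mutual_information(states, labels):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for paired observations."""
    return entropy(states) + entropy(labels) - entropy(list(zip(states, labels)))

def greedy_code(binary, labels, target=0.95):
    """Greedily add the gene that most increases information about identity.

    binary : dict mapping gene name -> list of 0/1 states, one per cell
    labels : list of subtype labels, one per cell
    Returns (selected genes, fraction of label entropy explained).
    """
    h_labels = entropy(labels)
    chosen = []
    code = [() for _ in labels]          # joint ON/OFF pattern per cell so far
    explained = 0.0
    while explained < target and len(chosen) < len(binary):
        best_gene, best_info = None, -1.0
        for gene in sorted(binary):      # sorted for deterministic tie-breaking
            if gene in chosen:
                continue
            candidate = [s + (b,) for s, b in zip(code, binary[gene])]
            info = mutual_information(candidate, labels)
            if info > best_info:
                best_gene, best_info = gene, info
        if best_info / h_labels <= explained + 1e-12:
            break                        # remaining genes are fully redundant
        chosen.append(best_gene)
        code = [s + (b,) for s, b in zip(code, binary[best_gene])]
        explained = best_info / h_labels
    return chosen, explained

# Idealized toy states: 4 cells per Mz19+ class (illustrative, not real data).
labels = ["DA1"] * 4 + ["VA1d"] * 4 + ["DC3"] * 4
binary = {
    "C15":     [0] * 4 + [1] * 4 + [1] * 4,
    "CG31676": [1] * 4 + [1] * 4 + [0] * 4,
    "acj6":    [0] * 4 + [1] * 4 + [1] * 4,   # redundant with C15 in this toy
}
chosen, explained = greedy_code(binary, labels)
print(chosen, explained)
```

Because acj6 carries the same ON/OFF pattern as C15 in this toy table, it adds no non-redundant information and is never selected; two genes suffice to separate the three classes.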

Figure 7. — (A) Minimal combinatorial code for subtype identity among *Mz19*+ PNs identified using an information theoretic approach. Left, mean expression level of each gene among cells belonging to each *Mz19*+ class. Right, binarized expression levels of the same genes [cutoff: log₂(CPM+1) = 3]. Each *Mz19*+ PN class expresses a distinct combination of these two genes.

(B) Information contained in minimal combinatorial codes for *GH146*+ subtype identity. X-axis is the number of genes included in the code. Y-axis is the amount of uncertainty (entropy) of cell type classification that is explained by the code. Colors denote codes constructed from different sets of genes. The genome-wide code (pink) is constructed from all genes, while the TF (green) or CSM (orange) codes use only 1045 TF or 955 CSM genes. Gray denotes codes constructed from 1,000 randomly sampled non-TF and non-CSM genes, with the line indicating the median and the shading indicating the standard deviation across 100 replicates, respectively.

(C–E) Minimal combinatorial codes for *GH146*+ subtype identity constructed from (C) all genes, (D) TFs, or (E) CSMs. Heat map indicates the binary expression state of genes in each cluster, as in (A). Clusters and genes are arranged by hierarchical clustering.

(F) Representation of TFs and CSMs among the top 30 differentially expressed (DE) genes between pairs of *Mz19*+ PN subtypes as indicated. Y-axis shows the fraction of the 30 most differentially expressed genes that are TFs (green), CSMs (orange), or TF + CSM (blue) at each developmental stage. Adult stage is absent from the VA1d vs DC3 comparison because their transcriptomes cannot be distinguished.

(G) Enrichment of TFs and CSMs among the top differentially expressed genes between pairs of clusters of *GH146*+ cells (435 pairs). X-axis shows the number of top differentially expressed genes under consideration. Y-axis shows the distribution of enrichment of either TFs or CSMs within these genes. Enrichment is calculated relative to the genomic representation of TFs (6.7%) and CSMs (6.2%), indicated by the horizontal line. See also Figures S6 and S7.

Applying this strategy to the 30 GH146+ PN subtypes, we identified 11 genes whose expression states uniquely identified every PN subtype (Figure 7C and Figure S7A). Knowledge of the expression states of these genes alone is sufficient to resolve 95% of the uncertainty in classification of individual GH146+ PN cells into subtypes (Figure 7B, pink line). Similar results were obtained using a range of different thresholds for binarization of expression state (Figure S7B, C), or when we examined combinatorial codes based on gene expression levels after discretization into four states (Off, Low, Medium, High) instead of binary states (ON/OFF) (data not shown). Indeed, a multinomial classifier using these 11 genes correctly classified 82% of individual GH146+ PN cells into subtypes despite measurement noise (Figure S7D). Together, these analyses indicate that GH146+ PN subtype identity can be distinguished using a combinatorial code composed of expression states of only 11 genes. This code is more compact than a code distinguishing each subtype using a unique marker (30 genes required), but substantially above the theoretical minimum of 5 genes (which can encode 2⁵ or 32 binary states).
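The bounds quoted above follow from simple counting: a code of k genes with s expression states each can label at most s^k subtypes. The helper below, whose name `min_code_length` is introduced only for this illustration, finds the smallest such k.

```python
def min_code_length(n_types, n_states=2):
    """Smallest number of genes k such that n_states**k >= n_types."""
    k, capacity = 0, 1
    while capacity < n_types:
        capacity *= n_states
        k += 1
    return k

binary_minimum = min_code_length(30)         # ON/OFF states
four_state_minimum = min_code_length(30, 4)  # Off, Low, Medium, High
```

For 30 subtypes this gives 5 genes for a binary code, matching the theoretical minimum stated above, and 3 genes for a four-state code.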

TFs and CSMs Are Highly Informative in Encoding PN Identity and Enriched Among Differentially Expressed Genes

What types of genes distinguish neuronal subtypes? It is widely thought that transcription factors (TFs) establish and maintain cell type identity, while cell-surface and secreted molecules (CSMs) determine wiring specificity. But there has not been, to our knowledge, a genome-wide analysis to show this in an unbiased manner. Strikingly, 8 of the 11 genes in the minimal combinatorial code identified by our information theoretic analysis above were TFs (Figure 7C), supporting a central role for TFs in specifying cell type identity. To further explore the roles of TFs and CSMs in class identity, we searched for minimal codes for cell type identity consisting only of TFs or CSMs using our information theoretic approach along with previously annotated lists of TFs (FlyTF database) and CSMs (Kurusu et al., 2008), each containing ~1000 genes. Minimal codes consisting of 13 TFs (Figure 7D and Figure S7A) or 12 CSMs (Figure 7E and Figure S7A) were sufficient to resolve 95% of the uncertainty in classifying GH146+ cells into PN subtypes (Figure 7B). That is, GH146+ PNs can be reliably classified into 30 subtypes based on the expression states of either 13 TFs alone or 12 CSMs alone. The compactness of these minimal codes was similar to that of the most compact code obtained in our genome-wide search (Figure 7C).

To evaluate whether TFs and CSMs are particularly informative with respect to subtype identity, we measured the amount of information contained within minimal combinatorial codes built from other genes (not TFs or CSMs) chosen at random from the genome (sampling 1,000 genes at random with 100 replicates). Randomly chosen genes carried significantly less information than TFs or CSMs (Figure 7B), despite having similar expression level distributions (Figure S7E). These findings indicate that, on average, TFs and CSMs carry more information about GH146+ subtype identity than other genes.

To test this idea further, we asked whether TFs and CSMs were enriched in differentially expressed genes among PN subtypes. Among Mz19+ adPNs and lPNs, TFs and CSMs accounted for a large proportion of differentially expressed genes (Figure 7F). Representation of CSMs peaked during the circuit assembly stage (24–48h APF), consistent with a role for differential expression of CSMs in determining wiring specificity. We also analyzed differentially expressed genes separating every pair of 30 clusters, comprising 435 (30 × 29 / 2) pairs altogether. TFs and CSMs were highly enriched among differentially expressed genes, with the strongest enrichment found among the most significantly differentially expressed genes (Figure 7G). These findings support the notion that expression of TFs and CSMs plays key roles in determining PN subtype identity and wiring specificity.
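As an illustration of the pairwise enrichment analysis, the sketch below enumerates all cluster pairs and computes enrichment of a gene category among the top-ranked differentially expressed genes. It assumes that enrichment is expressed as a fold change over the genomic fraction of the category; the function name `fold_enrichment` and the gene identifiers used with it are hypothetical.

```python
from itertools import combinations

def fold_enrichment(ranked_genes, category, n_top, genome_fraction):
    """Fraction of the top n_top genes in `category`, relative to the genome.

    ranked_genes    : gene identifiers sorted from most to least significant
    category        : set of identifiers belonging to the class (e.g., TFs)
    genome_fraction : fraction of all genes in that class (e.g., 0.067 for TFs)
    """
    top = ranked_genes[:n_top]
    observed = sum(gene in category for gene in top) / len(top)
    return observed / genome_fraction

# Every unordered pair of the 30 clusters: 30 * 29 / 2 comparisons.
cluster_pairs = list(combinations(range(30), 2))
```

A value above 1 indicates that the category is over-represented among the top genes relative to its genomic representation.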

Discussion

Single-cell RNA-seq has recently emerged as a powerful technique to investigate cellular heterogeneity, discover new cell types, and identify cell type-specific markers. We established a robust single-cell RNA-seq protocol for Drosophila neurons and glia. By focusing on olfactory projection neurons (PNs), among the best characterized cell types in all nervous systems, we established unequivocally that transcriptomic identity corresponds with the anatomical and physiological neuronal subtypes defined by connectivity and function.

Several lines of evidence support the sensitivity and reliability of our single-cell RNA-seq protocol. First, differential gene expression identified by single-cell RNA-seq is highly consistent with previous literature. For example, we found highly correlated expression of two male-specific RNAs at the level of individual cells (Figure S1E), and mutually exclusive expression of two lineage-specific transcription factors (Figure S2F) as previously reported. Second, we validated five differentially expressed transcription factors derived from single-cell RNA-seq data (Figure 5A, B; Figure S4A, B). Third, sequencing of cells marked by known or newly identified PN class-specific markers matched well with specific transcriptome clusters (Figures 3, 4), enabling us to unequivocally match transcriptome clusters with glomerular classes. We expect that this approach can be generally applied to single-cell transcriptome analysis of many tissues and developmental stages in Drosophila and other organisms with small cell size, thus expanding the use of single-cell transcriptomics for addressing diverse biological questions.

We have developed a machine-learning algorithm called ICIM for unbiased identification of genes that distinguish subtypes. Because this algorithm recursively examines finer-grained subpopulations, it is capable of detecting genes that distinguish small subpopulations. ICIM is conceptually similar to previously described iterative analysis methods (Usoskin et al., 2015; Zeisel et al., 2015; Gokce et al., 2016; Tasic et al., 2016). However, ICIM may discriminate highly similar cell types with greater sensitivity than methods based on PCA because it reduces the feature space to only those genes that are informative for distinguishing cell types. ICIM allowed us to distinguish 30 clusters for 40 GH146+ PN classes. Our classification is limited by sampling depth because 17 classes contain only 1 cell per hemisphere (Yu et al., 2010). Sequencing of many more cells may resolve these classes into distinct clusters, resulting in a more complete description of PN transcriptome diversity.

Our analyses of transcriptome changes of identified PN classes across developmental stages demonstrate that transcriptomes of neuronal subtypes exhibit the largest difference during development, coincident with circuit assembly (Figure 6). This could be because key features of different PN classes are their input and output partners. Once PNs establish differential connectivity during development, they may use largely the same signaling machineries to convey different olfactory information in adults. This finding has important implications for using single-cell RNA-seq to classify neuronal types, since most studies have focused on adults (see Introduction). While these studies have been highly successful in classifying major neuronal types, functionally distinct subtypes may have been overlooked, resulting in an underestimate of neuronal type diversity.

Transcription factors (TFs) and cell-surface and secreted molecules (CSMs) are widely considered to be key determinants of cell fate and wiring specificity, respectively. Our single-cell transcriptome analyses provided objective data to support these notions. First, TFs and CSMs together account for more than 50% of the top 30 differentially expressed genes (Figure 7F, Figure 7G). Second, information theoretic analyses revealed that among the top 11 information-rich genes for distinguishing different PN subtypes, 8 are TFs. Third, TFs or CSMs alone contain nearly as much information in distinguishing different PN subtypes as all genes (Figure 7B). A key readout of TFs in determining neuronal subtype may be to control differential expression of CSMs, such that different subtypes differentially respond to a common extracellular environment to achieve their wiring specificity. However, we did not find a simple relationship between TFs and CSMs (Figure S7F). Supporting a role for TFs in regulating wiring specificity, we show that a newly identified lineage-specific homeobox-containing TF, C15, can instruct lineage-specific dendrite targeting (Figure 5).

Finally, our analyses of PN transcriptomes shed light on the nature of the coding strategies that distinguish closely related neuronal subtypes. PN subtype identity is largely determined by a combinatorial code that utilizes a number of genes between the number of subtypes and the theoretical minimum for a maximally compact code, suggesting redundancy. The transcriptomes of closely related PN classes differed substantially during development (Figure 7F; Table S1), consistent with a recent report that closely related retinal cells have dozens of differentially expressed CSMs (Tan et al., 2015). A certain degree of redundancy can provide robustness to wiring precision (Hong and Luo, 2014), but creates challenges for dissecting genetic control of wiring specificity using single gene manipulation. The transcriptomes of identified PN classes can inform design of more precise experiments in which simultaneous manipulation of multiple genes through loss- and gain-of-function approaches allows experimental testing of the combinatorial TF and CSM codes.

STAR Methods

Contact For Reagent And Resource Sharing

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Liqun Luo (lluo@stanford.edu).

Experimental Model And Subject Details

Fly stocks

In all experiments, both male and female flies were used. The following fly lines were used in this study: Mz19-GAL4 (BL#34497) (Jefferis et al., 2004), C15-RNAi (line1, BL#27649), C15-RNAi (line2, BL#35018), Alrm-GAL4 (BL#67032), Mz19-QF (Hong et al., 2012), GH146-GAL4 (Stocker et al., 1997), UAS-STOP-mCD8GFP (Hong et al., 2009), GH146-Flp (Potter et al., 2010), unpg-lacZ (Cui and Doe, 1995), trol-GAL4 (NP5103-GAL4, Kyoto Stock Center #113584), UAS-C15 (gift from Dr. Gerard Campbell) (Campbell, 2005), and 91G04-GAL4 (gift from Gerry Rubin) (Jenett et al., 2012).

CG31676-GAL4 was generated using CRISPR/Cas9-based insertion of SA-T2A-GAL4 into the first intron of the CG31676 gene following the method described by Diao et al. (2015). In brief, a 2.5kb DNA fragment, containing a PAM site in the middle within the first intron of the CG31676 gene, was PCR amplified from wild-type genomic DNA and inserted into the Blunt TOPO vector (Invitrogen). Then, SA-T2A-GAL4 was PCR amplified from the pT-GEM(1) plasmid (Addgene #62893) and inserted (NEBuilder HiFi DNA Assembly kit) into the TOPO-CG31676-intron construct three nucleotides before the PAM site of the intron. This construct and a gRNA plasmid (pU6-BbsI-ChiRNA, Addgene #45946) containing a 20-nt target sequence upstream of the PAM inserted into the BbsI site were co-injected into nos-Cas9 (gift from Dr. Ben White) (Diao et al., 2015) embryos to obtain transgenic flies.

Method Details

MARCM analysis

hsFlp based MARCM analyses were performed as previously described (Lee and Luo, 1999; Jefferis et al., 2001). Briefly, transgenic flies linked with a FRT chromosome were crossed with MARCM-ready flies (containing hsFlp, UAS-CD8GFP, Mz19-GAL4, TubP-GAL80 and desired FRT). Mz19-GAL4 was used to label VA1d, DC3 and DA1 PNs. Larvae (24h to 48h after hatching) from the cross were heat shocked for 1h in a 37°C water bath. Both single-cell and neuroblast clones could be observed in this fashion.

Immunostaining

Tissue dissection and immunostaining were performed following previously described methods (Wu and Luo, 2006). Briefly, fly pupal and adult brains were dissected in 1x PBS and then fixed in 4% paraformaldehyde (20% paraformaldehyde diluted in PBS with 0.015% Triton X-100) for 20 min at room temperature. Fixed brains were washed three times with PBST (PBS with 0.3% Triton X-100) and incubated in PBST twice for 20 min. The samples were incubated in blocking buffer (5% normal goat serum in PBST) for 30 min at room temperature or overnight at 4°C. Then, primary antibodies diluted in blocking buffer were applied and samples were incubated for 24–48 h at 4°C. Then, samples were washed twice in PBST for 20 min each, and secondary antibodies diluted in blocking buffer were applied and samples were incubated in the dark for more than 24 h at 4°C. Samples were washed twice in PBST for 20 min each and mounting solution (Slow Fade Gold) was added. Samples were left in mounting solution for at least 1 h before mounting them onto glass slides. All wash steps were performed at room temperature. Primary antibodies used in this study include rat anti-DNcad (DN-Ex #8; 1:40; DSHB), mouse anti-Prospero (1:200; DSHB), mouse anti-Cut (2B10; 1:50; DSHB), mouse anti-β-gal (1:500; Promega), chicken anti-GFP (1:1000; Aves Labs), rabbit anti-DsRed (1:250; Clontech), mouse anti-ratCD2 (OX-34; 1:200; AbD Serotec), rat anti-C15 (1:200; gift from Dr. Gerard Campbell) (Campbell, 2005), and guinea pig anti-knot (1:200; gift from Dr. Adrian Moore) (Jinushi-Nakao et al., 2007). Secondary antibodies were raised in goat or donkey against rabbit, mouse, rat, and chicken antisera (Jackson Immunoresearch), conjugated to Alexa 405, 488, FITC, Cy3, Cy5, or Alexa 647.

Quantitative PCR (qPCR)

Total RNA from 3–5 day old adult fly heads was extracted using MiniPrep kit (Zymo Research, R1054). Complementary DNA was synthesized using an oligo-dT primer. qPCR was performed on a Bio-Rad CFX96 detection system. Relative expression was normalized to Actin5C. Primer sequences used for qPCR were:

Actin5C (F): 5’-CTCGCCACTTGCGTTTACAGT-3’
Actin5C (R): 5’-TCCATATCGTCCCAGTTGGTC-3’
C15 (F): 5’-AGCGCTTCCACAAGCAAAAG-3’
C15 (R): 5’-CCGTCTGTCGTCTCCACTTG-3’

Imaging and quantification procedure

Confocal images were collected with a Zeiss LSM 780 and processed with ImageJ and Adobe Illustrator. For quantification of the angles in Figure 5C, the vertical line was drawn based on the position of two antennal lobes and the intersecting line was drawn through the centers of gravity of the VA1d and DA1 glomeruli, then the intervening angle was measured using ImageJ.

Single-cell RNA-sequencing

Drosophila brains with mCD8GFP-labeled cells using specific GAL4 drivers were manually dissected, and optic lobes were removed. Single-cell suspensions were prepared following Tan et al. (2015) with several modifications (see detailed procedure below). Single labeled cells were sorted via Fluorescence Activated Cell Sorting (FACS) into individual wells of 96-well plates containing lysis buffer using an SH800 instrument (Sony Biotechnology). Full-length poly(A)-tailed RNA was reverse-transcribed and amplified by PCR following the SMART-seq2 protocol (Picelli et al., 2014) with several modifications. To increase cDNA yield and detection efficiency, we increased the number of PCR cycles to 25. To reduce the amount of primer dimer PCR artifacts, we digested the reverse-transcribed first-strand cDNA using lambda exonuclease (New England Biolabs) (37°C for 30 min) prior to PCR amplification. Sequencing libraries were prepared from amplified cDNA using tagmentation (Nextera XT). Sequencing was performed using the Illumina Nextseq 500 platform with paired-end 75 bp reads.

Fly brain dissociation

Before the fly brain dissociation, make sure the following essential reagents and supplies are readily available: Schneider’s medium (Thermo Fisher, 21720024), papain (Worthington PAP2, LK003178), liberase TM (Roche, 5401119001; reconstitute liberase TM with 1x sterile PBS on ice to a final concentration of 2.5mg/ml, make 20μl aliquots in PCR tubes, and store them at −20°C), falcon tubes with cell strainer (35μm), microfuge tube shaker, and syringes with 25G 5/8 needles.

Make fresh papain solution for every dissociation experiment. Take 1 vial of papain (Worthington PAP2, LK003178), dissolve the powder in 1x sterile PBS (final concentration 100 units/ml), and re-suspend the papain by gently shaking the vial (if using a pipet, avoid bubbles, which will decrease the enzyme activity). Then aliquot 300μl of papain solution into each EP tube, and activate it in a 37°C water bath for 10–30 min. Add 4.1μl of liberase TM solution (2.5mg/ml) to 300μl of papain solution to obtain a final concentration of about 0.18 units/ml. Cool the solution to room temperature before adding it to brain samples. Since it takes 10–30 min to activate the papain solution, coordinate this step with fly brain dissection.

To estimate how many brains are required during sample preparation, please refer to the recovery rates in Figure S1C. In our current study, we dissected about 120 pupal brains from GH146-GAL4,UAS-mCD8GFP flies to collect about 10 plates of cells (10 × 96). For the rare populations labeled by 91G04-GAL4 or trol-GAL4 (2–3 cells per hemisphere), we dissected about 200 pupal brains to get half a plate of cells (~50).

Follow these steps to prepare the single-cell suspension:

- Dissect pupal/adult fly brains in ice-cold Schneider’s medium. Remove the optic lobes if all desired cells are in the central brain (e.g., GH146+ PNs). Transfer each brain using a P20 pipet into an EP tube containing 500μl Schneider’s medium and keep the tube on ice. The brains can be kept in the cold Schneider’s medium for up to 2 hours before the next step.
- After dissecting a sufficient number of brains, spin them down using a bench-top microfuge for 10s and remove the Schneider’s medium. Wash the brains three times at room temperature with 1x sterile PBS to completely remove the Schneider’s medium.
- Add 300μl papain solution (activated at 37°C, with 4.1μl liberase added) to each sample and incubate it in a microfuge tube shaker (25°C, 1,000rpm) for 20 min in total. At the 5 and 10 min time points, pipet the solution up and down 30 times (avoiding bubbles), and then continue shaking. At the 15 min time point, pass the solution through a 25G 5/8 needle 7 times (avoiding bubbles). Shake the tube for another 5 min. To increase yield, use spare papain solution to coat the tips/needles before passing the brain sample.
- Inactivate the enzyme by adding 400μl cold Schneider’s medium (total 700μl). Filter the solution through a cell strainer (35μm) into a 5ml falcon tube (keep tapping the tube until the solution goes through the filter). Wash the EP tube with 800μl cold Schneider’s medium and filter through the cell strainer (total 1,500μl).
- Transfer the 1,500μl solution to an EP tube and centrifuge for 7 min at 4°C, 600xg. Discard the supernatant, re-suspend the cells in 1,000μl (or the desired volume depending on cell density) Schneider’s medium, and transfer to a 5ml FACS tube. Add the desired fluorescent dye (e.g., Ethidium homodimer-1, Invitrogen L3224, as a dead cell marker) and keep the tube on ice until FACS sorting.

Quantification And Statistical Analysis

For RNA-seq data analysis, we first provide an overview of our methods, then describe how these methods were applied to create each figure. All analysis was performed in Python using NumPy, SciPy, pandas, scikit-learn, and a custom single-cell RNA-seq module. Sequencing reads and preprocessed sequence data are freely available from the Gene Expression Omnibus (accession number GSE100058). Code is freely available from GitHub (https://github.com/felixhorns/FlyPN).

Sequence alignment and preprocessing

Reads were aligned to the Drosophila melanogaster genome (r6.10) using STAR (2.4.2a) (Dobin et al., 2013) with the ENCODE standard options, except “--outFilterScoreMinOverLread 0.4 --outFilterMatchNminOverLread 0.4 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.04”. Uniquely mapped reads that overlap with genes were counted using HTSeq-count (0.7.1) (Anders et al., 2015) with default settings except “-m intersection-strict”. Cells having fewer than 300,000 uniquely mapped reads were removed. To normalize for differences in sequencing depth across individual cells, we rescaled gene counts to counts per million (CPM). All analyses were performed after converting gene counts to logarithmic space via the transformation Log₂(CPM+1). Cells that were labeled with neuron-specific GAL4 drivers (C155+, GH146+, Mz19+, 91G04+, Trol+, and CG31676+ cells) were filtered for expression of canonical neuronal genes (elav, brp, Syt1, nSyb, CadN, and mCD8GFP), retaining only those cells that expressed at least 4/6 genes at >15 CPM. After filtering, 97.3% of GH146+ PN cells express mCD8GFP (at >15 CPM).
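The normalization and marker-based filtering steps can be sketched as follows. This is an illustrative outline under stated assumptions, not the preprocessing code from the repository: the function names are hypothetical, the count matrix is assumed to be cells by genes, CPM is assumed to be computed over each cell's total gene counts, and every cell is assumed to have at least one count.

```python
import numpy as np

def counts_to_log_cpm(counts):
    """Rescale raw gene counts to CPM and to log2(CPM + 1).

    counts : (n_cells, n_genes) array of read counts per gene.
    """
    counts = np.asarray(counts, dtype=float)
    cpm = counts / counts.sum(axis=1, keepdims=True) * 1e6
    return cpm, np.log2(cpm + 1.0)

def passes_marker_filter(cpm, marker_idx, min_cpm=15.0, min_markers=4):
    """True for cells expressing at least `min_markers` markers above `min_cpm`.

    marker_idx : column indices of the marker genes (e.g., the six
                 canonical neuronal genes listed above).
    """
    n_expressed = (cpm[:, marker_idx] > min_cpm).sum(axis=1)
    return n_expressed >= min_markers
```

With the six neuronal genes as `marker_idx` and the default thresholds, the second function corresponds to the 4/6 rule described above.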

Dimensionality reduction and clustering

Single-cell RNA-seq yields high dimensional gene expression data. To visualize and interpret these data, we obtained two-dimensional projections of the cell population by first reducing the dimensionality of the gene expression matrix using principal component analysis (PCA), then further reducing the dimensionality of these components using t-distributed Stochastic Neighbor Embedding (tSNE) (van der Maaten and Hinton, 2008). We note that tSNE is a nonlinear embedding that does not preserve distances, so one cannot interpret the distances in the projected space directly as distances between gene expression profiles (i.e., the pre-transformation space).

We performed PCA on a reduced gene expression matrix composed of the top 500 overdispersed genes (as described below). To identify significant principal components (PCs), we examined the distribution of eigenvalues obtained by performing PCA after shuffling the gene expression matrix (with 100 replicates). A PC was considered significant if the magnitude of its associated eigenvalue exceeded the maximum magnitude of eigenvalues observed in the shuffled data. Significant components (typically 7–12 PCs) were used for further analysis. We further reduced these components using tSNE to project them into a two-dimensional space.
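The permutation test for significant PCs might look like the sketch below. It assumes that "shuffling" means permuting each gene's values independently across cells, which destroys gene-gene correlations while preserving each gene's distribution; the function name `significant_pcs` and its return values are illustrative.

```python
import numpy as np

def significant_pcs(x, n_shuffles=100, seed=0):
    """Count PCs whose eigenvalue exceeds the largest eigenvalue in shuffled data.

    x : (n_cells, n_genes) log-expression matrix of the overdispersed genes.
    Returns (number of significant PCs, observed eigenvalues, null maximum).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)

    def eigenvalues(m):
        # Eigenvalues of the covariance matrix via SVD of the centered data.
        m = m - m.mean(axis=0)
        s = np.linalg.svd(m, compute_uv=False)
        return s ** 2 / (m.shape[0] - 1)

    observed = eigenvalues(x)
    null_max = 0.0
    for _ in range(n_shuffles):
        # Permute every gene (column) independently across cells.
        shuffled = np.column_stack([rng.permutation(col) for col in x.T])
        null_max = max(null_max, eigenvalues(shuffled).max())
    return int((observed > null_max).sum()), observed, null_max
```

The retained components would then be passed to tSNE for the two-dimensional projection.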

ICIM (see detailed description of ICIM below) is an unsupervised machine learning algorithm that identifies a set of genes which distinguishes transcriptome clusters, which may correspond to cell types (described below). In our analysis of GH146+ PNs, this set typically includes ~500 genes. To visualize and interpret the single-cell gene expression data, we further reduced its dimensionality using tSNE to project the reduced gene expression matrix (consisting of only the genes identified by ICIM) into a two-dimensional space.

Overdispersion analysis

Genes that are highly variable within a population often carry important information for distinguishing cell types. We were interested in identifying such genes and using them for dimensionality reduction and clustering analyses. Variability of gene expression depends strongly on the mean expression level of a gene. This motivates the use of a metric called dispersion, which measures the variability of a gene’s expression level in comparison with other genes that are expressed at a similar level. Overdispersed genes are those that display higher variability than expected based on their mean expression level.

To identify overdispersed genes, we binned genes into 20 bins based on their mean expression across all cells. We then calculated a log-transformed Fano factor D(x) of each gene x

$$D(x) = \log_{10}\left[\frac{\sigma^{2}(x)}{\mu(x)}\right]$$

where σ²(x) is the variance and μ(x) is the mean of the expression level of the gene across cells. Finally, we calculated the dispersion d(x) as the Z-score of the Fano factor within its bin

$$d(x) = \frac{D(x) - \mathrm{Mean}[D(x)]}{\mathrm{Std}[D(x)]}$$

where Mean[D(x)] is the mean log-transformed Fano factor within the bin and Std[D(x)] is the standard deviation of the log-transformed Fano factor within the bin. We then rank genes by their dispersion and select the top genes for downstream analysis.
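A compact sketch of the dispersion calculation follows. The text does not specify how bin boundaries are chosen, so this example assumes equal-occupancy bins (genes sorted by mean and divided into 20 groups); the function name `dispersion` and the handling of unexpressed genes (returned as NaN) are also assumptions.

```python
import numpy as np

def dispersion(expr, n_bins=20):
    """Z-scored log10 Fano factor of each gene within its mean-expression bin.

    expr : (n_cells, n_genes) non-negative expression matrix.
    Genes with zero mean or zero variance receive NaN.
    """
    expr = np.asarray(expr, dtype=float)
    mu = expr.mean(axis=0)
    var = expr.var(axis=0)
    keep = (mu > 0) & (var > 0)
    log_fano = np.log10(var[keep] / mu[keep])       # D(x)
    # Assumed binning: sort genes by mean and split into equal-sized groups.
    order = np.argsort(mu[keep])
    z = np.zeros_like(log_fano)
    for members in np.array_split(order, n_bins):
        if len(members) == 0:
            continue
        sd = log_fano[members].std()
        if sd > 0:
            z[members] = (log_fano[members] - log_fano[members].mean()) / sd
    d = np.full(expr.shape[1], np.nan)
    d[keep] = z                                      # d(x)
    return d
```

The top overdispersed genes can then be taken as, for example, `np.argsort(-np.nan_to_num(d, nan=-np.inf))[:500]`.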

Iterative Clustering for Identifying Markers (ICIM)

To identify subpopulations of cells corresponding to PN subtypes, we developed an unsupervised machine-learning algorithm, Iterative Clustering for Identifying Markers, which we call ICIM. We observed that standard dimensionality reduction and clustering methods using PCA and tSNE failed to discriminate subpopulations that corresponded to known PN lineages and molecular features. We attributed the failure of these methods to the high degree of similarity of transcriptional states among PN subtypes, which represent closely-related neurons having similar functions. All PN subtypes are born from one of the two common progenitor cells (neuroblasts) and have similar functional roles in the adult fly. Thus, PN subtypes may be distinguished by a small number of genes.

In the language of machine learning, the performance of dimensionality reduction and clustering methods depends critically on feature selection. Selection of informative genes that vary among cell types can improve discrimination in dimensionality reduction and clustering analysis.

We developed ICIM as a strategy to identify the most informative genes for distinguishing subpopulations within a population of closely-related cells in an unbiased way. Starting with a population of cells, we first identify the top 100 overdispersed genes within this population. Next we expand this set of genes by finding genes whose expression profiles are strongly correlated with the overdispersed genes (Pearson correlation > 0.5). We also filter this set of genes by removing (1) those having fewer than 2 correlated partners, and (2) those that are expressed in >80% of cells. Filter (1) removes noisy genes based on the idea that genes that carry information about cell type are expressed within gene modules and therefore have expression profiles that are correlated with at least one other gene. Filter (2) removes housekeeping genes that are detected in nearly all cells; their expression levels vary due to biological and technical noise, but this variation is not informative for distinguishing cell types. Cells are then clustered based on their expression profiles of these genes (average-linkage clustering using correlation metric). We cut the dendrogram at the deepest branch and partition the population into two subpopulations. The same steps are then performed iteratively on each subpopulation. Iteration continues until a population cannot be split into subpopulations because it is “homogeneous”. The termination condition is defined as the minimum terminal branch length (the most similar nearest-neighbor correlation distance between the expression profiles of cells) being larger than 0.2. This condition arises when the algorithm attempts to discover genes within a homogeneous population and finds a very large number of genes (typically >1000 genes) that vary in an incoherent manner between cells. When the algorithm terminates, we collect all genes that were identified at any stage.
The result of this analysis is a set of genes that discriminate subpopulations within a population, which can be used for dimensionality reduction (as described above). We note that this algorithm identifies informative genes in an unbiased manner without knowledge of the ground truth of the number of cell types and their differences. The results of the algorithm were robust across a wide range of parameters.
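The recursion described above can be outlined in code. The sketch below is a simplified reading of ICIM, not the released implementation: seed genes are ranked by an unbinned Fano factor, "expressed" is taken to mean a value above zero, genes found in a population judged homogeneous are discarded, and all function names and the `min_cells` guard are assumptions introduced for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def select_genes(x, n_top=100, min_corr=0.5, min_partners=2, max_frac=0.8):
    """Seed genes plus correlated partners, minus noisy and ubiquitous genes."""
    expressed = np.flatnonzero(x.var(axis=0) > 0)    # drop constant genes
    if len(expressed) < 2:
        return np.array([], dtype=int)
    sub = x[:, expressed]
    fano = sub.var(axis=0) / np.maximum(sub.mean(axis=0), 1e-12)
    seeds = np.argsort(-fano)[:n_top]                # simplified overdispersion
    corr = np.corrcoef(sub, rowvar=False)
    np.fill_diagonal(corr, 0.0)
    strong = corr > min_corr
    # Expand the seeds with every gene strongly correlated to any seed.
    candidates = np.union1d(seeds, np.flatnonzero(strong[seeds].any(axis=0)))
    partners = strong[candidates].sum(axis=1)                 # filter (1)
    frac_on = (sub[:, candidates] > 0).mean(axis=0)           # filter (2)
    keep = (partners >= min_partners) & (frac_on <= max_frac)
    return expressed[candidates[keep]]

def icim(x, cells=None, max_nn_dist=0.2, min_cells=4):
    """Recursively bisect cells; return gene indices found at any level."""
    cells = np.arange(x.shape[0]) if cells is None else cells
    if len(cells) < min_cells:
        return set()
    genes = select_genes(x[cells])
    if len(genes) < 2:
        return set()
    dist = pdist(x[np.ix_(cells, genes)], metric="correlation")
    # Termination: undefined distances, or even the closest pair of cells
    # is farther apart than the threshold (population treated as homogeneous).
    if not np.all(np.isfinite(dist)) or dist.min() > max_nn_dist:
        return set()
    tree = linkage(dist, method="average")
    split = fcluster(tree, t=2, criterion="maxclust")  # cut at the top branch
    left, right = cells[split == 1], cells[split == 2]
    found = {int(g) for g in genes}
    if len(left) == 0 or len(right) == 0:
        return found
    return (found
            | icim(x, left, max_nn_dist, min_cells)
            | icim(x, right, max_nn_dist, min_cells))
```

The returned gene set plays the role of the reduced feature space that is subsequently projected with tSNE.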

Why does ICIM outperform previously used approaches, such as PCA? PCA reduces the feature space in a manner that assigns weights to genes based on their information content. This has two consequences: (1) downstream analysis uses the weighted gene expression information, which imposes assumptions about the statistical relationships between genes, and (2) while less informative genes are assigned smaller weights, they nevertheless can contribute to downstream analysis. In contrast, ICIM explicitly removes genes that are deemed uninformative from further consideration and assigns equal weights to those that are kept. These attributes make ICIM a more effective feature selection strategy for analysis of highly similar cellular subtypes.

Differential expression analyses

To find differentially expressed genes, we used the Mann-Whitney U test, a non-parametric test that detects differences in the level of gene expression between two populations. The Mann-Whitney U test is advantageous for this application because it makes very general assumptions: (1) observations from both groups are independent and (2) the gene expression levels are ordinal (i.e., can be ranked). Thus the test applies to distributions of gene expression levels across cells, which rarely follow a normal distribution. Using the Mann-Whitney U test, we compared the distributions of expression levels of every gene separately. P values were adjusted using the Bonferroni correction for multiple testing. Different significance thresholds for determining whether a gene is differentially expressed were used for various analyses in this work.
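A per-gene Mann-Whitney U test with Bonferroni correction can be sketched with SciPy as follows. The function name `differential_expression`, the two-sided alternative, and the 0.05 threshold are assumptions for illustration; as noted above, the thresholds used in this work varied by analysis.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def differential_expression(x_a, x_b, alpha=0.05):
    """Compare two cell populations gene by gene.

    x_a, x_b : (n_cells, n_genes) expression matrices with the same genes.
    Returns (Bonferroni-adjusted p values, boolean significance mask).
    """
    x_a, x_b = np.asarray(x_a, dtype=float), np.asarray(x_b, dtype=float)
    n_genes = x_a.shape[1]
    pvals = np.ones(n_genes)
    for g in range(n_genes):
        a, b = x_a[:, g], x_b[:, g]
        if np.all(a == a[0]) and np.all(b == a[0]):
            continue  # identical constant values in both groups: keep p = 1
        pvals[g] = mannwhitneyu(a, b, alternative="two-sided").pvalue
    # Bonferroni: multiply by the number of tests and cap at 1.
    adjusted = np.minimum(pvals * n_genes, 1.0)
    return adjusted, adjusted < alpha
```

Ranking genes by adjusted p value yields the "top differentially expressed genes" used in the enrichment analyses.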

TF and CSM lists

To identify genes that are transcription factors (TFs) or cell surface molecules (CSMs), we used manually curated lists. We obtained a list of Drosophila TFs from the FlyTF v1 database (http://www.mrc-lmb.cam.ac.uk/genomes/FlyTF) and CSMs from Kurusu et al. (2008). These lists were manually curated to remove spurious annotations and redundancies according to Flybase annotation, resulting in 1045 TFs and 955 CSMs.

Analysis methods for figures

Single-cell transcriptome analyses of neurons and glia in Figure 1.

We formed a population consisting of 946 GH146-GAL4+ cells and 67 alrm-GAL4+ cells. We performed dimensionality reduction and clustering analysis using PCA and tSNE as described above. We identified the top 500 overdispersed genes in the population. We used PCA to reduce dimensionality, retaining 7 significant PCs. Then we projected the population into a two-dimensional space using tSNE with perplexity 30 and learning rate 500 (Figure 1E). We also performed hierarchical clustering using complete linkage and a Euclidean metric based on manually selected neuronal and glial marker genes (Figure 1D).

Removal of GH146+ vPNs and APL neurons in Figure 2.

We initially formed a population consisting of 946 GH146+ cells. Using ICIM, we identified 158 genes that distinguish subtypes. We then projected the population into a two-dimensional space using tSNE. We observed several distinct subpopulations corresponding to GH146+ neuronal types that do not belong to the adPN or lPN lineages. Specifically, two clusters were composed of ventral PNs (vPNs), which robustly express several specific markers (Gad1, Lim1, and toy). Three other clusters were composed of APL neurons, which robustly express other specific markers (Wnt4, VGlut, and fd102C), and are arranged adjacent to one another by tSNE, reflecting the similarity of expression profiles among these cells. For subsequent analyses of GH146+ adPNs and lPNs, we removed these cells by excluding cells expressing 2/3 of these marker genes at >15 CPM.

Single-cell transcriptome analyses of GH146+ cells in Figure 2.

We initially attempted to identify distinct subpopulations representing PN subtypes using PCA and tSNE for the 946 GH146+ PNs (including vPNs and APL neurons). We began by identifying the top 500 overdispersed genes and performing PCA to reduce the gene expression data to 10 significant PCs. Then we projected the population into a two-dimensional space using tSNE with perplexity 30 and learning rate 500. We observed that this analysis fails to separate distinct subpopulations (Figure 2B).

We next attempted to distinguish subpopulations corresponding to PN subtypes using ICIM and tSNE. Using ICIM (Figure 2C), we identified 561 genes for the 902 GH146+ PNs (representing adPNs and lPNs, after removing vPNs and APL neurons as described above). We projected these cells into a two-dimensional space using tSNE, with the pairwise Pearson correlation of the expression profiles of these genes as the distance matrix and with perplexity 10, learning rate 250, and early exaggeration 4.0 (Figure 2D). Because tSNE computes a nonlinear embedding that does not preserve distances in the original space, the distances between cells cannot be directly interpreted in terms of similarity of expression profiles. As a consequence, there are cases where cells belonging to the same cluster are separated by larger distances than cells belonging to different clusters. We classified cells into clusters in an unbiased manner using HDBSCAN with min_cluster_size=5 and min_samples=3 on the coordinates obtained from the tSNE projection.

Mapping clusters to PN classes in Figure 3.

We formed a population consisting of 902 GH146+ cells, 123 Mz19+ cells (at 24h APF), and 23 91G04+ cells. Using the 561 genes identified using ICIM on GH146+ cells, we projected this population into a two-dimensional space using tSNE with perplexity 15 and learning rate 1000. For visualization, we colored the cells according to their genotype, revealing that the Mz19+ and 91G04+ cells belong exclusively to 5 clusters (Figure 3C).

Mapping clusters to PN classes in Figure 4.

We formed a population consisting of 902 GH146+ cells and 41 trol-GAL4+ cells. trol-GAL4+ cells that were not expressing trol (CPM < 7) were removed, leaving 28 cells for further analysis. Using the 561 genes identified using ICIM on GH146+ cells, we projected this population into a two-dimensional space using tSNE with perplexity 15 and learning rate 1000. For visualization, we colored the cells according to their genotype, revealing that the vast majority of trol+ cells belong exclusively to 1 cluster (Figure 4C).

Analysis of transcriptome changes during development in Figure 6.

To understand how transcriptional state changes during development and maturation of PN subtypes, we collected Mz19+ cells from flies at 5 stages of development: 24h, 36h, 48h, and 72h after puparium formation (APF), and 1–2 day adults. We formed a population consisting of 485 cells (123, 83, 92, 92, and 95 cells at each stage, respectively), after filtering to remove low quality cells and those not expressing neuronal markers (as described above). Using ICIM, we identified 497 genes that distinguish cell subtypes and developmental stages. We projected this population into a two-dimensional space based on these genes using tSNE with perplexity 10 and learning rate 500. Cells formed several distinct subpopulations corresponding to different PN subtypes and developmental stages (Figures 6B and 6C). We assigned subpopulations to subtypes (DA1, VA1d, and DC3) based on the expression of key lineage factors (Figure 6B).

To quantify transcriptome changes in the closely related PN subtypes VA1d and DC3 during development, we devised a metric called the type identity score, which is the scaled sum of expression levels of genes that distinguish VA1d and DC3 cells. We identified these genes using differential expression analysis comparing VA1d and DC3 populations at all times that the two populations are distinct as determined by ICIM and tSNE (24h, 36h, 48h, and 72h APF). 78 cells were included in the VA1d group and 64 cells were included in the DC3 group. This analysis yielded 22 genes, of which 13 are highly expressed in VA1d cells and 9 are highly expressed in DC3 cells, at a significance level of P < 10⁻⁵ after the Bonferroni adjustment for multiple testing. We rescaled expression levels of these genes to the range 0 to 1 (by dividing each expression level by the maximum among the population), then calculated the type identity score I of each cell as the difference between the mean rescaled expression level of the VA1d-enriched genes and that of the DC3-enriched genes,

I = \frac{1}{|X_{VA1d}|} \sum_{x \in X_{VA1d}} x - \frac{1}{|X_{DC3}|} \sum_{x \in X_{DC3}} x

where X_{VA1d} is the set of genes that are highly expressed in VA1d, X_{DC3} is the set of genes that are highly expressed in DC3, x is the rescaled expression level of a gene in the cell, and |X| is the cardinality of set X. We then plotted the type identity scores of each cell at each developmental stage (Figure 6D).
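As a worked illustration of the type identity score, the following sketch computes I for single cells. The gene names, expression values, and the helper name `type_identity_score` are hypothetical and chosen only to make the arithmetic easy to follow; the rescaling divides each expression level by the population maximum, as described above.

```python
def type_identity_score(cell, va1d_genes, dc3_genes, max_expr):
    """Mean rescaled expression of VA1d-enriched genes minus that of DC3-enriched genes.

    cell: dict gene -> expression level in this cell.
    max_expr: dict gene -> maximum expression level across the population (> 0).
    """
    def mean_scaled(genes):
        return sum(cell[g] / max_expr[g] for g in genes) / len(genes)
    return mean_scaled(va1d_genes) - mean_scaled(dc3_genes)

# Hypothetical gene sets and population maxima.
va1d_genes = ["geneA", "geneB"]
dc3_genes = ["geneC"]
max_expr = {"geneA": 10.0, "geneB": 8.0, "geneC": 12.0}

va1d_like = {"geneA": 10.0, "geneB": 8.0, "geneC": 0.0}
dc3_like = {"geneA": 0.0, "geneB": 0.0, "geneC": 12.0}
mixed = {"geneA": 5.0, "geneB": 4.0, "geneC": 6.0}

scores = [type_identity_score(c, va1d_genes, dc3_genes, max_expr)
          for c in (va1d_like, dc3_like, mixed)]
```

By construction the score lies between -1 (DC3-like) and 1 (VA1d-like), and a cell that expresses both gene sets at the same relative level scores 0.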

As an alternative method to analyze transcriptome differences, we also examined correlations in transcriptome states in an unbiased genome-wide manner. This method has the advantage that it does not require the choice of a P value cutoff for determining significance. We formed a population consisting of Mz19+ adPNs (belonging to both subtypes VA1d and DC3) at each stage of development. Then we calculated the Pearson correlation of the expression profiles of the 497 genes identified by ICIM for every pair of cells (Figure 6E). These plots revealed a bimodal distribution containing two distinct peaks, corresponding to pairs of cells that both belong to the same subtype (more similar peak) and pairs of cells that belong to different subtypes (less similar peak). As development proceeds, the transcriptome difference between these two subtypes diminishes until it vanishes, as reflected in the merging of these two distinct peaks and the emergence of a unimodal distribution in adulthood.

To compare transcriptome differences between neuroblast lineages, we performed differential expression analysis comparing VA1d and DC3, VA1d and DA1, and DL3 and DA1 PNs across developmental stages. Because the P values of a differential expression test depend strongly on the number of cells involved in the test, we sampled cells so that all populations had the same number of cells. Specifically, for each comparison, we sampled 12 cells from each population without replacement and performed differential expression analysis, then repeated this procedure (100 replicates) and calculated the median P value across the replicates for each gene. We then counted the number of differentially expressed genes at P < 0.001 based on the median P value (Figure 6G). All cells used for this analysis were Mz19+ PNs, except DL3 cells were GH146+ PNs.

To characterize transcriptome changes distinguishing PNs in the wiring stages of development from PNs in adulthood, we performed differential expression analysis comparing the population of Mz19+ PNs at 24h APF to the population of Mz19+ PNs in adults. We found 1097 differentially expressed genes at a significance level of P < 10⁻⁵ after the Bonferroni adjustment for multiple testing. This included 619 genes that were highly expressed in 24h APF cells, and 478 genes that were highly expressed in adult cells. We performed Gene Ontology (GO) analysis on these genes using Flymine and removed the redundant GO terms using REVIGO (Supek et al., 2011). We report the number of genes corresponding to each term and the P value of enrichment of each term (Figure S5A).

To identify transcriptional waves during Mz19+ PN development, we considered the 1097 genes that were differentially expressed between 24h APF and adult cells. We calculated the median expression of each gene at each time point. We normalized these median expression values by dividing by the maximum value across time points, such that each expression value became a relative expression level between 0 and 1. We then performed dimensionality reduction on the expression profiles of the genes using tSNE with perplexity 20, learning rate 1000, and early exaggeration 6.0. We identified clusters among the genes using HDBSCAN with min_cluster_size 25 and min_samples 5 on the projected coordinates. This resulted in the identification of 13 distinct waves, of which 6 involved up-regulation and 7 involved down-regulation from 24h APF to adult cells (Figure S5B). We plotted the mean relative expression level of the genes in each wave at each time point (black dots connected by black lines). The relative expression profile of each individual gene belonging to each wave was also plotted (gray lines). We calculated the fraction of genes within each wave that are transcription factors (TFs) or cell surface molecules (CSMs) using the lists of TFs and CSMs that were obtained as described above.

Characterizing genes that distinguish PN subtypes in Figure 7.

To identify genes that distinguish closely related PN subtypes, we performed differential expression analysis comparing the Mz19+ VA1d and DA1 populations and the Mz19+ VA1d and DC3 populations at each developmental stage. These genes are presented in Table S1. Because the significance level of expression level differences depends on the number of cells involved in the comparison and the number of cells varies across developmental stages, we analyzed the top 30 differentially expressed genes regardless of their significance level. We note that the significance values of these genes were nearly all P < 10⁻⁴. We calculated the fraction of genes within each wave that are transcription factors (TFs) or cell surface molecules (CSMs) using the lists of TFs and CSMs that were obtained as described above (Figure 7F).

We next performed a similar differential expression analysis comparing all pairs of subtypes. For each subtype, we formed a population consisting of GH146+ cells belonging to that subtype at 24h APF. For each pair of subtypes we calculated differential expression for each gene and ranked the genes by their significance level (P value). We then calculated the fraction of TFs or CSMs among the top N genes for varying N from 30 to 1000. We calculated the enrichment of TFs or CSMs compared to their genomic representation by dividing the fraction of TFs or CSMs by the genomic fraction of TFs (6.7%) or CSMs (6.2%). We plotted the distribution of these enrichment values for various values of N (Figure 7G).
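The enrichment calculation amounts to a ratio of two fractions, which the following sketch makes explicit. The ranked gene list, the TF set, and the function name `enrichment_in_top_n` are hypothetical. The genomic fractions quoted above are consistent with 1045 TFs and 955 CSMs out of 15,522 detected genes; using that denominator here is an assumption.

```python
def enrichment_in_top_n(ranked_genes, category, n, genomic_fraction):
    """Fraction of category members among the top n ranked genes,
    divided by the fraction of the genome that belongs to the category."""
    top = ranked_genes[:n]
    fraction = sum(1 for g in top if g in category) / len(top)
    return fraction / genomic_fraction

# Genomic fractions, assuming a denominator of 15,522 detected genes.
tf_fraction = 1045 / 15522
csm_fraction = 955 / 15522

# Hypothetical ranking in which 6 of the top 30 genes are TFs.
ranked = ["g%d" % i for i in range(1000)]
tfs = set(ranked[:6])
tf_enrichment = enrichment_in_top_n(ranked, tfs, 30, tf_fraction)
```

In this toy case 20% of the top 30 genes are TFs, roughly a threefold enrichment over the genomic fraction. An enrichment of 1 would mean that TFs are no more frequent among the top genes than in the genome as a whole.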

Searching for unique marker genes for PN subtypes in Figure 7.

We sought to identify unique markers for each GH146+ PN subtype. We formed populations each consisting of GH146+ cells belonging to a cluster identified using ICIM and tSNE (Figure 2D). We then performed differential expression analysis comparing each cluster to all other GH146+ cells. We selected genes that were differentially expressed at a significance level of P < 0.05 after the Bonferroni adjustment for multiple testing and that had a median expression within the cluster of interest of >7 CPM, resulting in 1103 genes. We then filtered for genes that were identified as significantly enriched in only one cluster, resulting in 257 genes. This step was necessary because some genes were identified as significantly enriched in multiple clusters, which is consistent with reuse of genes as identity factors within a combinatorial code. Finally, we identified genes that were robust and unique markers for a single cluster. To do this, we calculated the fraction of cells within each cluster expressing a given gene at >7 CPM. We then filtered for genes that were expressed in >80% of the cells within a given cluster and in <10% of the cells in any other cluster. We plotted the distribution of expression levels of these genes in each cluster (Figure S6A). We also attempted to search using less stringent criteria. For example, Figure S6B shows the result when we required that the gene is expressed in >50% of the cells (>7 CPM) within a given cluster and in <25% of the cells in any other cluster.
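The final filtering step can be sketched as follows. This is an illustrative outline only: the function names, the data layout (gene, then cluster, then per-cell CPM values), and the toy values are hypothetical, and the in-cluster and out-of-cluster thresholds are passed as parameters so that either set of criteria can be applied. The example call uses the relaxed criteria of Figure S6B.

```python
def fraction_expressing(values, cutoff):
    """Fraction of cells in which the gene is detected above the CPM cutoff."""
    return sum(1 for v in values if v > cutoff) / len(values)

def unique_markers(expr, min_in, max_out, cutoff=7.0):
    """Return {cluster: [genes]} for genes detected in more than `min_in` of the
    cells of one cluster and in fewer than `max_out` of the cells of every
    other cluster.

    expr: dict gene -> dict cluster -> list of CPM values (one per cell).
    """
    markers = {}
    for gene, by_cluster in expr.items():
        frac = {c: fraction_expressing(v, cutoff) for c, v in by_cluster.items()}
        for cluster, f in frac.items():
            others = [o for c, o in frac.items() if c != cluster]
            if f > min_in and all(o < max_out for o in others):
                markers.setdefault(cluster, []).append(gene)
    return markers

# Toy data (hypothetical): geneA is specific to c1, geneB is shared by c1 and c2.
expr = {
    "geneA": {"c1": [20, 30, 0, 50], "c2": [0, 0, 0, 0], "c3": [0, 0, 0, 0]},
    "geneB": {"c1": [20, 30, 40, 50], "c2": [20, 0, 30, 0], "c3": [0, 0, 0, 0]},
}
relaxed = unique_markers(expr, min_in=0.5, max_out=0.25)
```

Here geneB is rejected because it is detected in half of the c2 cells, which illustrates how a gene reused across clusters fails the uniqueness criterion even when it is strongly enriched in one of them.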

Technical artifacts such as dropouts can hinder the identification of unique markers. We therefore estimated the probability that our failure to identify unique markers can be accounted for by dropout. To do so, we assumed that each of the 30 molecularly distinct GH146+ PN subtypes expresses a single unique marker gene at a low level of expression (7 CPM) in a ubiquitous fashion (i.e. in all the individual cells belonging to that subtype). In our data, genes expressed at an average level of 7 CPM are not detected in ~60% of cells. We can fail to detect a gene because (1) the cell is not expressing the gene, or (2) because of noise in gene expression (biological noise) or technical dropout (measurement noise). We therefore can estimate an upper bound at 60% on the probability of dropout of a gene that is expressed at 7 CPM on average. Our approach for identifying unique markers requires that the gene is detected in 80% of cells within a cluster. The probability of failure to detect the marker gene for a given cluster due to dropout is therefore given by the probability of dropouts in 20% of the cells in a cluster. This probability is:

P_{failure} = P_{dropout}^{N_{dropouts}}

where P_dropout is estimated to be 60% and N_dropouts = 0.2 * N_cells is the number of cells in which the gene must drop out. We calculated this probability based on the number of cells N_cells in every cluster, ranging from 5 to 108. Then we calculated the probability that 25 out of 30 clusters do not have a unique marker gene by multiplying the probabilities of failure in 25 randomly sampled clusters. We performed this sampling 10,000 times and report the average (P = 10⁻⁶²). This value represents the probability of failing to detect a unique marker gene for 25 out of 30 clusters given that each cluster is expressing a single unique marker at 7 CPM.
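The dropout estimate can be reproduced in outline with the sketch below. The per-cluster cell counts are not listed in this section, so the `sizes` list is a hypothetical set of 30 values spanning the stated range of 5 to 108 cells; the resulting number therefore illustrates the order of magnitude of the calculation and is not expected to match the reported value. The function names are also assumptions.

```python
import math
import random

P_DROPOUT = 0.6  # upper bound on the dropout probability at 7 CPM

def p_failure(n_cells):
    """Probability that dropout alone hides a marker in one cluster:
    the gene must drop out in 20% of the cluster's cells."""
    n_dropouts = 0.2 * n_cells
    return P_DROPOUT ** n_dropouts

def p_no_markers(cluster_sizes, n_missing=25, n_draws=10000, seed=0):
    """Average, over random choices of `n_missing` clusters, of the product
    of their per-cluster failure probabilities."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        chosen = rng.sample(cluster_sizes, n_missing)
        total += math.prod(p_failure(n) for n in chosen)
    return total / n_draws

# Hypothetical cluster sizes: 30 clusters, evenly spaced from 5 to 108 cells.
sizes = [5 + round(i * 103 / 29) for i in range(30)]
estimate = p_no_markers(sizes)
```

For the smallest cluster (5 cells) the gene needs to drop out in a single cell, so the failure probability is simply 0.6. For the largest cluster (108 cells) it falls to the order of 10⁻⁵, and the product over 25 clusters becomes vanishingly small.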

These calculations are conservative in several ways. First, we assumed that each cluster expresses a single marker gene. Realistically, each subtype may express multiple unique markers. This would increase the probability of detecting at least one of them. Second, unique markers may be expressed at levels higher than 7 CPM. We observed that the unique marker genes that we discovered are expressed at levels well above 7 CPM (Figure S6A) and biologically it is unlikely that a type identity factor would be expressed at extremely low levels. Thus, we estimate that the likelihood that our failure to detect marker genes can be explained by dropouts alone is very small.

Information theory-based analyses in Figure 7.

We sought to identify minimal sets of genes that can encode the subtype identity of GH146+ PNs in a combinatorial fashion. Our motivation was to determine by direct search whether such molecular combinatorial codes exist.

To address this, we devised an algorithm that finds a minimal set of genes that is sufficient to encode the subtype identity of cells in a combinatorial manner drawing upon ideas from information theory. An introduction to information theory is outside the scope of this work. Nevertheless, we provide a brief description of the basic concepts. Then we describe the algorithm and how it was applied in this work.

Entropy measures the uncertainty of a random variable (Shannon, 1948; Cover and Thomas 2006). Conditional entropy measures the uncertainty of a random variable given the knowledge of another variable (i.e., after conditioning on another variable). Conditioning on data never increases uncertainty (on average), which agrees with our intuition that additional information never hurts.

We use the notion of entropy H(C) to describe the uncertainty of cell type classification C. We use conditional entropy to describe the reduced uncertainty in classification due to knowledge of the expression state of a gene, H(C|G). The information gain due to knowledge of the expression state of the gene is the mutual information between the gene and the classification I(G;C), which can be defined as

I(G;C) = H(C) - H(C|G),

where H(C) is the entropy of cell type classification (without knowledge of the expression states of any genes) and H(C|G) is the entropy of cell type classification after conditioning on the expression state of gene G. Mutual information I(G;C) describes how much our uncertainty about classification C decreases when we observe the gene G.

Mutual information I(G;C) can also be calculated directly from the probability distributions of cell type classes and expression states. For two discrete random variables G and C with joint probability mass function p(g,c) and marginal distributions p(g) and p(c), the mutual information of G and C is defined as

I(G;C) = \sum_{g} \sum_{c} p(g,c) \log \frac{p(g,c)}{p(g) p(c)} = H(C) - H(C|G).

We often calculate the information content of a gene G with respect to the cell type classification C using this equation. Throughout this work, the base of the logarithm is 2 and so the unit of entropy and information is bit.
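For discrete variables, the definition above reduces to counting. The sketch below estimates entropy and mutual information from paired observations of a binarized gene state and a class label, using base-2 logarithms so that the result is in bits. The function names and the four-cell toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(gene_states, classes):
    """I(G;C) in bits, estimated from paired observations (one pair per cell)."""
    n = len(gene_states)
    count_g = Counter(gene_states)
    count_c = Counter(classes)
    count_gc = Counter(zip(gene_states, classes))
    return sum(
        k / n * math.log2((k / n) / ((count_g[g] / n) * (count_c[c] / n)))
        for (g, c), k in count_gc.items()
    )

# Toy data (hypothetical): two balanced classes, so H(C) = 1 bit.
classes = ["A", "A", "B", "B"]
informative = [1, 1, 0, 0]    # ON only in class A: resolves the class completely
uninformative = [1, 0, 1, 0]  # independent of the class

mi_informative = mutual_information(informative, classes)
mi_uninformative = mutual_information(uninformative, classes)
```

A gene whose ON/OFF state fully determines the class carries all of H(C), while a gene that is independent of the class carries no information.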

We now describe the algorithm for finding a minimal combinatorial code. This problem is closely related to the Feature Reduction n-k (FRn-k) problem in machine learning (Battiti, 1994). To solve this problem, we employ a greedy algorithm using mutual information similar to that described by Kwak and Choi (2002). The problem is formulated as follows:

Given an initial set F with n features and set C of all output classes, find the subset S ⊆ F with k features that minimizes H(C|S), which is equivalent to maximizing the mutual information I(C;S).

Our algorithm is as follows:

(1) Initialize S as the empty set and F as the initial set of n features.
(2) For all f_i in F, compute I(C; f_i).
(3) Find the feature f_i that maximizes I(C; f_i). Add f_i to set S. Remove f_i from set F.
(4) Repeat until the desired number of features k is selected:
  (a) For all f_i in F, compute I(C; f_i|S).
  (b) Find the feature f_i that maximizes I(C; f_i|S). Add f_i to set S. Remove f_i from set F.
(5) Return the set S containing the selected features.

We repeat this computation with increasing k until the output set S explains a chosen amount of uncertainty in the classification C. Typically, we choose this termination condition as 95% of the entropy H(C) of classification C.
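The greedy search can be sketched compactly for binarized features. This is a simplified outline, not the released implementation: the function names, the `explained` and `max_k` parameters, and the toy data are assumptions. It uses the identity I(C; f|S) = H(C|S) - H(C|S, f), so that maximizing the conditional mutual information at each step is the same as choosing the feature that minimizes H(C|S, f). Because the selected sets are nested, growing S one gene at a time until the threshold is reached is equivalent to rerunning the search with increasing k.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (bits) of a list of discrete labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(classes, states):
    """H(C | state), where `states` holds one hashable state per cell."""
    groups = defaultdict(list)
    for s, c in zip(states, classes):
        groups[s].append(c)
    n = len(classes)
    return sum(len(m) / n * entropy(m) for m in groups.values())

def greedy_code(expr, classes, explained=0.95, max_k=20):
    """Greedily select genes until they explain `explained` of H(C).

    expr: dict gene -> list of binary states (one per cell).
    Returns (selected genes, fraction of H(C) explained).
    """
    h_c = entropy(classes)
    selected, remaining = [], sorted(expr)
    state = [()] * len(classes)  # joint state of the selected genes, per cell
    h_now = h_c
    while remaining and len(selected) < max_k and (h_c - h_now) < explained * h_c:
        def h_with(f):
            joint = [s + (x,) for s, x in zip(state, expr[f])]
            return conditional_entropy(classes, joint)
        best = min(remaining, key=h_with)  # maximizes I(C; f | S)
        selected.append(best)
        remaining.remove(best)
        state = [s + (x,) for s, x in zip(state, expr[best])]
        h_now = conditional_entropy(classes, state)
    fraction = (h_c - h_now) / h_c if h_c > 0 else 1.0
    return selected, fraction

# Toy data (hypothetical): four classes encoded by two genes; g3 duplicates g1.
classes = ["A", "A", "B", "B", "C", "C", "D", "D"]
expr = {
    "g1": [0, 0, 0, 0, 1, 1, 1, 1],
    "g2": [0, 0, 1, 1, 0, 0, 1, 1],
    "g3": [0, 0, 0, 0, 1, 1, 1, 1],
}
code, fraction = greedy_code(expr, classes)
```

In the toy example the redundant gene g3 is never selected: after g1 is chosen it adds no information, which shows why conditioning on S, rather than ranking genes by I(C; f_i) alone, yields a compact code.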

We note that the computation of mutual information is dramatically more efficient when G and C are discrete. We therefore binarized expression levels using a cutoff of log₂(CPM+1) = 3. This cutoff was chosen based on the minimum in the distribution of expression levels across all genes and all cells (Figure S7A). We varied the cutoff value between 2 and 6 and found that our results were essentially unchanged. The compactness of the minimal codes for GH146+ PN subtype identity and the genes included in the codes were nearly identical to those obtained using the cutoff of 3 (data not shown). We also calculated the correlation between the information carried by each gene under different values of the cutoff and the information carried under the cutoff of 3 (Figure S7B). This analysis revealed that the information content of genes is not very sensitive to the precise choice of cutoff for binarization across the range of 2 to 6. We also found that other discretization schemes, such as a different number of levels of expression (e.g. Off, Low, Medium, High), yielded similar results (data not shown).
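A short sketch makes the binarization cutoff concrete. A cutoff of log₂(CPM+1) = 3 corresponds to 2³ - 1 = 7 CPM, the same detection threshold used in the marker search. Whether a value exactly at the cutoff counts as ON is not stated; the strict inequality below is an assumption, as are the function names.

```python
import math

def is_on(cpm, cutoff=3.0):
    """Binarize an expression level: ON if log2(CPM + 1) exceeds the cutoff."""
    return math.log2(cpm + 1.0) > cutoff

def cutoff_in_cpm(cutoff):
    """Convert a cutoff on log2(CPM + 1) back to CPM."""
    return 2.0 ** cutoff - 1.0

# CPM equivalents of the cutoffs scanned in the sensitivity analysis (2 to 6).
scanned = [cutoff_in_cpm(c) for c in (2, 3, 4, 5, 6)]

# Binarize a hypothetical expression profile.
states = [is_on(v) for v in (0.0, 7.0, 8.0, 120.0)]
```

The scanned range of 2 to 6 on the log scale therefore spans CPM thresholds from 3 to 63, roughly a twentyfold range.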

Combinatorial coding of PN subtype identity in Figure 7.

We initially applied the information theory-based algorithm to Mz19+ PN cells to test whether it is capable of identifying a set of genes that is sufficient for a combinatorial code of cell type identity. We formed a population consisting of the 175 GH146+ cells that belong to the classes labeled by Mz19-GAL4 (108 DA1, 35 VA1d, and 32 DC3 cells). We created a binary expression matrix consisting of the ON/OFF states of all 15,522 genes that remained after removing genes that were not detected in any cells. We calculated the mutual information of each gene with respect to the GH146+ subtype classification. We then used the greedy algorithm described above to find a minimal set of genes for encoding GH146+ subtype identity (Figure 7A). The initial set of features F was the top 30 most informative genes among the 15,522 genes in the expression matrix.

We next applied this approach to all GH146+ PNs. We formed a population consisting of the 902 GH146+ cells belonging to the adPN and lPN lineages (Figure 2D). We created a binary expression matrix consisting of the ON/OFF states of all 15,522 genes that remain after removing genes that are not detected in any cells. We calculated the mutual information of each gene with respect to the GH146+ subtype classification (Figure 2D). We used the greedy algorithm described above to find minimal sets of genes for encoding GH146+ subtype identity with k varying from 1 to 20. The initial set of features F was the top 30 most informative genes among all 15,522 genes (genome-wide), among the 1045 TFs, or among the 955 CSMs. We plotted the uncertainty explained by the minimal codes obtained with each value of k (Figure 7B). We then chose minimal codes that explained 95% of the uncertainty of GH146+ subtype classification (Figures 7C–E). For plotting, we binarized the mean expression level of each gene in each cluster using the cutoff of log₂(CPM+1) = 3 (Figures 7C–E).

To evaluate the performance of classifiers using these minimal codes, we performed leave-one-out cross-validation. We formed a training set consisting of 901 GH146+ cells, after leaving out a single cell. We then searched for a minimal combinatorial code for subtype identity using these cells and chose a minimal code that explained 95% of the uncertainty in subtype classification. We then performed multinomial classification of the test cell based on its expression states of the genes in the code. Specifically, the predicted class of the test cell was the class having the minimum Hamming distance to the expression state of the cell. In the event of a tie, we assumed that the classifier would make a random, uniformly weighted choice between the tied classes. Performance was plotted as a confusion matrix (Figure S7D), which depicts the fraction of cases having each true label that are classified as each predicted label.
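The classification step can be illustrated with a minimal nearest-code classifier. The representation of each class as a single binary code, the three-gene codes shown, and the function names are assumptions made for illustration; ties are broken by a uniform random choice, as described above.

```python
import random

def hamming(a, b):
    """Number of positions at which two equal-length binary codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def classify(cell_state, class_codes, rng=random):
    """Predict the class whose code is nearest to the cell in Hamming distance.

    cell_state: tuple of ON/OFF states of the code genes in the test cell.
    class_codes: dict class -> tuple of ON/OFF states (assumed representation).
    """
    dists = {c: hamming(cell_state, code) for c, code in class_codes.items()}
    best = min(dists.values())
    tied = sorted(c for c, d in dists.items() if d == best)
    return rng.choice(tied)  # uniform choice among tied classes

# Hypothetical three-gene code for the three Mz19+ classes.
codes = {"DA1": (1, 0, 0), "VA1d": (0, 1, 0), "DC3": (0, 1, 1)}
prediction = classify((1, 0, 0), codes)

# A cell equidistant from two classes is assigned to one of them at random.
tie_prediction = classify((0, 1), {"X": (0, 0), "Y": (1, 1)}, rng=random.Random(0))
```

In leave-one-out cross-validation, this prediction would be made once per cell, with the code itself re-derived each time from the remaining cells.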

To evaluate whether TFs and CSMs carry more information than other genes, we found minimal sets of genes using an initial set of features F consisting of 1,000 genes chosen at random from among the 13,631 expressed in the genome after excluding the genes annotated as TFs and CSMs. We performed this search with 100 replicates. We plotted the mean uncertainty explained at various values of k and the standard deviation across the replicates (Figure 7B). We evaluated the distribution of expression levels of TFs, CSMs, and other genes by calculating the median expression of each gene in all GH146+ PN cells, and plotting the distribution across all genes in each category (Figure S7E).

To identify regulatory relationships between TFs and CSMs, we performed clustering of the expression state profiles of TFs and CSMs across the clusters of GH146+ cells. The top 30 TFs and CSMs ranked by mutual information with cell type identity were selected. The binary expression state of each gene in each cluster was calculated using the cutoff of log₂(CPM+1) = 3 based on the mean expression level in the cluster. We performed average linkage clustering using the Hamming distance metric on these expression states (Figure S7F).

Data And Software Availability

Sequencing reads and preprocessed sequence data are freely available from the Gene Expression Omnibus (accession number GSE100058). Code is freely available from GitHub (https://github.com/felixhorns/FlyPN).

Additional Resources

Key Resources Table

REAGENT or RESOURCE	SOURCE	IDENTIFIER
Antibodies
Rat anti-DNcad	Developmental Studies Hybridoma Bank	DN-Ex#8
Chicken anti-GFP	Aves Labs	GFP-1020
Rabbit anti-DsRed	Clontech	632496
Rat anti-C15	Campbell, 2005	N/A
Guinea pig anti-knot	Jinushi-Nakao et al., 2007	N/A
Bacterial and Virus Strains
Biological Samples
Chemicals, Peptides, and Recombinant Proteins
Critical Commercial Assays
Deposited Data
Sequencing reads	This paper	GSE100058
Preprocessed sequence data	This paper	GSE100058
Experimental Models: Cell Lines
Experimental Models: Organisms/Strains
D. melanogaster: Mz19-GAL4	Bloomington Drosophila Stock Center	BDSC: 34497
D. melanogaster: Alrm-GAL4	Bloomington Drosophila Stock Center	BDSC: 67032
D. melanogaster: C15-RNAi #1	Bloomington Drosophila Stock Center	BDSC: 27649
D. melanogaster: C15-RNAi #2	Bloomington Drosophila Stock Center	BDSC: 35018
D. melanogaster: GH146-GAL4	Stocker et al., 1997	BDSC: 30026
D. melanogaster: UAS-STOP-mCD8GFP	Potter et al., 2010	BDSC: 30125
D. melanogaster: UAS-STOP-mCD8GFP	Hong et al., 2012	BDSC: 41573
D. melanogaster: 91G04-GAL4	Jenett et al., 2012	BDSC: 40588
D. melanogaster: trol-GAL4	Kyoto Stock Center	DGRC: 113584
D. melanogaster: GH146-Flp	Hong et al., 2009	N/A
D. melanogaster: unpg-lacZ	Cui and Doe, 1995	N/A
D. melanogaster: UAS-C15	Campbell, 2005	N/A
D. melanogaster: nos-Cas9	Diao et al., 2015	N/A
D. melanogaster: CG31676-GAL4	This paper	N/A
Oligonucleotides
Actin5C (qPCR forward primer): 5’-CTCGCCACTTGCGTTTACAGT-3’	This paper	N/A
Actin5C (qPCR reverse primer): 5’-TCCATATCGTCCCAGTTGGTC-3’	This paper	N/A
C15 (qPCR forward primer): 5’-AGCGCTTCCACAAGCAAAAG-3’	This paper	N/A
C15 (qPCR reverse primer): 5’-CCGTCTGTCGTCTCCACTTG-3’	This paper	N/A
Recombinant DNA
Plasmid: pT-GEM(1)	Addgene	62893
Plasmid: pU6-BbsI-chiRNA	Addgene	45946
Plasmid: TOPO-CG31676-T2A-GAL4	This paper	N/A
Software and Algorithms
Custom analysis software	This paper	https://github.com/felixhorns/FlyPN
Iterative Clustering for Identifying Markers	This paper	https://github.com/felixhorns/FlyPN
Other

Supplementary Material

Table S1

Table S1. Differentially Expressed Genes among Mz19+ PNs Across Developmental Stages, related to Figure 6. Top 30 most significantly differentially expressed genes between VA1d and DC3 PNs, and between VA1d and DA1 PNs, at each developmental stage are shown. Columns labeled VA1d, DC3, and DA1 indicate mean expression of the gene within Mz19+ PN cells of the given identified class at the indicated developmental stage. Significance values are given by the two-sided Mann-Whitney U test and adjusted by the Bonferroni correction. VA1d and DC3 cells are transcriptionally indistinguishable at the adult stage and therefore were analyzed together at that stage.

NIHMS982283-supplement-Table_S1.xlsx (97.8 KB, xlsx)

NIHMS982283-supplement-1.pdf (2 MB, pdf)

Acknowledgments

We thank L. Tan and S.L. Zipursky for sharing detailed protocols of cell dissociation; G. Rubin, G. Campbell, A. Moore, B. White, Bloomington and Kyoto Stock Centers for reagents; S. Darmanis and J. Lui for discussions; N. Neff and J. Okamoto for assistance with sequencing; and T. Clandinin, J. Lui, D. Pederick, K. Shen, and A. Shuster for comments on the manuscript. H.L. is a Stanford Neuroscience Institute Interdisciplinary Postdoctoral Scholar, F.H. acknowledges support from the National Science Foundation Graduate Research Fellowship, S.R.Q. is a Chan Zuckerberg Investigator, and L.L. is an HHMI Investigator. This work was supported by NIH grant R01-DC005982 (to L.L.).

References

Anders S, Pyl PT, and Huber W (2015). HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169.
Campbell G (2005). Regulation of gene expression in the distal region of the Drosophila leg by the Hox11 homolog, C15. Dev Biol 278, 607–618.
Campello RJGB, Moulavi D, Zimek A, and Sander J (2013). A framework for semi-supervised and unsupervised optimal extraction of clusters from hierarchies. Data Mining and Knowledge Discovery 27, 344–371.
Cui X, and Doe CQ (1995). The role of the cell cycle and cytokinesis in regulating neuroblast sublineage gene expression in the Drosophila CNS. Development 121, 3233–3243.
Darmanis S, Sloan SA, Zhang Y, Enge M, Caneda C, Shuer LM, Hayden Gephart MG, Barres BA, and Quake SR (2015). A survey of human brain transcriptome diversity at the single cell level. Proc Natl Acad Sci U S A 112, 7285–7290.
Diao F, Ironfield H, Luan H, Diao F, Shropshire WC, Ewer J, Marr E, Potter CJ, Landgraf M, and White BH (2015). Plug-and-play genetic access to drosophila cell types using exchangeable exon cassettes. Cell Rep 10, 1410–1421.
Dickson BJ (2008). Wired for sex: the neurobiology of Drosophila mating decisions. Science 322, 904–909.
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, and Gingeras TR (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21.
Doherty J, Logan MA, Tasdemir OE, and Freeman MR (2009). Ensheathing glia function as phagocytes in the adult Drosophila brain. J Neurosci 29, 4768–4781.
Foldy C, Darmanis S, Aoto J, Malenka RC, Quake SR, and Sudhof TC (2016). Single-cell RNAseq reveals cell adhesion molecule profiles in electrophysiologically defined neurons. Proc Natl Acad Sci U S A 113, E5222–5231.
Fuzik J, Zeisel A, Mate Z, Calvigioni D, Yanagawa Y, Szabo G, Linnarsson S, and Harkany T (2016). Integration of electrophysiological recordings with single-cell RNA-seq data identifies neuronal subtypes. Nat Biotechnol 34, 175–183.
Ghaemmaghami S, Huh WK, Bower K, Howson RW, Belle A, Dephoure N, O’Shea EK, and Weissman JS (2003). Global analysis of protein expression in yeast. Nature 425, 737–741.
Gokce O, Stanley G, Treutlein B, Neff NF, Camp GJ, Malenka RC, Rothwell PE, Fuccillo MV, Sudhof TC, and Quake SR (2016). Cellular Taxonomy of the Mouse Striatum as Revealed by Single-Cell RNA-Seq. Cell Rep 16, 1126–1137.
Hong W, and Luo L (2014). Genetic control of wiring specificity in the fly olfactory system. Genetics 196, 17–29.
Hong W, Mosca TJ, and Luo L (2012). Teneurins instruct synaptic partner matching in an olfactory map. Nature 484, 201–207.
Hong W, Zhu H, Potter CJ, Barsh G, Kurusu M, Zinn K, and Luo L (2009). Leucine-rich repeat transmembrane proteins instruct discrete dendrite targeting in an olfactory map. Nat Neurosci 12, 1542–1550.
Jefferis GS, Marin EC, Stocker RF, and Luo L (2001). Target neuron prespecification in the olfactory map of Drosophila. Nature 414, 204–208.
Jefferis GS, Potter CJ, Chan AM, Marin EC, Rohlfing T, Maurer CR Jr., and Luo L (2007). Comprehensive maps of Drosophila higher olfactory centers: spatially segregated fruit and pheromone representation. Cell 128, 1187–1203.
Jefferis GS, Vyas RM, Berdnik D, Ramaekers A, Stocker RF, Tanaka NK, Ito K, and Luo L (2004). Developmental origin of wiring specificity in the olfactory system of Drosophila. Development 131, 117–130.
Jenett A, Rubin GM, Ngo TT, Shepherd D, Murphy C, Dionne H, Pfeiffer BD, Cavallaro A, Hall D, Jeter J, et al. (2012). A GAL4-driver line resource for Drosophila neurobiology. Cell Rep 2, 991–1001.
Jinushi-Nakao S, Arvind R, Amikura R, Kinameri E, Liu AW, and Moore AW (2007). Knot/Collier and cut control different aspects of dendrite cytoskeleton and synergize to define final arbor shape. Neuron 56, 963–978.
Johnson MB, and Walsh CA (2017). Cerebral cortical neuron diversity and development at single-cell resolution. Curr Opin Neurobiol 42, 9–16.
Johnson MB, Wang PP, Atabay KD, Murphy EA, Doan RN, Hecht JL, and Walsh CA (2015). Single-cell analysis reveals transcriptional heterogeneity of neural progenitors in human cortex. Nat Neurosci 18, 637–646.
Kazama H, and Wilson RI (2009). Origins of correlated activity in an olfactory circuit. Nat Neurosci 12, 1136–1144.
Komiyama T, Johnson WA, Luo L, and Jefferis GS (2003). From lineage to wiring specificity. POU domain transcription factors control precise connections of Drosophila olfactory projection neurons. Cell 112, 157–167.
Komiyama T, and Luo L (2007). Intrinsic control of precise dendritic targeting by an ensemble of transcription factors. Curr Biol 17, 278–285.
Kurusu M, Cording A, Taniguchi M, Menon K, Suzuki E, and Zinn K (2008). A screen of cell-surface molecules identifies leucine-rich repeat proteins as key mediators of synaptic target selection. Neuron 59, 972–985.
Kwak N, and Choi CH (2002). Input feature selection for classification problems. IEEE Trans Neural Netw 13, 143–159. [DOI] [PubMed] [Google Scholar]
Laissue PP, Reiter C, Hiesinger PR, Halter S, Fischbach KF, and Stocker RF (1999). Three-dimensional reconstruction of the antennal lobe in Drosophila melanogaster. J Comp Neurol 405, 543–552. [PubMed] [Google Scholar]
Lee T, and Luo L (1999). Mosaic analysis with a repressible cell marker for studies of gene function in neuronal morphogenesis. Neuron 22, 451–461. [DOI] [PubMed] [Google Scholar]
Liang L, Li Y, Potter CJ, Yizhar O, Deisseroth K, Tsien RW, and Luo L (2013). GABAergic projection neurons route selective olfactory inputs to specific higher-order neurons. Neuron 79, 917–931. [DOI] [PMC free article] [PubMed] [Google Scholar]
Lin S, Kao CF, Yu HH, Huang Y, and Lee T (2012). Lineage analysis of Drosophila lateral antennal lobe neurons reveals notch-dependent binary temporal fate decisions. PLoS Biol 10, e1001425. [DOI] [PMC free article] [PubMed] [Google Scholar]
Marin EC, Jefferis GS, Komiyama T, Zhu H, and Luo L (2002). Representation of the glomerular olfactory map in the Drosophila brain. Cell 109, 243–255. [DOI] [PubMed] [Google Scholar]
Meller VH, Wu KH, Roman G, Kuroda MI, and Davis RL (1997). roX1 RNA paints the X chromosome of male Drosophila and is regulated by the dosage compensation system. Cell 88, 445–457. [DOI] [PubMed] [Google Scholar]
Mosca TJ, and Luo L (2014). Synaptic organization of the Drosophila antennal lobe and its regulation by the Teneurins. Elife 3, e03726. [DOI] [PMC free article] [PubMed] [Google Scholar]
Picelli S, Faridani OR, Bjorklund AK, Winberg G, Sagasser S, and Sandberg R (2014). Full-length RNA-seq from single cells using Smart-seq2. Nat Protoc 9, 171–181. [DOI] [PubMed] [Google Scholar]
Potter CJ, Tasic B, Russler EV, Liang L, and Luo L (2010). The Q system: a repressible binary system for transgene expression, lineage tracing, and mosaic analysis. Cell 141, 536–548. [DOI] [PMC free article] [PubMed] [Google Scholar]
Shannon CE (1948). A Mathematical Theory of Communication. Bell System Technical Journal 27, 623–656. [Google Scholar]
Shekhar K, Lapan SW, Whitney IE, Tran NM, Macosko EZ, Kowalczyk M, Adiconis X, Levin JZ, Nemesh J, Goldman M, et al. (2016). Comprehensive Classification of Retinal Bipolar Neurons by Single-Cell Transcriptomics. Cell 166, 1308–1323 e1330. [DOI] [PMC free article] [PubMed] [Google Scholar]
Sinakevitch I, Grau Y, Strausfeld NJ, and Birman S (2010). Dynamics of glutamatergic signaling in the mushroom body of young adult Drosophila. Neural Dev 5, 10. [DOI] [PMC free article] [PubMed] [Google Scholar]
Stocker RF, Heimbeck G, Gendre N, and de Belle JS (1997). Neuroblast ablation in Drosophila P[GAL4] lines reveals origins of olfactory interneurons. J Neurobiol 32, 443–456. [DOI] [PubMed] [Google Scholar]
Stork T, Sheehan A, Tasdemir-Yilmaz OE, and Freeman MR (2014). Neuron-glia interactions through the Heartless FGF receptor signaling pathway mediate morphogenesis of Drosophila astrocytes. Neuron 83, 388–403. [DOI] [PMC free article] [PubMed] [Google Scholar]
Supek F, Bosnjak M, Skunca N, and Smuc T (2011). REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS One 6, e21800. [DOI] [PMC free article] [PubMed] [Google Scholar]
Tan L, Zhang KX, Pecot MY, Nagarkar-Jaiswal S, Lee PT, Takemura SY, McEwen JM, Nern A, Xu S, Tadros W, et al. (2015). Ig Superfamily Ligand and Receptor Pairs Expressed in Synaptic Partners in Drosophila. Cell 163, 1756–1769. [DOI] [PMC free article] [PubMed] [Google Scholar]
Tasic B, Menon V, Nguyen TN, Kim TK, Jarsky T, Yao Z, Levi B, Gray LT, Sorensen SA, Dolbeare T, et al. (2016). Adult mouse cortical cell taxonomy revealed by single cell transcriptomics. Nat Neurosci 19, 335–346. [DOI] [PMC free article] [PubMed] [Google Scholar]
Tobin WF, Wilson RI, and Lee WA (2017). Wiring variations that enable and constrain neural computation in a sensory microcircuit. Elife 6. [DOI] [PMC free article] [PubMed] [Google Scholar]
Usoskin D, Furlan A, Islam S, Abdo H, Lonnerberg P, Lou D, Hjerling-Leffler J, Haeggstrom J, Kharchenko O, Kharchenko PV, et al. (2015). Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing. Nat Neurosci 18, 145–153. [DOI] [PubMed] [Google Scholar]
van der Maaten L, and Hinton G (2008). Visualizing Data using t-SNE. Journal of Machine Learning Research 9, 2579–2605. [Google Scholar]
Vosshall LB, and Stocker RF (2007). Molecular architecture of smell and taste in Drosophila. Annu Rev Neurosci 30, 505–533. [DOI] [PubMed] [Google Scholar]
Ward A, Hong W, Favaloro V, and Luo L (2015). Toll receptors instruct axon and dendrite targeting and participate in synaptic partner matching in a Drosophila olfactory circuit. Neuron 85, 1013–1028. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wilson RI (2013). Early olfactory processing in Drosophila: mechanisms and principles. Annu Rev Neurosci 36, 217–241. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wu JS, and Luo L (2006). A protocol for dissecting Drosophila melanogaster brains for live imaging or immunostaining. Nat Protoc 1, 2110–2115. [DOI] [PubMed] [Google Scholar]
Yu HH, Kao CF, He Y, Ding P, Kao JC, and Lee T (2010). A complete developmental sequence of a Drosophila neuronal lineage as revealed by twin-spot MARCM. PLoS Biol 8. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zeisel A, Munoz-Manchado AB, Codeluppi S, Lonnerberg P, La Manno G, Jureus A, Marques S, Munguba H, He L, Betsholtz C, et al. (2015). Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science 347, 1138–1142. [DOI] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Table S1

NIHMS982283-supplement-Table_S1.xlsx (97.8 KB, xlsx)

NIHMS982283-supplement-1.pdf (2 MB, pdf)
