Abstract
We used the 10x Genomics Visium platform to define the spatial topography of gene expression in the six-layered human dorsolateral prefrontal cortex (DLPFC). We identified extensive layer-enriched expression signatures, and refined associations to previous laminar markers. We overlaid our laminar expression signatures onto large-scale single nuclei RNA sequencing data, enhancing spatial annotation of expression-driven clusters. By integrating neuropsychiatric disorder gene sets, we showed differential layer-enriched expression of genes associated with schizophrenia and autism spectrum disorder, highlighting the clinical relevance of spatially-defined expression. We then developed a data-driven framework to define unsupervised clusters in spatial transcriptomics data, which can be applied to other tissues or brain regions where morphological architecture is not as well-defined as cortical laminae. We lastly created a web application for the scientific community to explore these raw and summarized data to augment ongoing neuroscience and spatial transcriptomics research (http://research.libd.org/spatialLIBD).
Introduction
The spatial organization of the brain is fundamentally related to its function. This structure-function relationship is especially apparent in the context of the laminar organization of the human cerebral cortex where cells residing within different cortical layers show distinct gene expression patterns and exhibit differing patterns of morphology, physiology, and connectivity 1–4. To the extent that structure entrains function, understanding normal brain development as well as disorders of the central nervous system will require identifying the cell types that make up the brain, and ultimately linking functional correlates of individual cell classes with structural architecture.
Major advances in single-cell (scRNA-seq) and single-nuclei (snRNA-seq) sequencing technologies have dramatically increased identification of molecularly-defined cell types in the human brain and implicated unique cell classes in risk for specific brain disorders 5–11. While scRNA-seq approaches are common in rodent brain tissue, the relatively large size and fragility of human neurons, coupled with the fact that most available postmortem human brain tissue is frozen, has resulted in nearly all available data in the human brain being generated on isolated nuclei with snRNA-seq approaches 12. While nuclear profiles are generally representative of whole cell profiles 13, isolated nuclei lack the cytoplasmic compartment as well as axons and proximal dendrites, which limits our understanding of gene expression in the cytosol and neuropil 12. This is problematic for studies of brain disorders as converging evidence suggests that impairments in the formation or maintenance of synapses in critical cortical microcircuits are involved in many neuropsychiatric and neurodevelopmental disorders, including schizophrenia disorder (SCZD) and autism spectrum disorder (ASD) 6,14,15. Indeed, studies in the postmortem brains of individuals with these disorders have implicated not only specific cell types 6,12,16, but also revealed differences in neuronal and synaptic structure that are spatially localized to specific cortical layers 6,14. Furthermore, genes associated with increased risk for SCZD that were identified by genome-wide association studies (GWAS) are preferentially enriched for synaptic neuropil transcripts 12, suggesting that the extra-nuclear information missed by snRNA-seq approaches may be especially relevant for understanding genetic risk for brain disorders. While molecular profiles derived from sc/sn-RNAseq data can be used to predict anatomical location based on canonical marker genes described in the literature or from curated datasets, precisely assigning gene expression to the spatial coordinates of individual cell populations within intact brain cytoarchitecture of postmortem human brain tissue would significantly advance our understanding of studies of human brain development and disease.
Because it is considered a gold standard for quantifying gene expression with high spatial resolution, we recently established and optimized methods for using multiplex single-molecule fluorescent in situ hybridization (smFISH) in postmortem human brain tissue 17. However, multiplexing with these technologies is limited, and even for methodologies that can accommodate hundreds to thousands of transcripts simultaneously, molecular crowding within cells leads to fluorescence overlap, which introduces significant microscopy-related issues and computational challenges 18,19. The relatively large size of the human brain and lipofuscin-derived autofluorescence pose additional challenges for microscopy-based spatial transcriptomic methods in postmortem human tissue. While methods such as laser capture microdissection (LCM)-seq do allow for transcriptome-wide profiling from cytosol in a spatially-defined area 20–22, the tissue is removed from the surrounding spatial context and processed separately, hindering the ability to analyze gradients of gene expression and examine spatial relationships within intact sections.
Emerging technologies for genome-wide spatial transcriptomics offer significant potential for providing detailed molecular maps that overcome limitations associated with sn/scRNA-seq and microscopy-based spatial transcriptomic methods. Importantly, these technologies use an on-slide cDNA synthesis approach that captures gene expression in the architecture of intact tissue, meaning that information from cytosol and neuronal processes is retained 23,24. To further our understanding of gene expression within the context of the spatial organization of the human cortex, we used the recently-released, 10x Genomics Visium platform, a novel barcoding-based transcriptome-wide spatial transcriptomics technology, to generate spatial maps of gene expression in the six-layered dorsolateral prefrontal cortex (DLPFC) of the adult human brain. The Visium platform expands the spatial resolution 5-fold beyond the first-generation ‘Spatial Transcriptomics’ approach 23 upon which it is based. While the original approach was successfully used to generate gene expression atlases and identify perturbations in transcriptional pathways for several normal and pathological human tissues, including the developing heart 25, invasive ductal cancer 23, pancreatic ductal adenocarcinoma 26, prostate cancer 27, postmortem spinal cord 28 and cerebellum 29 of patients with amyotrophic lateral sclerosis (ALS), it lacked the necessary spatial resolution to resolve both individual cells and laminar structures in the human cortex.
Since some differences in pathology and gene expression associated with neuropsychiatric disorders are localized to specific cortical layers 6,14, the ability to localize spatial gene expression in the human brain at cellular resolution will be critical to gain further insight into disease mechanisms. Towards this end, we sought to define the laminar topography of gene expression in the human DLPFC, a brain area that has been implicated in a number of neuropsychiatric disorders. We overlaid data from recent large-scale snRNA-seq studies in the human brain with our spatial data to first confirm our layer-enriched expression signatures, and to then increase precision in manual annotation of gene expression-driven clusters to cortical laminae. To exemplify the potential of this type of data for clinical translation, we integrated our dataset with various neuropsychiatric disorder gene sets to demonstrate preferential layer-enriched expression of ASD risk genes and layer-enriched association of risk for several neuropsychiatric disorders. Finally, we compared the manually-annotated laminar clusters to entirely data-driven spatial clusters in the same human cortical tissue, using an approach that can also be applied to other human tissues and brain regions that do not have as clear morphological patterning as the cerebral cortex. We provide these data and analysis tools as a significant scientific resource for the neuroscience community to augment current molecular profiling and spatial transcriptomics efforts in the human brain.
Results
We profiled spatial gene expression in human postmortem DLPFC tissue sections from two pairs of ‘spatial replicates’ from three independent neurotypical adult donors. Each pair consisted of two, directly adjacent 10μm serial tissue sections with the second pair located 300μm posterior from the first, resulting in a total of 12 samples run on the Visium platform (Figure 1 A–B, Table S1, Supplementary Figure 1) Method Details: Tissue processing and Visium data generation). We sequenced each sample to a median depth of 291.1M reads (IQR: 269.3M-327.7M), which corresponded to a mean 3,462 unique molecular indices (UMIs) and a mean 1,734 genes per spot. We note these rates are analogous to snRNA-seq and scRNA-seq data using the 10x Genomics Chromium platform, where a ‘cell’ barcode on the Chromium platform corresponds to a ‘spatial’ barcode on the Visium platform. However, unlike snRNA-seq data from postmortem human brain, which contains high numbers of intronic reads that map to immature transcripts, we found strong enrichment of mature mRNAs with high mean rates of exonic alignments (mean: 83.3%, IQR: 82.5-84.3%, Method Details: Visium raw data processing). Independent processing and cell segmentation of high-resolution histology images acquired before on-slide cDNA synthesis indicated an average of 3.3 cells per spot (IQR: 1-4), with a mean 15.0% (IQR: 12.8-17.9%) spots per sample containing a single cell body and 9.7% (IQR: 5.4-12.3%) ‘neuropil’ spots that lacked any cell bodies (Method Details: Histology image processing and segmentation). Tissue sections were acquired in the plane perpendicular to the pial surface that extended to the gray-white matter junction (Figure 1 C). The orientation of each sample was confirmed by delineating the border between layer 6 (L6) and the adjacent white matter (WM) and identifying layer (L5) using marker genes for gray matter/neurons (SNAP25), WM/oligodendrocytes (MOBP), and L5 (PCP4) in each tissue section (Figure 1 D–F, Supplementary Figure 2, Supplementary Figure 3, and Extended Data 1). A summary of the data generated and analyses performed is located in Supplementary Figure 4.
Gene expression in the DLPFC across cortical laminae
We first generated aggregated layer-enriched expression profiles for each spatial replicate using a ‘supervised’ approach to assign individual spots to each of the six neocortical layers or the WM (Supplementary Figure 5, Method Details: Spot-level data processing). Then, we performed ‘pseudo-bulking’ by summing the UMI counts for each gene within each layer across each spatial replicate to generate layer-enriched expression profiles (Figure 2A, Method Details: Layer-level data processing). The pseudo-bulking approach, summarizing 47,681 spots to 76 layer-aggregated profiles across the 12 samples, removed sparsity and greatly increased UMI coverage of genes (Figure 2A). Unsupervised clustering of these layer-enriched expression profiles revealed the top component of variation in the data related to laminar differences, particularly between the white and gray matter (Figure 2B), with high concordance between the pairs of spatial replicates (Extended Data 2). Segmentation of histological images confirmed sparser cell densities in layer 1 (L1), a molecular layer enriched in synaptic processes, with 33.4% and 21.7% of spots containing 0 and 1 cell body, respectively. We observed increased cell densities in the oligodendrocyte-enriched WM, with 3.9% and 5.9% of spots containing 0 and 1 cell body, respectively (Table S2). We hypothesized that these ‘neuropil spots’ with 0 cell bodies may be enriched with neuronal processes (i.e. axons and dendrites; Table S3), and as predicted we identified significant enrichment of genes that are preferentially expressed in the transcriptome of synaptic terminals 30 (ρ=0.38, p=1.9e-30, Extended Data 3) (Method Details: Neuropil enrichment analyses). Together, these analyses demonstrate the power of concurrently acquiring histology and gene expression data and highlight the ability of the Visium platform to achieve high resolution spatial expression profiling within the human DLPFC.
We used three strategies to perform differential expression (DE) analyses using the layer-enriched expression profiles generated above with linear mixed-effects modeling (Extended Data 4, Method Details: Layer-level gene modeling). The first strategy involved testing for differences in mean expression across the six layers plus WM (we also tested for differences in mean expression with only six layers, excluding WM), termed the ‘ANOVA’ model (Figure 2 C), which estimates an F-statistic for each gene. This strategy revealed extensive differential expression across the laminar organization of the DLPFC, with 10,633 (47.6%) DE genes (DEGs) across the six gray matter layers plus WM (at FDR < 0.05) and 8,581 (38.4%) DEGs across the six gray matter layers excluding WM (FDR < 0.05). As expected, these results suggested extensive differences in gene expression between the layers of the DLPFC beyond broad white versus gray matter comparisons. The second strategy identified layer-enriched genes by testing for differences in expression between one layer versus all other layers, termed the ‘enrichment’ model (Figure 2 D), which resulted in a t-statistic (termed ‘layer-enriched statistics’ hereafter) and p-value (and corresponding FDR adjusted q-value) for each expressed gene and layer (Method Details: Layer-level gene modeling). The largest expression differences were between WM and the neocortical layers, with 9,124 DEGs (FDR < 0.05), and the smallest differences were between L3 and all other layers with 183 DEGs genes (Table S4). In the third strategy, we tested for genes differentially expressed between each pair of layers (21 pairs), termed the ‘pairwise’ model (Figure 2 E, Method Details: Layer-level gene modeling), which produced significant DEGs ranging from 8,500 for WM versus L3 to 292 for L4 versus L5 (Table S4). Together, these analyses highlight the extensive gene expression differences between the different layers of the human adult DLPFC.
Identifying novel layer-enriched genes in human cortex
Several resources have compiled genes that exhibit laminar-specific expression across both rodent 31 and human cortex 32. While both overlapping and unique marker genes have been identified, these studies used different technologies, examined different developmental stages, and queried different regions of cortex. Therefore, we systematically assessed the robustness of these previously identified marker genes in our human adult DLPFC layer-enriched gene expression dataset. First, we tested for enrichment of previously published layer-enriched genes - as a set - among our layer-enriched DEGs, and found strong enrichment (p=1.22e-41). Since many of these marker genes were previously annotated to multiple layers (i.e. CCK and ENC1, Figure 3), rather than a single layer as queried in our DE analyses, we fit the ‘optimal’ statistical model for each gene using our layer-enriched expression profiles (Method Details: Known marker genes optimal modeling, Table S5). For example, CCK was annotated to L2, L3, and L6, which were together tested against combining L1, L3, L4, and WM in this optimal model. Only a subset of previously-associated layer-enriched genes showed high ranks and significant differential expression in our human DLPFC data (Extended Data 5), which were largely driven by markers identified by Zeng et al. 32.
We further confirmed laminar enrichment of a number of canonical marker genes, including CCK, ENC1, CUX2, RORB, and NTNG2, and validated these findings against publicly available singleplex in situ hybridization data from the Allen Brain Institute’s Human Brain Atlas 33 (Figure 3 and Extended Data 6). Interestingly, while many of these genes (FABP7, ADCYAP1, PVALB) showed layer-enriched expression in our data, they were not classified by the Allen Brain Institute resources as being layer markers, demonstrating the utility of quantitative transcriptome-scale spatial approaches. Although we confirmed several canonical layer-enriched/specific genes, we found that only 59.5% of previously identified marker genes were significant DEGs (FDR < 0.05) in human DLPFC (Table S5). Indeed, we identified several genes previously underappreciated as laminar markers in human DLPFC, including AQP4 (L1), HPCAL1 (L2), FREM3 (L3), TRABD2A (L5) and KRT17 (L6) (Figure 4 and Extended Data 7). We validated these novel layer-enriched DEGs using multiplex single molecule fluorescent in situ hybridization (Figure 4 and Supplementary Figure 6, Methods Details: RNAscope smFISH). Novel layer-enriched DEGs were also validated by multiplexing with previously identified layer markers in the literature, many of which were also replicated in our Visium data (Supplementary Figure 7).
Spatial registration of single nuclei RNA sequencing (snRNA-seq)
Adding spatial resolution to snRNA-seq datasets generated from human brain tissue has the potential to provide further insights about the function of molecularly-defined cell types. Specifically, layer-enriched expression profiles and differential expression statistics derived from the ‘enrichment model’ in our Visium data can be used to spatially “register” snRNA-seq datasets and add layer-enriched information to data-driven expression clusters that do not contain inherent anatomical information (Figure 5 A, Methods Details: snRNA-seq spatial registration). We first used snRNA-seq data from Hodge et al. 5 to confirm our layer-enriched expression profiles and validate this spatial registration strategy. While the snRNA-seq data in that study was obtained predominantly from NeuN+ sorted neuronal nuclei that were isolated from manually-dissected layers of the human postmortem middle temporal gyrus cortex, our layer-enriched DEGs from spatially-barcoded bulk tissue sections were in agreement with the laminar assignments from which these nuclei were derived (Figure 5 B). We further validated this strategy on bulk RNA-seq data that was generated from manually-dissected laminar serial sections of the human prefrontal cortex from four donors 22. This data however lacked corresponding histology data to definitively annotate specific cortical layers, and assignment of sections to layers likely underestimated the amount of WM present (~5 sections/sample instead of just one predicted section), and missed L1 in one of their four subjects (H1) (Supplementary Figure 8).
We then used our layer-enriched statistics to perform spatial registration across three independent snRNA-seq datasets from human cortex. First, we generated our own snRNA-seq data from DLPFC using 5,231 nuclei from two donors, and performed data driven clustering to generate 30 preliminary cell clusters across 7 broad cell types (Supplementary Figure 9, Method Details: DLPFC snRNA-seq data generation). Integration of our layer-enriched statistics refined excitatory and inhibitory neuronal subclasses into upper and deep layer subgroups beyond expected enrichments of glial cells in the WM (Extended Data 8 A). We further assessed the robustness of this approach by re-analyzing processed snRNA-seq from 48 donors across 70,634 nuclei obtained from the human prefrontal cortex (BA10) across 44 broad clusters in a study of Alzheimer’s disease 10. Glial cell subpopulations showed expected enrichments, with preferential expression of oligodendrocyte subtypes in the WM, astrocyte subtypes in L1, and microglia, oligodendrocyte precursor (OPC), pericytes, and endothelial subtypes in both L1 and WM (Figure 5 C). Neuronal cell subtypes showed greater laminar diversity, with multiple excitatory and inhibitory neuronal cell types associating with L2/L3, L4, L5, and L6 preferential expression, with generally more layer-enriched expression within excitatory cells (Figure 5 C). Interestingly, our analysis showed that the excitatory neuronal subclasses (Ex2, Ex4, Ex6) identified by Mathys et al. that were most associated with clinical traits of Alzheimer’s disease were preferentially localized to the upper layers (L2/L3) of DLPFC in our data. This finding contrasts the inferences that were drawn by Mathys et al., which made layer assignments based on data obtained from the serial sections in He et al. described above 22. Specifically, they concluded that excitatory neuronal subclass Ex4 and Ex6 were preferentially expressed in the deeper layers while excitatory neuronal subclass Ex2 showed no laminar enrichment.
We lastly applied our spatial registration analysis to a study of autism spectrum disorder (ASD) 6 including snRNA-seq data from 104,559 nuclei isolated from the human prefrontal cortex and anterior cingulate cortex that were obtained from 41 samples across 31 donors, which were annotated to 17 clusters in a study of ASD 6 (Extended Data 8 B). As expected, we confirmed expected spatial contexts; for example, the highest enrichment of oligodendrocytes was again found in our histologically-defined WM. Our spatial registration framework was also able to refine the laminar predictions of cell-types in these previous studies. For example, integration of layer-enriched genes defined by Visium with snRNA-seq data from Velmeshev et al. indicated that astrocyte populations were most enriched in L1, while excitatory neurons annotated to L4 were more likely to be found in L5. These analyses demonstrate how this ‘spatial registration’ framework can be readily applied to any existing snRNA-seq or scRNA-seq datasets from dissociated cells to add back anatomical information.
Clinical relevance of layer-enriched gene expression profiling
Given that several studies have identified associations between different brain disorders and molecularly-defined cell types, we assessed the clinical relevance of spatial gene expression using several different brain disorder-associated gene sets. We assessed the laminar enrichment of (1) gene sets derived from genes linked to different disorders via DNA profiling, (2) genes differentially expressed in postmortem brains of patients with a variety of brain disorders and neurotypical controls, and (3) genes associated with genetic risk via transcriptome-wide association studies (TWAS). We first used broad gene sets for different brain disorders compiled by Birnbaum et al. 34, which showed laminar enrichments specifically for ASD (Supplementary Figure 10, Table S6, Method Details: Clinical gene set enrichment analyses). We used the latest SFARI Gene database 35 to refine these associations, and demonstrate enrichments of L2 (OR=2.74, p=6.0e-21), L5 (OR=2.1, p=8.7e-7) and L6 (OR=2.7, p=1.8e-7) with ASD risk genes (Figure 6 A). We confirmed the L2 (OR=3.6, p=3.9e-6) and L5 (OR=4.0, p=6.7e-5) associations in a recent exome sequencing study by Satterstrom et al. 36, which identified 102 genes with ASD-associated variants. Interestingly, stratifying these genes by their clinical symptoms refined the laminar enrichments, as the 53 genes associated with ASD-dominant traits were more enriched for L5 (OR=4.9, p=5.3e-4, 8 genes: TBR1, SATB1, ANK2, RORB, MKX, CELF4, PPP5C, AP2S1), whereas the 49 genes associated with neurodevelopmental delay were more enriched for L2 (OR=4.5, p=7.8e-5, 12 genes: CACNA1E, MYT1L, SCN2A, TBL1XR1, NR3C2, SYNGAP1, GRIN2B, IRF2BPL, GABRB3, RAI1, TCF4, ADNP), suggesting that different functional subclasses of neurons might be contributing to each clinical subgroup. These layer-enriched expression associations for risk genes were largely independent of the enrichments seen comparing genes more highly expressed (WM: p=1.9e-29 and L1: p=4.5e-61) or more lowly expressed (L3: p=2.9e-5, L4: p=1.7e-42, L5: p=3.2e-36, and L6: p=1.9e-7) in brains of ASD patients compared to neurotypical controls (Table S6).
We further assessed laminar enrichment of genes proximal to common genetic variation associated with SCZD, ASD, bipolar disorder (BPD), and major depressive disorder (MDD) 37. These analyses identified significant overlap between L2-enriched and L5-enriched genes and risk for SCZD (at Bonferroni < 0.05), with additional overlap between L2-enriched genes and risk for bipolar disorder (at FDR < 0.05, Table S7). As above with ASD, there were markedly different laminar enrichments for genes associated with SCZD illness state. Enrichment analyses of DEGs identified in two large SCZD postmortem brain datasets 16,38, while highly convergent across studies, showed extensive enrichment across all layers, with increased expression of L1, L2, and L3 genes and decreased expression of WM, L4, L5 and L6 genes in patients compared to controls (Figure 6 B). As secondary analyses, we performed heritability partitioning analysis 39 for layer-enriched gene sets, which again identified significant heritability enrichment exclusively for L2 enriched-genes, specifically for SCZD, BPD, and educational attainment (Table S8, Method Details: Clinical gene set enrichment analyses). We additionally assessed TWAS statistics constructed for SCZD and BPD from single nucleotide polymorphism (SNP) weights computed from DLPFC 16,20. While we did not observe strong enrichments of TWAS signal for any layer-enriched gene expression, SCZD risk genes in L2 and L5 suggested decreased expression in illness (Figure 6 B, Table S6). Together, these analyses highlight the potential utility of these data in gleaning clinical insights by incorporating layer-enriched gene expression of the adult DLPFC into the interpretation of risk genes.
Data-driven layer-enriched clustering in the DLPFC
Lastly, we explored the use of three alternative ‘data-driven’ approaches to classify Visium spots into laminar and non-laminar patterns, in contrast to the ‘supervised’ approach of identifying layer-enriched DEGs from manually-annotation of layers based on cytoarchitecture (Figure 7 A, B; Supplementary Figure 11), which may not be feasible in other brain regions or human tissues that lack clear or established morphological boundaries. Towards this goal, we explored the use of two gene sets: (1) genes exhibiting spatially variable expression patterns (SVGs) within each of the 12 samples (Table S9), and (2) highly variable genes (HVGs; Method Details: Data-driven layer-enriched clustering analysis). While no laminar information was used to identify SVGs and HVGs, interestingly these gene sets could identify both laminar and non-laminar spatial patterns (Figure 7 C, D). For example, we identified several SVGs that were non-laminar, including HBB, IGKC, and NPY, which likely relate to blood cells, immune cells, and inhibitory interneuron classes (Figure 7 D). In a completely data-driven and ‘unsupervised’ approach, we then used several implementations of unsupervised clustering methods with spot-level Visium data using these gene sets, with the possibility of further incorporating spatial coordinates of the spots, since we reasoned that adjacent spots should tend to show more similar expression levels (Figure 7 E, Extended Data 9, Extended Data 10; Method Details: Data-driven layer-enriched clustering analysis). We compared these results to a ‘semi-supervised’ approach (unsupervised clustering guided by the layer-enriched genes identified using the DE “enrichment” models; Extended Data 4) and an approach using known rodent and human layer marker genes from Zeng et al. 32 (Figure 7 E, and Table S10).
Using the manually-annotated layers as a ‘gold standard’ (Figure 7 A, Supplementary Figure 11), we evaluated the performance of the three approaches (‘unsupervised’, ‘semi-supervised’ and ‘markers’) using the adjusted Rand index (ARI) as the performance metric. Specifically, the ARI measures the similarity between the predicted cluster labels from our three approaches and the ‘gold standard’ cluster labels, with higher values corresponding to better performance (Figure 7 F). First, we found consistent, but moderate, performance improvements by incorporating x, y spatial coordinates of the spots into the clustering methods across all three approaches (Figure 7 F). Within the ‘unsupervised’ approach, we found that using the HVGs resulted in the highest ARI, but with the SVGs also comparable in performance (Figure 7 F). However, the ‘semi-supervised’ approach resulted in the highest ARI out of all three approaches. This likely stems from the circularity of performing data-driven clustering guided by our layer-enriched DEGs on the same data, but this could be powerful in future spatial transcriptomics studies in the human cortex.
Discussion
Based on examination of its histological organization and cytoarchitecture, the neocortex can be divided into six layers, which can be differentiated based on cell type composition and density, as well as morphology and connectivity of resident cell types 1–4. Studies of postmortem brains from individuals with neuropsychiatric disorders have identified disease-associated changes in gene expression and synaptic structure that can be localized to individual layers 6,14. Our study, which is the first to our knowledge, to implement Visium technology in human brain tissue provides a number of important functional insights about the spatial and molecular definitions of cell populations across cortical laminae by analyzing gene expression within the intact spatial organization of the human DLPFC.
First, we demonstrated the potential clinical translation of quantifying layer-enriched expression profiles in human DLPFCs. By integrating our data with clinical gene sets and genes differentially expressed in the brains of individuals with various neuropsychiatric disorders, we demonstrated preferential layer-enriched expression of genes implicated in ASD and SCZD. For example, genes that harbor de novo mutations associated with ASD 36 were preferentially expressed in L2 and L5 subsets of these genes that were associated with specific clinical characteristics could be further partitioned into specific laminae. Specifically, genes predominantly associated with neurodevelopmental delay (NDD) were preferentially expressed in L2 and genes predominantly associated with ASD were preferentially expressed in L5. These specific laminar associations with penetrant de novo variants were in contrast to broad laminar enrichments of genes differentially expressed in the brains of patients with ASD 16 and lack of laminar enrichment of genes implicated by common genetic variation 40. Interestingly these same two layers - L2 and L5 - showed preferential enrichment of genes implicated in common variation for SCZD 41, and to a lesser extent, BPD 42. These results were in contrast to differential expression analyses from postmortem studies of brain tissue from patients with SCZD compared to neurotypical controls 16,38, which showed increased expression of upper layer genes and decreased expression of deep layer and WM genes. Further, we show that SCZD heritability is enriched in L2, a finding that implicates intracortical information processing as a potential mechanism for genetic risk.
Second, we overlaid recent large-scale snRNA-seq data from several cohorts to both confirm our layer-enriched expression signatures and further annotate gene expression-driven clusters to individual cortical layers. The shift from homogenate sequencing studies of brain tissue 38,43,44 to large-scale snRNA-seq has already begun, with increasing sample sizes and numbers of nuclei 6,10. Our “spatial registration” strategy, which uses individual gene-level statistics from both layer-specific versus cell type-specific expression profiles from hundreds or thousands of genes is likely more powerful than table-based enrichment analyses using small subsets of previously-defined marker genes. Spatial registration of multiple independent datasets with our Visium data showed that layer-enriched patterns of expression can be extracted from snRNA-seq data, as subtypes of excitatory neuronal cells, and to a lesser extent, inhibitory neuronal cells, could be classified by their preferential laminar enrichment. While this strategy does not aid in constructing cell clusters in snRNA-seq data, it is a powerful tool to better annotate and interpret data-driven clusters, and add spatial context to cell type-specific gene expression in the brain.
Third, in contrast to manually annotating laminar clusters based on cytoarchitecture, which is very labor-intensive, we evaluated the performance of alternative, data-driven approaches to cluster spots based on spatially variable genes. Using these data-driven approaches, we identified variable spatial expression related to relatively rare inhibitory neuron subpopulations, brain vasculature, and immune function: 1) NPY, which encodes a neuropeptide highly expressed in a subpopulation of inhibitory interneurons, 2) HBB, which encodes a subunit of hemoglobin found in red blood cells, and 3) IGKC, which encodes the constant region of immunoglobulin light chains found in antibodies (Figure 7 D). The layer-enriched genes identified here can be used to aid data-driven clustering in human cortex, and indeed, performed better than previously-defined markers (Figure 7 E, F). Since these data-driven approaches can identify previously unknown cellular organizations their application will be critical for analyzing other human tissues and brain structures where morphological patterning is not as well-defined or as well-characterized compared to the cerebral cortex.
While LCM and other microdissection techniques have been used to generate laminar-specific gene expression profiles in human cortex 20–22, morphological boundaries cannot be definitively defined, which hinders the ability to examine spatial relationships between cell populations or to define gradients of gene expression across structures. For example, several layer-enriched genes identified by Visium show striking gradients of gene expression, such as HPCAL1 which is highly expressed in L2 but steadily decreases in expression through L4, L5, and L6. Conversely, KRT17 is enriched in L6 and progressively decreases in expression through L5, L4, L3, and L2. While the current resolution of a spatially-barcoded spot in the Visium platform is 55μm, we found that 15.0% of spots contained a single cell body, highlighting an additional available level of interrogation for downstream analysis. Ongoing advances will likely lead to improved spatial resolution, as custom platforms now reach subcellular resolutions of 10μm and 2μm 24,45. Although each Visium spot may contain a mixture of cell types, incorporating cell body segmentations from histological data can aid in refining interpretations and identifying spots with cell types of interest. Advances in Visium technology that allow for immunofluorescence (IF) protein detection can also be used to integrate morphological features of different cell types with gene expression data. Finally, Visium afforded several experimental advantages compared to fluorescence microscopy-based spatial transcriptomics approaches 46,47 including, 1) coverage across a large area of brain tissue, 2) unbiased, transcriptome-wide analysis of gene expression (i.e. no requirement to select gene targets of interest), and 3) no confounds from lipofuscin autofluorescence. These advantages provide flexibility to analyze spatial gene expression from numerous angles (i.e. supervised clustering, unsupervised clustering, neuropil only) within a single experiment, which would be nearly impossible to accomplish with more labor intensive approaches. However, it should be noted that the substantial gyrations inherent to the human cortex make consistency of dissections across individual donors difficult, which will remain a challenge for larger scale experiments aiming to generate equivalent clusters across tissue sections or perform clustering in a 3D coordinate system. The generation of additional human brain datasets and more finely-annotated human brain atlases that include microcircuitry (i.e. cortical layers as opposed to cortical regions) will be critical for developing new methods to register snRNA-seq, spatial transcriptomics, and accompanying histological data to a common coordinate framework.
While the laminar structure of the neocortex is largely preserved across mammalian species, several recent studies have underscored key similarities and differences in laminar gene expression between humans, primates, and rodents 5,22,32. Given the functional importance associated with laminar origin, recent snRNA-seq studies in postmortem human cortex have attempted to annotate molecularly-defined cell type clusters to the layer from which they originated 6,10. However, these laminar annotations are largely derived from curated gene sets that come from rodents and non-human primates. While we validated laminar-enrichment of some canonical layer-specific genes that were previously identified in the rodent and human cortex (Figure 3 and Figure S9), some classical markers, such as BCL11B (L5), showed weak laminar patterning in adult DLPFC. Likewise, many genes showed no laminar patterning (Extended Data 5). These findings may represent true differences between brain regions, species, and developmental time points, and reinforce previous studies that urge caution in translating rodent and primate studies of molecularly and spatially-defined cell types into the human brain. Indeed, our unbiased genome-wide approach identified a number of previously underappreciated layer-enriched genes in human DLPFC, including HPCAL1 (L2) , KRT17 (L6), and TRABD2A (L5), that may represent markers with higher fidelity for laminar annotation of snRNA-seq clusters in human brain (Figure 4). We also confirmed laminar enrichment of several genes identified as cell type markers in specific cortical layers by Hodge et al. (LAMP5, AQP4, FREM3), suggesting conservation between laminar markers in middle temporal gyrus and DLPFC 5.
In contrast to the snRNA-seq approaches that encompass the vast majority of gene expression profiling studies in frozen postmortem human brain tissue, Visium is not limited to analysis of nuclear information. Indeed, on-slide cDNA synthesis methods capture information from both cytosol and neuronal processes, including dendrites and axons (neuropil). Using cell segmentation of high-resolution histology images acquired before on-slide cDNA synthesis we determined that each spot contained an average of 3.3 cells with 9.7% of spots containing no cell bodies. Confirming our hypothesis that spots containing no cell bodies would be enriched for transcripts highly expressed in neuronal processes and synapses, we identified significant enrichment of genes preferentially expressed in synaptic terminals. We further found enriched mitochondrial gene expression in sparser layers like L1 (Supplementary Figure 12), which likely relates to our finding that L1 was most enriched for ‘neuropil spots’, and a higher energetic supply to axons and dendrites would be expected 48,49. Moreover, converging evidence suggests that impairments in the formation or maintenance of synapses in key circuits underlies risk for neuropsychiatric and neurodevelopmental disorders, including SCZD and ASD 6,14,15. Supporting this notion, genes associated with increased risk for SCZD that were identified by GWAS were found to be preferentially enriched for synaptic neuropil transcripts 12. Better understanding regulation of synaptically localized transcripts is important because the synaptic proteins they encode control neuronal homeostasis and drive synaptic plasticity 50, and directly studying neuropil-enriched transcripts in human brain may provide novel insights about expression of locally translated synaptic genes.
Beyond these specific insights, we created several resources for the broader scientific community to continue to interrogate these large datasets for their own biological questions and to extend current neuroscience and spatial transcriptomics research. All raw and processed data and code presented here are freely available through our web application “spatialLIBD” (http://spatial.libd.org/spatialLIBD). Using this application, researchers can visualize the spot-level Visium data, manually annotate spots to layers, visualize the layer-level results, assess the enrichment of gene sets among layer-enriched genes, and perform spatial registration. These, and additional features, are described in detail at http://research.libd.org/spatialLIBD/.
Methods
CONTACT FOR REAGENT AND RESOURCE SHARING
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact: Andrew E Jaffe (andrew.jaffe@libd.org).
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Post-mortem human tissue samples
Post-mortem human brain tissue from three donors of European ancestry (Table S1) was obtained by autopsy primarily from the Offices of the Chief Medical Examiner of the District of Columbia, and of the Commonwealth of Virginia, Northern District, all with informed consent from the legal next of kin (protocol 90-M-0142 approved by the NIMH/NIH Institutional Review Board). Clinical characterization, diagnoses, and macro- and microscopic neuropathological examinations were performed on all samples using a standardized paradigm, and subjects with evidence of macro- or microscopic neuropathology were excluded. Details of tissue acquisition, handling, processing, dissection, clinical characterization, diagnoses, neuropathological examinations, RNA extraction and quality control measures have been described previously 51. Briefly, dorsolateral prefrontal cortex (DLPFC) was microdissected and embedded in OCT in a 10mm x 10mm cryomold. Each sample was dissected in a plane perpendicular to the pial surface in area 46 of the cortex to capture from the pial surface to the gray-white matter junction and spanned L1-6 and the WM.
METHOD DETAILS
Tissue processing and Visium data generation
Frozen samples were embedded in OCT (TissueTek Sakura) and cryosectioned at −10C (Thermo Cryostar). Sections were placed on chilled Visium Tissue Optimization Slides (3000394, 10X Genomics) and Visium Spatial Gene Expression Slides (2000233, 10X Genomics), and adhered by warming the back of the slide. Tissue sections were then fixed in chilled methanol and stained according to the Visium Spatial Gene Expression User Guide (CG000239 Rev A, 10X Genomics) or Visium Spatial Tissue Optimization User Guide (CG000238 Rev A, 10X Genomics). For gene expression samples, tissue was permeabilized for 18 minutes (Supplementary Figure 1), which was selected as the optimal time based on tissue optimization time course experiments. Brightfield histology images were taken using a 10X objective (Plan APO) on a Nikon Eclipse Ti2-E (27755 x 50783 pixels for TO, 13332 x 13332 pixels for GEX). Raw images were stitched together using NIS-Elements AR 5.11.00 (Nikon) and exported as .tiff files with low and high resolution settings. For tissue optimization experiments, fluorescent images were taken with a TRITC filter (ex/em brand) using a 10X objective and 400 ms exposure time.
Libraries were prepared according to the Visium Spatial Gene Expression User Guide (CG000239, https://assets.ctfassets.net/an68im79xiti/3pyXucRaiKWcscXy3cmRHL/a1ba41c77cbf60366202805ead8f64d7/CG000239_VisiumSpatialGeneExpression_UserGuide_Rev_A.pdf). Libraries were loaded at 300 pM and sequenced on a NovaSeq 6000 System (Illumina) using a NovaSeq S4 Reagent Kit (200 cycles, 20027466, Illumina), at a sequencing depth of approximately 250-400M read-pairs per sample. Sequencing was performed using the following read protocol: read 1, 28 cycles; i7 index read, 10 cycles; i5 index read, 10 cycles; read 2, 91 cycles.
Visium raw data processing
Raw FASTQ files and histology images were processed by sample with the Space Ranger software version 1.0.0, which uses STAR v.2.5.1b 52 for genome alignment, against the Cell Ranger hg38 reference genome “refdata-cellranger-GRCh38-3.0.0”, available at: http://cf.10xgenomics.com/supp/cell-exp/refdata-cellranger-GRCh38-3.0.0.tar.gz. Quality control metrics returned by this software are available in Table S1.
Histology image processing and segmentation
Histology images were processed and nuclei were segmented using the “Color-Based Segmentation using K-Means Clustering” in MATLAB vR2019a. The MATLAB function rgb2lab is used to convert the image from RGB color space to CIELAB color space also called L*a*b color space (L - Luminosity layer measures lightness from black to white, a - chromaticity-layer measures color along red-green axis, b - chromaticity-layer measures color along blue-yellow axis). The CIELAB color space quantifies the visual differences caused by the different colors in the image. The a*b color space is extracted from the L*a*b converted image and is given to the K-means clustering function imsegkmeans along with the number of colors the user visually identifies in the image. The imsegkmeans outputs a binary mask for each color it identifies. Since the nuclei in the histology images have a bright color that can be easily differentiated from the background, a binary mask generated for the nuclei color is used as the nuclei segmentation.
The segmented binary mask was used to estimate the number of nuclei in each spot. For each histology image, there is a JSON file describing some properties of the image, including the spot diameter in pixels at the full-resolution image. Additionally, for each image there is a text file in tabular format that includes one row for each spot with an identification barcode, a row, a column and pixel coordinates for the center of the spot on the full-resolution image. Using this information, the following protocol was implemented. For each spot, all pixels of the binary mask were set to zero except those within the spot radius of the center of that spot. The resulting binary mask was then labeled with a unique integer for each unique contiguous cluster of pixels. The maximum of this labeled mask was stored as an estimate of the number of nuclei within that spot.
Spot-level data processing
The raw Visium files for each sample (Method Details: Visium raw data processing) were read into R into a custom structure using the SummarizedExperiment R package 53 to keep them paired with the low resolution histology images for visualization purposes. They were then combined into a single SingleCellExperiment 54 v1.8.0 object (sce) to allow us to perform analyses using the gene expression data from all samples. We added information, including the number of estimated cell counts (Method Details: Histology image processing and segmentation), the sum of UMIs per spot, number of expressed genes per spot, and graph-based clustering results (computed by sample) provided by 10x Genomics Space Ranger software to the sce object. We evaluated the per-spot quality metrics using the function perCellQCMetrics from the scran v1.14.3 R Bioconductor package 55 and did not drop any spots given the spatial pattern they presented. We used the scran 55 functions quickCluster, blocking by the six pairs of spatially adjacent replicates, computeSumFactors, and scater’s 56 logNormCounts to compute the log-normalized gene expression counts at the spot-level. By modeling the gene mean expression and variance with the modelGeneVar scran 55 function, blocking again by the six pairs of spatially adjacent replicates, followed by getTopHVGs we identified the top 10% highly variable genes (HVGs): 1,942 genes. Using this subset of HVGs, we computed principal components (PCs) with scater’s v1.14.3 56 runPCA to produce 50 components. Using these 50 top PCs, we computed tSNE 57 and UMAP 58 dimension reduction methods using runTSNE (perplexity 5, 20, 50, 80) and runUMAP (15 neighbors) from scater 56. With the top 50 PCs, we performed graph-based clustering across all samples using 50 nearest neighbors using buildSNNGraph from scran 55 and the Walktrap method from implemented by igraph 59 v1.2.4.1 resulting in 28 clusters (snn_k50_k4 through snn_k50_k28). We further cut the graph to produce clusters from 4 to the 28 in increments of 1. We used spatialLIBD 60 v0.99.0 to assign the graph-based clusters from 10x Genomic to the closest anatomical layers for each sample (Maynard and Martinowich).
All this information was combined and displayed through a shiny 61 v1.4.0 web application at http://spatial.libd.org/spatialLIBD in such a way that we, and now the scientific community, can visualize the expression of a given gene, or a given set of clustering results, across all samples or each sample individually. For any chosen sample, spatialLIBD allows users to view gene expression and selected cluster results both in the context of spatial histology and given dimension reduction results (PCA, tSNE, UMAP) using plotly 62. Using this web application to visualize cytoarchitecture 63,64 combined with a dimensionality reduction method, specifically t-Distributed Stochastic Neighbor Embedding (t-SNE) 57 as well as the expression patterns of MBP and PCP4, known WM and L5 marker genes, a single experimenter manually assigned each spot to a cortical layer for each sample for all but 352 out of the 47,681 spots across all samples. These 352 spots were located on small fragments of damaged tissue disconnected from the main tissue section. We added these supervised layer annotations to our sce object and the final version is available for download through the fetch_data function in spatialLIBD 60.
Layer-level data processing
For the subject with brain ID Br5595, which lacked L1 and clear cytoarchitecture for L2 and L3, we re-labelled all “L2/L3” ambiguous spots as L3 and dropped the 352 un-assigned spots. We then pseudo-bulked 65–67 the spots into layer-level data by summing the raw gene expression counts across all spots in a given sample and in a given layer, and repeated this procedure for each gene, sample and layer combination (Figure 2 A). This resulted in 47,329 genes quantified across 76 layer-sample combinations (7 * 12 = 84, because not all layers were clearly observed in each sample as Br5595 had no distinct L1 or L2 across all four spatial replicates, Supplementary Figure 5). This resulted in another SingleCellExperiment 54 object called sce_layer. We used librarySizeFactors and logNormCounts from scater 56 to compute layer-level log normalized gene expression values. We dropped all mitochondrial genes and retained genes that were expressed in at least 5% (4 / 76 layer-sample combinations) and had an average counts greater than 0.5 as computed by calculateAverage from scater 56, resulting in a final set of 22,331 genes. We identified 1,280 top HVGs at the layer-level and computed 20 PCs (Figure 2 B) which we then used in the tSNE (perplexity = 5, 15 and 20) and UMAP (15 neighbors) computations similar to Method Details: Spot-level data processing. We then clustered the layer-level data using several graph-based approaches as well as using k-means. This is the sce_layer data that is available for download through the fetch_data function in spatialLIBD 60.
Neuropil enrichment analyses
We performed differential expression analysis at the spot-level in our Visium data, comparing the 4,855 spots with 0 cell bodies to the other 42,474 spots containing at least 1 cell body, adjusting for fixed effects of layer and spatial replicate. We downloaded differential expression statistics from RNA-seq of vGLUT1+ enriched synaptosomes in mouse brain from Hafner et al. 30, and lined up these data at the gene-level using homologous entrez IDs between mouse and human (via http://www.informatics.jax.org/downloads/reports/HMD_HumanPhenotype.rpt). We compared the effects of spots containing 0 cells in our data to vGLUT1+ enriched cells from Hafner et al, both across the full homologous transcript and then within genes significant in the Hafner dataset at FDR < 0.05.
Layer-level gene modeling
Using the layer-level data we fit three types of models (Figure 2 C, Extended Data 4):
ANOVA: For this model we tested for each gene whether the log normalized gene expression counts are variable between the layers by computing F-statistics. We used lmFit and eBayes from limma 68 v3.42.0 after blocking by the six pairs of spatially adjacent replicates and taking this correlation into account as computed by duplicateCorrelation.
Enrichment: Using the same functions and taking into account the same correlation structure, we computed t-statistics comparing one layer against the other six using the layer-level data. This resulted in seven sets of t-statistics (one per layer) with double-sided P-values. We focused on genes with positive t-statistics (expressed higher in one layer against the others) as these are enriched genes instead of depleted genes.
Pairwise: Using the same functions and taking into account the same correlation structure in addition to using contrasts.fit from limma 68, we computed t-statistics for each pair of layers resulting in t-statistics with double-sided P-values.
The modeling results are available for download through the fetch_data function in spatialLIBD 60 as the modeling_results object as well as in Table S4.
Known marker genes optimal modeling
Using two lists of known layer marker genes derived from previous mouse or human studies 32,31, we identified 29 different unique optimal models for these genes. For example, L1 + L2 versus the other layers. Using the same modeling framework (Method Details: Layer-level gene modeling) we computed t-statistics for all genes at the layer-level data for each of these 29 unique models. For each of the 29 unique models, we then retained information about the statistics for the known marker genes matching the model as well as the top ranked (with a positive t-statistic gene, Table S5).
RNAscope single molecule fluorescent in situ hybridization (smFISH)
Fresh frozen DLPFC from the same three neurotypical control samples used for Visium were sectioned at 10μm and stored at −80°C. In situ hybridization assays were performed with RNAscope technology utilizing the RNAscope Fluorescent Multiplex Kit V2 and 4-plex Ancillary Kit (Cat # 323100, 323120 ACD, Hayward, California) according to the manufacturer’s instructions. Briefly, tissue sections (2 to 4 per individual) were fixed with a 10% neutral buffered formalin solution (Cat # HT501128 Sigma-Aldrich, St. Louis, Missouri) for 30 minutes at room temperature (RT), series dehydrated in ethanol, pretreated with hydrogen peroxide for 10 minutes at RT, and treated with protease IV for 30 minutes. Sections were incubated with 4 different probe combinations: A) L1 and L5: AQP4, RELN, TRABD2A, BCL11B (Cat 482441-C4, 413051-C2, 532881, 425561-C3, ACD, Hayward, California); B) L3 and L6: CARTPT, FREM3, NR4A2, (506591, 829021-C4, 582621-C3); C) L2/3 and WM: LAMP5, HPCAL1, NDRG1, MBP (487691-C2, 846051-C3, 481471, 411051-C4); D) Visium-identified genes: AQP4, TRABD2A, KRT17 (463661-C2), HPCAL1. Following probe labeling, sections were stored overnight in 4x SSC (saline-sodium citrate) buffer. After amplification steps (AMP1-3), probes were fluorescently labeled with Opal Dyes (Perkin Elmer, Waltham, MA; 1:500) and stained with DAPI (4′,6-diamidino-2-phenylindole) to label the nucleus. Lambda stacks were acquired in z-series using a Zeiss LSM780 confocal microscope equipped with 20x x 1.4 NA and 63x x 1.4NA objectives, a GaAsP spectral detector, and 405, 488, 555, and 647 lasers. All lambda stacks were acquired with the same imaging settings and laser power intensities. For each subject, a cortical strip was tile imaged at 20x to capture L1 to WM. Following image acquisition, lambda stacks in z-series were linearly unmixed using Zen Black (weighted; no autoscale) using reference emission spectral profiles previously created in Zen for dotdotdot (git has version 4e1350b) 17, stitched, maximum intensity projected, and saved as Carl Zeiss Image “.czi” files.
snRNA-seq spatial registration
For each snRNA-seq dataset, we utilized publicly-available processed unique molecular index (UMI) count data for each gene and nucleus, and provided annotations of cell clusters/subtypes. Within each dataset, we performed ‘pseudo-bulking’ 65–67 of nuclei-level UMIs into cell type-specific log-transformed normalized counts for each unique subject. We then computed cell type ‘enrichment’ statistics for each gene and dataset-provided cell type within their pseudo-bulk profiles by performing linear mixed effects modeling comparing each cell type to all other cell types, treating donor as a random intercept 69, and adjusting for study-specific covariates described below. This strategy was analogous to the layer ‘enrichment’ statistics described for our Visium data (Method Details: Layer-level gene modeling). We then computed Pearson correlation coefficients between our layer-enriched ‘enrichment’ statistics and snRNA-seq cell type-specific “enrichment” statistics among the 700 most layer-enriched genes (combining the 100 most significant genes for each of the six layers and WM in the Visium data) expressed in each snRNA-seq dataset. In addition to our DLPFC snRNA-seq dataset (Method Details: DLPFC snRNA-seq data generation), we utilized these publicly-available datasets:
Hodge et al. 5: Processed data was obtained from https://portal.brain-map.org/atlases-and-data/rnaseq. We retained total gene counts (exons plus introns) from 49,494 nuclei corresponding to postmortem human brain tissue across both neurons and non-neurons across 50,281 genes across 6 layers and 2 cell types. These data were reduced to 52 pseudo-bulk profiles, for all unique donor-layer-type combinations. We calculated ‘enrichment’ statistics for each of the six layers in their dataset, adjusting for the fixed effect of cell type (neuronal or glial) with a random intercept of donor.
Velmeshev et al. 6: Processed data was obtained from https://cells.ucsc.edu/ (under the “Autism” study data download). We used the post-QC UMI counts from all 104,559 nuclei across 65,217 genes across 41 unique donor-region pairs (for 31 unique donors and 2 brain regions) and 17 cell types. These data were reduced to 691 pseudo-bulked profiles, for all unique donor-region-type combinations. We calculated ‘enrichment’ statistics for each of the 17 cell types in this dataset, adjusting for the fixed effect of brain region, age, sex, and ASD diagnosis, with a random intercept of donor.
Mathys et al. 10: Processed data were obtained from Synapse at accession: syn18485175. We used the post-QC UMI counts from all 70,634 nuclei across 17,926 genes across 48 unique donors and 44 cell subtypes (across 8 broad cell classes). These data were reduced to 1877 pseudo-bulked profiles, for all unique donor-subtype combinations. We calculated ‘enrichment’ statistics for each of the 44 cell subtypes in this dataset, adjusting for the fixed effect of age, sex, race, and Alzheimer’s disease diagnosis, with a random intercept of donor.
We also downloaded and reprocessed RNA-seq data from He et al. 22 from SRA accession SRP199498 using our previously-described RNA-seq processing pipeline 38. These data consisted of homogenate RNA-seq data from 18 serial sections across 4 unique donors.
DLPFC snRNA-seq data generation
We performed single-nucleus RNA-seq (snRNA-seq) on DLPFC tissue from two neurotypical donors using 10x Genomics Chromium Single Cell Gene Expression V3 technology. Nuclei were isolated using a “Frankenstein” nuclei isolation protocol developed by Martelotto et al. for frozen tissues 8,70–73. Briefly, ~40mg of frozen DLPFC tissue was homogenized in chilled Nuclei EZ Lysis Buffer (MilliporeSigma) using a glass dounce with ~15 strokes per pestle. Homogenate was filtered using a 70μm-strainer mesh and centrifuged at 500 x g for 5 minutes at 4°C in a benchtop centrifuge. Nuclei were resuspended in the EZ lysis buffer, centrifuged again, and equilibrated to nuclei wash/resuspension buffer (1x PBS, 1% BSA, 0.2U/μL RNase Inhibitor). Nuclei were washed and centrifuged in this nuclei wash/resuspension buffer three times, before labeling with DAPI (10μg/mL). The sample was then filtered through a 35μm-cell strainer and sorted on a BD FACS Aria II Flow Cytometer (Becton Dickinson) at the Johns Hopkins University Sidney Kimmel Comprehensive Cancer Center (SKCCC) Flow Cytometry Core into 10X Genomics reverse transcription reagents. Gating criteria hierarchically selected for whole, singlet nuclei (by forward/side scatter), then for G0/G1 nuclei (by DAPI fluorescence). A “null” sort into wash buffer was additionally performed from the same preparation for quantification of nuclei concentration and to ensure nuclei input was free of debris. Approximately 8,500 single nuclei were sorted directly into 25.1μL of reverse transcription reagents from the 10x Genomics Single Cell 3’ Reagents kit (without enzyme). Libraries were prepared according to manufacturer’s instructions (10x Genomics) and sequenced on the Next-seq (Illumina) at the Johns Hopkins University Transcriptomics and Deep Sequencing Core.
We processed the sequencing data with the 10x Genomics’ Cell Ranger pipeline v3.0.2, aligning to the human reference genome GRCh38, with a reconfigured GTF such that intronic alignments were additionally counted given the nuclear context, to generate UMI/feature-barcode matrices (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/references). We started with raw feature-barcode matrices for analysis with the Bioconductor suite of R packages for single-cell RNA-seq analysis 74. For quality control and cell calling, we first used a Monte Carlo simulation-based approach to assess and rule out empty droplets or those with random ambient transcriptional noise, such as from debris 75,76. This was then followed by mitochondrial rate adaptive thresholding, which, though expected to be near-zero in this nuclear context, we allowed for a 3x median absolute deviation (MAD) threshold. This allowed for flexibility in output/purity of FACS workflows. This QC pipeline yielded 5,399 high-quality nuclei from the DLPFC from two donors, which were then rescaled across all nuclear libraries, then log-transformed for determination of highly-variable genes, again with scran’s modelGeneVar, this time taking all genes (9,313) with a greater variance than the fitted trend. Principal components analysis (PCA) was then performed on these selected genes to reduce the high dimensionality of nuclear transcriptomic data. The optimal PC space was defined with iterative graph-based clustering to determine the d PCs where resulting n clusters stabilize, with the constraint that n clusters </= (d + 1) PCs 55, resulting in a chosen d=81 PCs. In this PCA-reduced space, graph-based clustering was performed (specifically, k-nearest neighbors with k=20 neighbors and the Walktrap method from R package igraph 59 for community detection) to identify 31 preliminary clusters. We then took all feature counts for these assignments and pseudo-bulked counts across 31 preliminary nuclear clusters, rescaling for combined library size and log-transforming normalized counts, then performed hierarchical clustering to identify preliminary cluster relationships and merging with the cutreeDynamic function of R package dynamicTreeCut 77. These broader clusters were finally annotated with well-established cell type markers for nuclear type identity 10. We also used Bioconductor package scater’s 56 implementation of non-linear dimensionality reduction techniques, t-SNE and UMAP, with default parameters and within the aforementioned optimal PC space, simply for visualization of the high-dimensional structure in the data, which complemented the clustering results (Supplementary Figure 9).
Clinical gene set enrichment analyses
We assessed the laminar enrichment of a series of predefined clinical gene sets for various neuropsychiatric and neurodevelopmental disorders. These gene sets consisted of data from:
Birnbaum et al. 34: 10 gene sets across SCZ, ASD, neurodevelopmental disorders, intellectual disability, bipolar disorder, and neurodegenerative disorders.
SFARI 35: 3 gene sets consisting of all human genes, high confidence genes, and syndromic genes.
Satterstrom et al. 36: 6 gene sets based on exome sequencing studies.
psychENCODE 16: 6 gene sets based on DE analyses of patients with ASD, SCZD, and bipolar disorder (BPD), stratified by directionality in cases, and 8 gene sets based on TWAS 78 (ASD, SCZD, BPD, SCZD-BPD, stratified by directionality), each at FDR < 0.05.
BrainSeq 38: 2 gene sets based on DE analyses of patients with SCZD versus controls (at FDR < 0.05), stratified by directionality.
Down syndrome 79: 2 gene sets based on DE analyses of patients with Down syndrome versus controls (at FDR < 0.05), stratified by directionality.
We collected all reported genes in each gene set, and retained the majority that were expressed in our Visium dataset - these gene set sizes are provided in Table S6. Enrichment for each gene set for each layer was based on a gene being significantly more highly expressed in one layer versus all other layers (at FDR < 0.1, as this cutoff for defining genes with increased expression was approximately as conservative as using an FDR < 0.05 cutoff with either direction). This calculation was performed using Fisher’s exact test, which returned an odds ratio and p-value for each gene set and layer (Table S6). Pooling all p-values resulted in FDR control of 5% for marginal p-values < 0.01.
We additionally performed MAGMA 37 v1.07b using the subset of 24,347 Ensembl gene IDs expressed in our pseudo-bulked Visium data that were present in the provided GR37/hg19 annotation across multiple GWASs for SCZD 41, BPD 42, MDD 80 and ASD 40. We used window sizes of +35kb and −10kb around each gene to aggregate SNPs to genes using the 1000 Genomes EUR reference profile using SNP-wise stats. We then performed gene set testing using MAGMA for seven gene sets (related to the six layers and WM) for genes with positive (+) enrichment statistics at FDR < 0.1. Additionally, we performed linkage disequilibrium score regression (LDSC v1.0.1) and partitioned heritability analysis 39,81 using 30 GWAS traits collected by Rizzardi et al 82. Genomic regions were created from the same enriched and FDR < 0.1 genes as above, here with +10kb and −5kb windows, and lifted over to hg19 coordinates.
Data-driven layer-enriched clustering analysis
For the data-driven layer-enriched clustering, we first performed feature selection in two ways to identify laminar and non-laminar patterns in our data. The first method for feature selection used was SpatialDE 83 to identify genes exhibiting spatially variable expression patterns (SVGs). SpatialDE was run in Python version 3.8.0. We ran SpatialDE individually on each of the 12 samples, which returned a set of statistically significant (false discovery rate < 0.05) SVGs per sample. We included an additional filtering step to remove lowly-expressed genes (less than 1,000 total UMIs summed across spots per sample), as well as removing mitochondrial genes. This left between 521 and 2,217 genes per sample (Table S9). In total, there were 2,775 unique genes across samples; for comparison, we also ran clustering methods using this pooled list (Table S9 and Table S10). The second feature selection method used the scran R Bioconductor package 55 to identify (non-spatial) highly variable genes (HVGs) across all samples combined, which identified 1,942 HVGs. Due to slow runtime, it was not possible to run SpatialDE on pooled spots from all samples combined.
In the ‘unsupervised’ approach to define sub-groups of spots with similar expression profiles in a completely data-driven manner, we considered the possible combinations of (i) two types of methods for dimensionality reduction (top 50 principal components (PCs) with the BiocSingular v1.2.2 Bioconductor package 84, and top 10 UMAP 58 components with the uwot v0.1.5 R package 85 calculated on the top 50 PCs), (ii) the gene sets defined after applying feature selection (SpatialDE genes for each sample, pooled SpatialDE gene lists across all 12 samples, and HVGs), and (iii) including (or not) the two spatial coordinates (x and y coordinates) of each spot as additional features for clustering. For the clustering algorithm, we constructed a shared nearest neighbor graph with the scran Bioconductor package and then applied the Walktrap method from the igraph R package 59 to obtain predicted cluster labels. We set all clustering implementations to return eight final clusters (i.e. one more than the six DLPFC layers plus white matter), which gave slightly improved clustering performance (compared to seven clusters) due to additional splitting of the white matter cluster and some outlier spots. Table S10 contains an overview of all combinations that were tried.
For comparison, we also implemented a ‘semi-supervised’ approach, where we used the layer-enriched gene sets identified using the DE “enrichment” models described previously (Extended Data 4), and a ‘markers’ approach using known marker genes from Zeng et al. 32 (Table S10).
To evaluate the performance of the clustering approaches, we used the adjusted Rand index (ARI), which measures the similarity between the predicted cluster labels and “gold standard” cluster labels. The manually-annotated layers were used as the “gold standard” (Figure 7 and Supplementary Figure 11). Higher ARI values correspond to better clustering performance, with a maximum value of 1 indicating perfect clustering agreement. To evaluate the improvement in ARI when including spatial coordinates within the clustering methods, we fit a linear model on the ARI scores, comparing these methods against methods without spatial coordinates across all methods and samples, and recorded the p-value.
https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/Layer_Guesses/pdf/SFileXX_clustering_compressed.pdf contains visualizations of clustering results for all samples and clustering methods (Table S10) (similar to Extended Data 9 and Extended Data 10, which display results for sample 151673 only). A description of the clustering methods is provided in Table S10.
STATISTICS AND REPRODUCIBILITY
No statistical methods were used to pre-determine sample sizes, but our sample sizes are greater than previous publications with the earlier Spatial Transcriptomics technology that preceded Visium 23,28. Table S1 contains the demographic information (two males one female neurotypical controls with age at time of death from 30 to 46 years old) for the three donor brains we used to generate the six pairs of spatial replicates (two per donor) for a total of twelve images. All boxplots shown in main and supplementary figures displayed the median as the center, interquartile range (IQR, 25th and 75th percentiles) as the box ranges, and 1.5 times the IQR as the whiskers. All reported p-values were two-sized and all p-values were adjusted for multiple testing with Benjamini-Hochberg correction unless otherwise indicated. Distributions of the residuals of our many linear models were assumed to be normally distributed across all genes and models but this was not formally tested. All points were used in all analyses, e.g. outliers were not removed. We used the six pairs of spatial replicates as a blocking factor in our analyses. Data collection and analysis were not performed blind to the conditions of the experiments. See the Life Sciences Reporting Summary for further details.
Quantification and statistical analyses
The different subsections of the “Method Details” further specify the statistical models and tests used as well as the versions of the specific software used. Overall, statistical tests were performed using R versions 3.6.1 and 3.6.2 with Bioconductor version 3.10 with detailed R session information provided in the code GitHub repositories listed under “Code Availability.” The threshold and method used for statistical significance is listed in the main text along the description of the results. Plots in R were created in either base R or with the ggplot2 R package 86.
CODE AVAILABILITY
The code for this project is publicly available through GitHub and archived through Zenodo. Specifically, the code is available through GitHub at https://github.com/LieberInstitute/HumanPilot 87 and https://github.com/LieberInstitute/spatialLIBD 60, both of which are described in their README.md files.
ADDITIONAL RESOURCES
In order to visualize the spot-level Visium data we generated, we created a shiny 61 interactive browser available at http://spatial.libd.org/spatialLIBD that is powered by the Bioconductor package spatialLIBD 60.
Extended Data
Supplementary Material
Acknowledgements
The authors would like to express their gratitude to our colleagues whose efforts have led to the donation of postmortem tissue to advance these studies, including at the Office of the Chief Medical Examiner of the State of Maryland, Baltimore Maryland and the Office of the Chief Medical Examiner of Kalamazoo County Michigan. We also would like to acknowledge the contributions of Llewellyn B. Bigelow, MD and Amy Deep-Soboslay for their diagnostic expertise, and Daniel R. Weinberger for providing constructive commentary and editing of the manuscript. Finally, we are indebted to the generosity of the families of the decedents, who donated the brain tissue used in these studies. We would also like to thank the Accelerating Medicines Partnership - Alzheimer’s Disease (AMP-AD) Target Discovery and Preclinical Validation program and the ROSMAP study. We would like to thank William S. Ulrich for assistance with http://spatial.libd.org/spatialDE and the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health for hosting mirrors of our web application. We thank the Johns Hopkins University Sidney Kimmel Comprehensive Cancer Center (SKCCC) Flow Cytometry Core and the Johns Hopkins University Transcriptomics and Deep Sequencing Core for supporting snRNA-seq experiments.
Funding
This project was supported by the Lieber Institute for Brain Development. K.R.M., L.C-T., K.M., and A.E.J. were partially supported by the National Institute of Mental Health (U01MH122849). S.C.H. and L.M.W. were supported by the National Cancer Institute (R01CA237170). S.C.H. was also supported by the CZF2019-002443 from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation, the National Human Genome Research Institute (R00HG009007).
Footnotes
Accession codes
The raw data is publicly available from the Globus endpoint “jhpce#HumanPilot10x” that is also listed at http://research.libd.org/globus/. See “Data Availability” for more information.
Competing Interests Statement
C.U., S.R.W., J.C., Y.Y., and N.R. are employees of 10x Genomics. All other authors have no conflicts of interest to declare.
Data Availability Statement
Processed data is publicly available from the Bioconductor package spatialLIBD 60. The raw data is publicly available from the Globus endpoint as described under “Accession codes”. The raw data provided through Globus includes all the FASTQ files and raw image files.
External data used in this project is detailed under “Methods: snRNA-seq spatial registration” as well as “Methods: Clinical gene set enrichment analyses”.
References
- 1.DeFelipe J & Fariñas I The pyramidal neuron of the cerebral cortex: morphological and chemical characteristics of the synaptic inputs. Prog. Neurobiol 39, 563–607 (1992). [DOI] [PubMed] [Google Scholar]
- 2.Harris KD & Shepherd GMG The neocortical circuit: themes and variations. Nat. Neurosci 18, 170–181 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Narayanan RT, Udvary D & Oberlaender M Cell Type-Specific Structural Organization of the Six Layers in Rat Barrel Cortex. Front. Neuroanat 11, 91 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Radnikow G & Feldmeyer D Layer- and Cell Type-Specific Modulation of Excitatory Neuronal Activity in the Neocortex. Front. Neuroanat 12, 1 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Hodge RD et al. Conserved cell types with divergent features in human versus mouse cortex. Nature 573, 61–68 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Velmeshev D et al. Single-cell genomics identifies cell type-specific molecular changes in autism. Science 364, 685–689 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Darmanis S et al. A survey of human brain transcriptome diversity at the single cell level. Proc Natl Acad Sci USA 112, 7285–7290 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Lake BB et al. Neuronal subtypes and diversity revealed by single-nucleus RNA sequencing of the human brain. Science 352, 1586–1590 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Lake BB et al. Integrative single-cell analysis of transcriptional and epigenetic states in the human adult brain. Nat. Biotechnol 36, 70–80 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Mathys H et al. Single-cell transcriptomic analysis of Alzheimer’s disease. Nature 570, 332–337 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Nowakowski TJ et al. Spatiotemporal gene expression trajectories reveal developmental hierarchies of the human cortex. Science 358, 1318–1323 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Skene NG et al. Genetic identification of brain cell types underlying schizophrenia. Nat. Genet 50, 825–833 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Bakken TE et al. Single-nucleus and single-cell transcriptomes compared in matched cortical cell types. PLoS ONE 13, e0209648 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Sweet RA, Fish KN & Lewis DA Mapping Synaptic Pathology within Cerebral Cortical Circuits in Subjects with Schizophrenia. Front. Hum. Neurosci 4, 44 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Moyer CE, Shelton MA & Sweet RA Dendritic spine alterations in schizophrenia. Neurosci. Lett 601, 46–53 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Gandal MJ et al. Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder. Science 362, (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Maynard KR et al. dotdotdot: an automated approach to quantify multiplex single molecule fluorescent in situ hybridization (smFISH) images in complex tissues. Nucleic Acids Res. (2020). doi: 10.1093/nar/gkaa312 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Burgess DJ Spatial transcriptomics coming of age. Nat. Rev. Genet 20, 317 (2019). [DOI] [PubMed] [Google Scholar]
- 19.Lein E, Borm LE & Linnarsson S The promise of spatial transcriptomics for neuroscience in the era of molecular cell typing. Science 358, 64–69 (2017). [DOI] [PubMed] [Google Scholar]
- 20.Jaffe AE et al. Profiling gene expression in the human dentate gyrus granule cell layer reveals insights into schizophrenia and its genetic risk. Nat. Neurosci 23, 510–519 (2020). [DOI] [PubMed] [Google Scholar]
- 21.Dong X et al. Global, integrated analysis of methylomes and transcriptomes from laser capture microdissected bronchial and alveolar cells in human lung. Epigenetics 13, 264–274 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.He Z et al. Comprehensive transcriptome analysis of neocortical layers in humans, chimpanzees and macaques. Nat. Neurosci 20, 886–895 (2017). [DOI] [PubMed] [Google Scholar]
- 23.Ståhl PL et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353, 78–82 (2016). [DOI] [PubMed] [Google Scholar]
- 24.Rodriques SG et al. Slide-seq: A scalable technology for measuring genome-wide expression at high spatial resolution. Science 363, 1463–1467 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Asp M et al. An Organ-Wide Gene Expression Atlas of the Developing Human Heart. SSRN Journal (2018). doi: 10.2139/ssrn.3219263 [DOI] [PubMed] [Google Scholar]
- 26.Moncada R et al. Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Nat. Biotechnol 38, 333–342 (2020). [DOI] [PubMed] [Google Scholar]
- 27.Berglund E et al. Spatial maps of prostate cancer transcriptomes reveal an unexplored landscape of heterogeneity. Nat. Commun 9, 2419 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Maniatis S et al. Spatiotemporal dynamics of molecular pathology in amyotrophic lateral sclerosis. Science 364, 89–93 (2019). [DOI] [PubMed] [Google Scholar]
- 29.Gregory JM et al. Spatial transcriptomics identifies spatially dysregulated expression of GRM3 and USP47 in amyotrophic lateral sclerosis. Neuropathol. Appl. Neurobiol (2020). doi: 10.1111/nan.12597 [DOI] [PubMed] [Google Scholar]
- 30.Hafner A-S, Donlin-Asp PG, Leitch B, Herzog E & Schuman EM Local protein synthesis is a ubiquitous feature of neuronal pre- and postsynaptic compartments. Science 364, (2019). [DOI] [PubMed] [Google Scholar]
- 31.Molyneaux BJ, Arlotta P, Menezes JRL & Macklis JD Neuronal subtype specification in the cerebral cortex. Nat. Rev. Neurosci 8, 427–437 (2007). [DOI] [PubMed] [Google Scholar]
- 32.Zeng H et al. Large-scale cellular-resolution gene profiling in human neocortex reveals species-specific molecular signatures. Cell 149, 483–496 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Hawrylycz MJ et al. An anatomically comprehensive atlas of the adult human brain transcriptome. Nature 489, 391–399 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Birnbaum R, Jaffe AE, Hyde TM, Kleinman JE & Weinberger DR Prenatal expression patterns of genes associated with neuropsychiatric disorders. Am. J. Psychiatry 171, 758–767 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Abrahams BS et al. SFARI Gene 2.0: a community-driven knowledgebase for the autism spectrum disorders (ASDs). Mol. Autism 4, 36 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Satterstrom FK et al. Large-Scale Exome Sequencing Study Implicates Both Developmental and Functional Changes in the Neurobiology of Autism. Cell 180, 568–584.e23 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.de Leeuw CA, Mooij JM, Heskes T & Posthuma D MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol 11, e1004219 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Collado-Torres L et al. Regional Heterogeneity in Gene Expression, Regulation, and Coherence in the Frontal Cortex and Hippocampus across Development and Schizophrenia. Neuron 103, 203–216.e8 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 39.Finucane HK et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet 47, 1228–1235 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Grove J et al. Identification of common genetic risk variants for autism spectrum disorder. Nat. Genet 51, 431–444 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Pardiñas AF et al. Common schizophrenia alleles are enriched in mutation-intolerant genes and in regions under strong background selection. Nat. Genet 50, 381–389 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Stahl EA et al. Genome-wide association study identifies 30 loci associated with bipolar disorder. Nat. Genet 51, 793–803 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Jaffe AE et al. Developmental and genetic regulation of the human cortex transcriptome illuminate schizophrenia pathogenesis. Nat. Neurosci 21, 1117–1125 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Fromer M et al. Gene expression elucidates functional impact of polygenic risk for schizophrenia. Nat. Neurosci 19, 1442–1453 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Vickovic S et al. High-definition spatial transcriptomics for in situ tissue profiling. Nat. Methods 16, 987–990 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Chen KH, Boettiger AN, Moffitt JR, Wang S & Zhuang X RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Codeluppi S et al. Spatial organization of the somatosensory cortex revealed by osmFISH. Nat. Methods 15, 932–935 (2018). [DOI] [PubMed] [Google Scholar]
- 48.Overly CC, Rieff HI & Hollenbeck PJ Organelle motility and metabolism in axons vs dendrites of cultured hippocampal neurons. J. Cell Sci 109 ( Pt 5), 971–980 (1996). [DOI] [PubMed] [Google Scholar]
- 49.Harris JJ, Jolivet R & Attwell D Synaptic energy use and supply. Neuron 75, 762–777 (2012). [DOI] [PubMed] [Google Scholar]
- 50.Biever A, Donlin-Asp PG & Schuman EM Local translation in neuronal processes. Curr. Opin. Neurobiol 57, 141–148 (2019). [DOI] [PubMed] [Google Scholar]
Methods-only References
- 51.Lipska BK et al. Critical factors in gene expression in postmortem human brain: Focus on studies in schizophrenia. Biol. Psychiatry 60, 650–658 (2006). [DOI] [PubMed] [Google Scholar]
- 52.Dobin A et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Martin Morgan VO SummarizedExperiment. Bioconductor (2017). doi: 10.18129/b9.bioc.summarizedexperiment [DOI] [Google Scholar]
- 54.Lun Aaron [Aut C, Davide Risso. SingleCellExperiment. Bioconductor (2017). doi: 10.18129/b9.bioc.singlecellexperiment [DOI] [Google Scholar]
- 55.Lun ATL, McCarthy DJ & Marioni JC A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. [version 2; peer review: 3 approved, 2 approved with reservations]. F1000Res. 5, 2122 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.McCarthy DJ, Campbell KR, Lun ATL & Wills QF Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics 33, 1179–1186 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.van der Maaten L & Hinton G Visualizing Data using t-SNE. Journal of Machine Learning Research 9, 2579–2605 (2008). [Google Scholar]
- 58.McInnes L, Healy J, Saul N & Großberger L UMAP: uniform manifold approximation and projection. JOSS 3, 861 (2018). [Google Scholar]
- 59.Csardi G & Nepusz T The igraph software package for complex network research. InterJournal Complex Systems, 1695 (2006). [Google Scholar]
- 60.Collado-Torres L LieberInstitute/spatialLIBD: spatialLIBD: initial Bioconductor submission. Zenodo (2020). doi: 10.5281/zenodo.3689719 [DOI] [Google Scholar]
- 61.Chang W, Cheng J, Allaire JJ, Xie Y & McPherson J shiny: Web Application Framework for R. (2019). [Google Scholar]
- 62.Sievert C plotly for R. (2018). [Google Scholar]
- 63.Rajkowska G & Goldman-Rakic PS Cytoarchitectonic definition of prefrontal areas in the normal human cortex: I. Remapping of areas 9 and 46 using quantitative criteria. Cereb. Cortex 5, 307–322 (1995). [DOI] [PubMed] [Google Scholar]
- 64.Rajkowska G & Goldman-Rakic PS Cytoarchitectonic definition of prefrontal areas in the normal human cortex: II. Variability in locations of areas 9 and 46 and relationship to the Talairach Coordinate System. Cereb. Cortex 5, 323–337 (1995). [DOI] [PubMed] [Google Scholar]
- 65.Lun ATL & Marioni JC Overcoming confounding plate effects in differential expression analyses of single-cell RNA-seq data. Biostatistics 18, 451–464 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 66.Kang HM et al. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nat. Biotechnol 36, 89–94 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Crowell HL et al. muscat detects subpopulation-specific state transitions from multi-sample multi-condition single-cell transcriptomics data. Nat. Commun 11, 6077 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Ritchie ME et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Law CW, Chen Y, Shi W & Smyth GK voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 15, R29 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 70.Habib N et al. Massively parallel single-nucleus RNA-seq with DroNc-seq. Nat. Methods 14, 955–958 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 71.Lacar B et al. Nuclear RNA-seq of single neurons reveals molecular signatures of activation. Nat. Commun 7, 11022 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72.Habib N et al. Div-Seq: Single-nucleus RNA-Seq reveals dynamics of rare adult newborn neurons. Science 353, 925–928 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.Hu P et al. Dissecting Cell-Type Composition and Activity-Dependent Transcriptional State in Mammalian Brains by Massively Parallel Single-Nucleus RNA-Seq. Mol. Cell 68, 1006–1015.e7 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Amezquita RA et al. Orchestrating single-cell analysis with Bioconductor. Nat. Methods 17, 137–145 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75.Lun ATL et al. EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data. Genome Biol. 20, 63 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76.Griffiths JA, Richard AC, Bach K, Lun ATL & Marioni JC Detection and removal of barcode swapping in single-cell RNA-seq data. Nat. Commun 9, 2667 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 77.Langfelder P, Zhang B & with contributions from Steve Horvath. dynamicTreeCut: Methods for Detection of Clusters in Hierarchical Clustering Dendrograms. (2016). [Google Scholar]
- 78.Gusev A et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet 48, 245–252 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 79.Olmos-Serrano JL et al. Down syndrome developmental brain transcriptome reveals defective oligodendrocyte differentiation and myelination. Neuron 89, 1208–1222 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 80.Wray NR et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat. Genet 50, 668–681 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 81.Bulik-Sullivan B et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet 47, 291–295 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82.Rizzardi LF et al. Neuronal brain-region-specific DNA methylation and chromatin accessibility are associated with neuropsychiatric trait heritability. Nat. Neurosci 22, 307–316 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Svensson V, Teichmann SA & Stegle O SpatialDE: identification of spatially variable genes. Nat. Methods 15, 343–346 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 84.Lun A BiocSingular. Bioconductor (2019). doi: 10.18129/b9.bioc.biocsingular [DOI] [Google Scholar]
- 85.Melville J uwot: R package. (2019). [Google Scholar]
- 86.Wickham H ggplot2 - Elegant Graphics for Data Analysis. (Springer International Publishing, 2016). doi: 10.1007/978-3-319-24277-4 [DOI] [Google Scholar]
- 87.Collado-Torres L, Weber L & Hicks S LieberInstitute/HumanPilot: Archive the HumanPilot code for the preprint version of our project. Zenodo (2020). doi: 10.5281/zenodo.3691916 [DOI] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.