This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2023 Feb 16:2023.02.15.528663. [Version 2] doi: 10.1101/2023.02.15.528663

Massively parallel characterization of psychiatric disorder-associated and cell-type-specific regulatory elements in the developing human cortex

Chengyu Deng 1,2, Sean Whalen 3, Marilyn Steyert 4,5,6, Ryan Ziffra 1, Pawel F Przytycki 3, Fumitaka Inoue 7, Daniela A Pereira 1,2,8, Davide Capauto 9, Scott Norton 9, Flora M Vaccarino 9,10, Alex Pollen 11,12, Tomasz J Nowakowski 4,5,6,12,13, Nadav Ahituv 1,2,*, Katherine S Pollard 2,3,13,14,*
PMCID: PMC9949039  PMID: 36824845

Abstract

Nucleotide changes in gene regulatory elements are important determinants of neuronal development and disease. Using massively parallel reporter assays in primary human cells from mid-gestation cortex and cerebral organoids, we interrogated the cis-regulatory activity of 102,767 sequences, including differentially accessible cell-type specific regions in the developing cortex and single-nucleotide variants associated with psychiatric disorders. In primary cells, we identified 46,802 active enhancer sequences and 164 disorder-associated variants that significantly alter enhancer activity. Activity was comparable in organoids and primary cells, suggesting that organoids provide an adequate model for the developing cortex. Using deep learning, we decoded the sequence basis and upstream regulators of enhancer activity. This work establishes a comprehensive catalog of functional gene regulatory elements and variants in human neuronal development.

One Sentence Summary:

We identify 46,802 enhancers and 164 psychiatric disorder variants with regulatory effects in the developing cortex and organoids.

Introduction

Psychiatric disorders affect nearly one in five adolescents in the United States (1) and have a strong genetic etiology (2). Studies profiling gene expression across distinct anatomical regions have found an enrichment for psychiatric disorder-associated genes in stages of developmental neurogenesis in the marginal zone and deep cortical layer neurons (2–4). For example, most autism spectrum disorder (ASD) risk genes were found to regulate gene expression or be involved in neuronal communication during early brain development (5). Analysis of genes associated with schizophrenia via genome-wide association studies (GWAS) found several to be expressed in the prefrontal cortex at early developmental stages (6). Thus, decoding the genetic causes of psychiatric disorders requires deep knowledge of gene regulatory mechanisms in the developing brain.

In the past decade, hundreds of psychiatric disorder-associated genetic risk loci have been identified, both by work of individual labs and large consortia, including the Psychiatric Genomics Consortium (7, 8) and psychENCODE (9). A major portion of these loci reside in noncoding regions of the genome, likely within gene regulatory elements, and contain highly correlated variants due to linkage disequilibrium (LD), making them challenging to interpret and functionally characterize. Gene regulatory elements, such as enhancers and promoters, are known to regulate lineage- and region-specific transcription in the developing human cortex (10). While promoters are located adjacent to their target genes, enhancers can be located at distal locations from the gene(s) that they regulate. In addition, due to their cell-type specificity and spatiotemporal dynamic activity, enhancers are difficult to identify. Single-cell ATAC-seq (scATAC-seq) at different developmental stages of the human cortex enabled the identification of different cell populations and their candidate regulatory elements in the developing cortex (11, 12). This work also showed significant correlations between these cell types and cerebral organoids. However, these studies are descriptive and do not provide a functional readout that can test the enhancer activity of these sequences and the effects of psychiatric disorder-associated variants on their function.

Massively parallel reporter assays (MPRAs) enable the simultaneous testing of thousands of sequences for their regulatory activity (13). The quantitative readout of an MPRA makes it possible to test different alleles of the same locus side-by-side in one experiment, enabling the detection of sequence variants that alter enhancer function (14–16). Because they generate data for large numbers of sequences, MPRAs have enabled the development of deep learning models that predict enhancers and their quantitative activity from sequence alone (17, 18). Machine learning has been used to predict: 1) enhancers from epigenetic marks, sequence motifs, and evolutionary conservation (reviewed in (19)); 2) MPRA activity from sequences and epigenetic features (20); 3) gene expression from promoter-proximal sequences (21–23); and 4) epigenetic features from sequences (24–30). In addition, DNA sequence-based models (17, 18) have the potential to be deployed at scale to screen variants without MPRA data and to design enhancers with desired properties, such as cell type specificity. These strategies have shed light on the sequence motifs and upstream regulators that are most important for regulating gene expression across different cell types and species.

Here, we used deep learning and a lentivirus-based MPRA (lentiMPRA) to test 102,767 sequences for their enhancer activity in primary human mid-gestation cortical cells and 10-week cerebral organoids. These sequences included cell-type-specific open chromatin regions and psychiatric disorder-associated quantitative trait loci (QTLs) predicted to be functional enhancers based on their epigenetic profiles. Combined, we discovered 46,802 sequences to be functional enhancers and 164 variants with significant allelic differences in enhancer activity regulating known disorder-associated genes such as TBR1, MARK2 (autism spectrum disorder), and NFKB2 (schizophrenia). We observed comparable activity levels between organoids and primary cortical cells, suggesting that organoids provide an adequate in vitro model to study the developing cortical regulatory landscape. Finally, we used our lentiMPRA data to train a deep learning model that predicts enhancer activity from sequence with state-of-the-art accuracy, enabling us to learn sequence determinants and upstream regulators of the human mid-gestation enhancer code. These findings provide a comprehensive catalog of functional cortical-cell-specific enhancers and psychiatric disorder-associated variants that alter their activity, improving our understanding of the molecular basis of neurodevelopment and the etiology of psychiatric disorders.

Results

LentiMPRA library generation and analysis

To comprehensively characterize human neurodevelopmental enhancers and their sequence variants across the mid-gestation cortex, we designed two lentiMPRA libraries (Methods) and tested them in primary human cortical cells (Fig. 1A). Due to the limited number of obtainable human primary cells and lentivirus integrations into these cells, each library was assayed independently.

Fig. 1. Design and overall lentiMPRA results.

(A) Experimental overview of the two lentiMPRA libraries. Library 1 contains 48,861 differentially accessible (DA) regions from scATAC-seq in the developing human cortex that overlap either H3K27ac peaks or PLAC-seq/PCHi-C loops. The number of DA candidates for each cell type is shown in the bar plot. dlEN, deep layer excitatory neuron; ulEN, upper layer excitatory neuron; IN-CGE, CGE derived interneuron; IN-MGE, MGE derived interneuron. Library 2 includes 17,069 brain QTLs that are within 100 kb of differentially expressed cross-disorder neurodevelopmental genes or in linkage disequilibrium (LD) with psychiatric disorder GWAS SNPs. The number of variants associated with each disorder is shown in the upset plot (top 22 intersects shown). SCZ, schizophrenia; ASD, autism spectrum disorder; BD, bipolar disorder; CDG, congenital disorders of glycosylation; AD, Alzheimer’s disease; MDD, major depressive disorder; ADHD, attention-deficit/hyperactivity disorder; TS, Tourette syndrome; OCD, obsessive-compulsive disorder. Both libraries were cloned into a lentiMPRA vector, packaged into lentivirus, and used to infect primary cortical cells dissociated from GW18 tissues and human induced pluripotent stem cell (hiPSC)-derived cerebral organoids. Following infection, DNA and RNA were extracted and sequenced, and an RNA/DNA barcode count ratio was calculated for each candidate regulatory sequence (CRS), allowing the identification of active DA regions and differentially active variants. (B) Correlation of log2(RNA/DNA) between technical replicates in primary cortical cells for library 1 and 2, respectively. (C) Pie charts showing the number of active and inactive sequences for candidates, positive (+) and negative (−) controls in both libraries. (D) Top enriched GO terms from the ‘Biological Process’, ‘Cellular Component’ and ‘Molecular Function’ ontologies for nearest genes of the highest activity sequences (both libraries combined). Nearest genes of the lowest activity sequences were used as the background set. The complete list of GO terms is available in fig. S2B. (E) TF motif enrichment analysis for highest activity sequences (both libraries). Red: neurodevelopmental TFs, Blue: USFs.

The first library was designed to characterize cell-type specific enhancers. It consisted of 51,495 sequences obtained primarily from differentially accessible (DA) scATAC-seq peaks in the developing human brain (11). These DA peaks were further selected based on their: 1) overlap with histone H3 acetylation of lysine 27 (H3K27ac) peaks from bulk prefrontal cortex (PFC) tissue (31), microglia or non-microglia cells (11) (n = 24,611, 53%); or 2) overlap with H3K4me3 proximity ligation-assisted chromatin immunoprecipitation sequencing (PLAC-seq) peaks from intermediate progenitor cells (IPC), radial glia (RG), excitatory neurons (EN), or interneurons (IN) (32) (n = 12,412, 26.8%); or 3) overlap with promoter capture Hi-C (PCHi-C) from EN, hippocampal dentate gyrus (DG)-like neurons, lower motor neurons and astrocytes (33) (n = 13,712, 29.5%).

The second library compared the reference and the alternative alleles for 17,069 variants. These were designed from brain QTLs (34–36) overlapping pseudo-bulked ATAC-seq peaks (11) that: 1) are within 100 kilobases (kb) of differentially expressed genes in schizophrenia, autism or bipolar disorder (expression QTL (eQTL) n = 14,021; chromatin QTL (caQTL) n = 149) (9); or 2) share a linkage disequilibrium (LD) block with GWAS SNPs for various psychiatric disorders (8, 37–44) (eQTL n = 2,882, caQTL n = 17). Assaying both QTL and GWAS SNPs allowed us to overcome the systematic bias toward different types of variants in each type of study (45), thus enabling a comprehensive screening for functional regulatory variants. This library also included ~15,000 non-QTL sequences with a range of expected activity levels predicted from their epigenetic profiles.

To prioritize distal enhancers, promoter-overlapping peaks were excluded from both libraries. Each library contained 143 positive control sequences nominated from ATAC-seq and ChIP-seq data in brain organoid models (46) and used to define active sequences, plus ~2,000 additional controls used for quality control. We synthesized 270 base pair (bp) oligos, each centered on either the DA peak summit (library 1) or variant (library 2) followed by 15bp primers on either side to amplify the library. A 31bp minimal promoter (minP) and 15bp random barcode were placed downstream of each synthesized oligo via PCR and cloned into a lentiMPRA vector (Fig. 1A).

Both libraries were packaged into lentivirus and used to infect human primary cortical cells at mid-gestation (gestational week (GW) 18). Primary cortical cells were cultured for two days before infection, exhibiting complex morphology and smooth neurites. The presence of major cortical cell types was confirmed by immunocytochemistry before and after infection, including deep-layer excitatory neurons (dlENs), upper-layer excitatory neurons (ulENs), newborn excitatory neurons (earlyENs), RG, IPC, astrocyte/oligodendrocyte precursors (Astro/oligo), endothelial and mural cells (EndoMural), medial ganglionic eminence derived interneurons (IN-MGE), caudal ganglionic eminence derived interneurons (IN-CGE), and microglia (MG) (fig. S1A). We performed three replicates for the DA library and five replicates for the variant library. Three days post-infection, when the majority of the non-integrated virus was gone, DNA and RNA were harvested and prepared for sequencing. DNA sequencing revealed that both libraries contained more than 96% of the designed oligos (DA library: 50,394 oligos; variant library: 51,319 oligos), and each oligo had on average over 50 unique barcode associations (median DA: 56, variant: 64). Overall, 97,762 sequences (95%) passed stringent quality control (Methods).

To measure enhancer activity, we quantified depth-normalized barcode abundance in DNA and RNA for each oligo and then calculated its batch-corrected RNA/DNA ratio. A high correlation of RNA/DNA ratios between replicates was observed (average Pearson correlation, DA: 0.93, variant: 0.91; Fig. 1B), confirming sufficient reproducibility. We next compared the activity distributions of positive and negative controls (fig. S2A). As expected, positive controls had significantly higher ratios than negative controls in both libraries (DA: p = 1e-3, variant: p = 8e-5, Wilcoxon test). Moreover, the distribution of ratios for randomly scrambled negative controls was highly comparable between libraries (median DA: 0.997, median variant: 0.994), indicating that activity was mostly driven by the minimal promoter. Altogether, these quality assessments suggest that our lentiMPRA robustly distinguishes between sequences with high, medium, and low regulatory activity.
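The ratio calculation described above can be illustrated with a minimal sketch. It assumes a hypothetical input of per-barcode DNA and RNA counts and uses counts-per-million as the depth normalization; the function name `rna_dna_ratios` and the input layout are illustrative assumptions, and the batch correction applied in the study is not reproduced here.

```python
from collections import defaultdict


def rna_dna_ratios(barcode_counts):
    """Compute a depth-normalized RNA/DNA ratio per oligo.

    barcode_counts: iterable of (oligo_id, dna_count, rna_count) tuples,
    one tuple per barcode (hypothetical input layout).
    Returns {oligo_id: ratio}. Oligos with no DNA counts are skipped.
    Batch correction across replicates is NOT performed in this sketch.
    """
    rows = list(barcode_counts)
    dna_total = sum(dna for _, dna, _ in rows)
    rna_total = sum(rna for _, _, rna in rows)

    # Sum depth-normalized (counts-per-million) abundance over all
    # barcodes that belong to the same oligo.
    dna_cpm = defaultdict(float)
    rna_cpm = defaultdict(float)
    for oligo, dna, rna in rows:
        dna_cpm[oligo] += dna / dna_total * 1e6
        rna_cpm[oligo] += rna / rna_total * 1e6

    return {
        oligo: rna_cpm[oligo] / dna_cpm[oligo]
        for oligo in dna_cpm
        if dna_cpm[oligo] > 0
    }


# Toy example with two oligos; oligo "A" has two barcodes.
example = [("A", 10, 30), ("A", 10, 30), ("B", 20, 20)]
ratios = rna_dna_ratios(example)
```

In this toy input, oligo "A" accounts for half of the DNA reads but three quarters of the RNA reads, so its ratio is above 1, while "B" falls below 1.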

To identify sequences capable of driving gene expression, we defined active sequences as those with RNA/DNA ratios higher than the median of positive controls in their respective libraries (DA: 1.047, variant: 1.068), conservatively treating the remaining sequences as inactive. We further defined the highest activity sequences as those above the 75th percentile of the positive controls and the lowest activity sequences as those below the 25th percentile. Combining both libraries, we identified a total of 46,802 active sequences (48% of 97,762) and 25,557 with the highest activity. We next evaluated the various properties of active lentiMPRA sequences. Compared to inactive sequences, active sequences are significantly more conserved (p=5.8e-28, Wilcoxon test), and their target genes are expressed at higher levels during mid-gestation (p=6.4e-6, Wilcoxon test; 23% of all sequences mapped to target genes using PLAC-seq data (32)). Comparing the highest versus lowest activity sequences, we found gene ontology (GO) enrichment for neurodevelopmental terms, such as ‘nervous system development’ and ‘neuron differentiation’ (Fig. 1D). Next, we analyzed transcription factor binding sites (TFBSs) and observed enrichment for known neurodevelopmental TFBSs including the DLX, LHX and SOX gene families. We also found enrichment for universal stripe factors (USFs) including EGR1, MAZ and members of the KLF/SP family (Fig. 1E). USFs colocalize at most promoters and enhancers, increasing chromatin accessibility and residence time for cofactors (47), and our finding suggests that they play a similar role with lentiMPRA reporter constructs integrated into the genome. Together, these results indicate that our active sequences have biological functions in brain development.
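The thresholding scheme above can be sketched as follows. This is an illustration only: the function name `classify_activity`, the label strings, and the choice of the "inclusive" quantile method are assumptions, and the toy numbers are not data from the study.

```python
import statistics


def classify_activity(ratios, positive_control_ratios):
    """Label each sequence relative to the positive-control distribution.

    ratios: {sequence_id: RNA/DNA ratio} for one library.
    positive_control_ratios: list of ratios for that library's positive controls.
    Labels (hypothetical names):
      "highest"  - above the 75th percentile of positive controls (also active)
      "active"   - above the median but not above the 75th percentile
      "lowest"   - below the 25th percentile (also inactive)
      "inactive" - everything else at or below the median
    """
    q25, median, q75 = statistics.quantiles(
        positive_control_ratios, n=4, method="inclusive"
    )
    labels = {}
    for seq_id, ratio in ratios.items():
        if ratio > q75:
            labels[seq_id] = "highest"
        elif ratio > median:
            labels[seq_id] = "active"
        elif ratio < q25:
            labels[seq_id] = "lowest"
        else:
            labels[seq_id] = "inactive"
    return labels


# Toy positive controls with quartiles 2.0, 3.0 and 4.0.
controls = [1.0, 2.0, 3.0, 4.0, 5.0]
labels = classify_activity({"a": 4.5, "b": 3.5, "c": 2.5, "d": 1.5}, controls)
```

Because thresholds are computed per library, the function would be called once for the DA library and once for the variant library, each with its own positive controls.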

Identification of thousands of active cortical-cell-type specific enhancers

Of 46,370 DA sequences passing quality control, 24,218 (52%) were active enhancers in primary cortical cells (Data S1). Among these active DAs, 4,656 (19%) were transiently accessible regions across pseudotime of excitatory neurogenesis in a previous scATAC-seq study (11). We then separated DA sequences based on their cell-type specificity and found that for each cell type 43–62% of the sequences were active (Fig. 2A). We observed that abundant cell types, including neurons and radial glia, had a slightly higher percentage of active sequences compared to less abundant cell types, such as microglia and endothelial/mural cells. Compared to the lowest activity DA sequences, the highest activity DA sequences are enriched for TFBSs of neurodevelopmental transcription factors, and many of these show positional enrichment within the ATAC-seq peak (Fig. 2B). For example, active sequences tend to have motifs for ATOH1, NEUROD2, and TCF4 upstream of the peak summit, whereas ASCL1 and SPI1 motifs are enriched downstream of the summit.

Fig. 2. Identification and validation of functional differentially accessible regions in the developing cortex.

(A) Upset plot showing the number of DA peaks (active: blue; inactive: gray) for each cell type or combination of cell types. (B) The highest activity DA sequences have positional motif enrichment for neurodevelopmental TFs compared to the lowest activity sequences, exhibiting significantly more motif matches slightly up- or downstream of the ATAC-seq peak summit. (C) Active DA sequences have significantly higher means across several attributes compared to inactive DA sequences (color scale: Wilcoxon test q-values, black = no data) including evolutionary conservation (phyloP), expression of PLAC-seq linked target genes in matched cell types, total number of strong motif matches (q-value < 0.01), and total number of strong USF motif matches. A representative motif enriched in active DA peaks for each cell type is shown on the right. Statistically significant comparisons (q-value < 0.05) are indicated by a star. (D) Experimental strategy for validating cell type specificity of active DA sequences. (E-F) Mid-gestation human cortex slice cultures transduced with a GFP lentivirus reporter driven by a ulEN-specific enhancer (chr5:89274678-89274948, hg38) (E) and a pan-excitatory neuron specific enhancer (chr2:165141999-165142269, hg38) (F). Expression of GFP (green) and SATB2 (red) was visualized via immunohistochemistry staining and insets show colocalization of GFP+ along with SATB2+ cells in different layers.

Next, we assessed the characteristics of active DA sequences across cell types (Fig. 2C and table S1). In almost all cell types, active DA sequences are more conserved than inactive sequences, consistent with prior knowledge that neurodevelopmental enhancers tend to exhibit strong conservation across vertebrate evolution (48). Inhibitory neurons derived from the ganglionic eminence exhibited the largest differences in conservation scores between active and inactive sequences, fitting with the general transitory role of the ganglionic eminence in guiding neuronal migration (49). Next, to test whether active DAs had regulatory activities endogenously, we predicted the target genes of each DA sequence using PLAC-seq data (32) and calculated cell-type-matched expression using scRNA-seq data in the developing human cortex (11). For many neuronal subtypes, genes interacting with active DAs showed higher expression compared to genes interacting with inactive DAs. For example, putative genes regulated by dlEN-specific active DAs were transcribed at a higher level in dlENs, compared to genes regulated by dlEN-specific inactive DAs (p = 0.00016, Wilcoxon test). We also found a significantly higher number of strong TF motifs in active DAs specific to AstroOligo, EndoMural, Microglia and RG, while USF motifs were enriched in DAs specific to IN-CGE plus these same four glial and vascular cell types. These results indicate that the activity of DA sequences in lentiMPRA is associated with motif content and target gene expression in the matched cell type. Lastly, we observed unique sets of enriched motifs in active DAs of each cell type, and found that many of them formed functional protein-protein interaction (PPI) networks (fig. S4).

To further verify the cell-type-specific activity of active DAs in our lentiMPRA, we selected eleven DA peaks with high MPRA activity from six different cell types (table S2) and tested them individually for their enhancer function in cortical cells followed by co-staining with antibodies for cell-specific markers (Fig. 2D). Each sequence was individually cloned into our lentiMPRA vector in front of a GFP reporter construct. As tissue samples are difficult to obtain at exactly GW18, we used cortical sections covering the mid-gestation period more generally (GW16-20) for infection. We first validated that the cortical sections can be uniformly infected by transducing them with a constitutively active enhancer vector, finding that our virus was able to infect with even spatial distribution across cell types (fig. S3). We next infected tissues with the individual enhancer viruses and found that all candidates showed GFP expression (fig. S3), in agreement with our lentiMPRA results. Cell-type specificity was inferred from GFP spatial location, counterstaining with cell markers, and morphology. While some candidates did not display strong cell-type-specific signals, we found three excitatory neuron-specific DA sequences (EN-1, ulEN-2, and dlEN-2) showing enhancer activity in the expected cell type. For example, a ulEN-specific DA region (chr5:89274678-89274948, hg38, Fig. 2E) drove GFP expression predominantly in the upper areas of the cortical plate and largely co-localized with SATB2, an upper layer excitatory neuron marker. Using PLAC-seq data (32), we found that this region has an EN-specific interaction with the promoters of MEF2C and MEF2C-AS1, known ASD and SCZ genes with EN-specific expression in the developing cortex. Another example is a pan-excitatory neuron specific accessible region (chr2:165141999-165142269, hg38, Fig. 2F) that showed notably higher GFP signal in the cortical plate (CP) and subplate (SP) compared to the ventricular zone (VZ) and outer subventricular zone (OSVZ), with most of the cells positive for GFP and SATB2 located in the top layer of CP.

Not all sequences showed enhancer activity within or unique to their predicted cell type. Two regions (ulEN-1 and dlEN-1) showed GFP expression outside the regions where the expected cell type is enriched and did not overlap with cell marker staining. Candidate sequences specific to Astro/Oligo or RG showed GFP signal around the VZ but also near the CP. In these cases, GFP+ cells showed complex morphologies: some matched the expected cell type(s) while others did not (fig. S3). To conclude, we independently validated the enhancer activities of eleven sequences with high lentiMPRA activity, finding all of them to drive GFP expression in cortical cells, with three exhibiting cell-type-specificity consistent with predicted activity from scATAC-seq.

lentiMPRA identifies functional regulatory variants associated with psychiatric disorders

We next characterized the psychiatric disorder-associated variant library. Of the 15,335 variants with both alleles passing quality control, 8,029 (52%) showed enhancer activity from at least one allele. For these active variants, we estimated the allelic effect on enhancer activity by testing for differential activity across replicates. Most variants had modest effect sizes (median absolute log2 FC = 0.069) (Fig. 3A). This is in line with previous MPRAs studying the effect of regulatory variants (50, 51), and consistent with eQTL analyses that find ~1% of single nucleotide changes are associated with significant changes in gene expression (52). At a 10% false discovery rate, 164 variants showed significant allelic effects with the number of down-regulating and up-regulating variants being similar (51% versus 49%, p = 0.81, Binomial test, Fig. 3B) (53). Our subsequent analyses focus on these 164 differentially active variants (DAVs) (Fig. 3B, Data S2). Among DAVs, 26 were in LD with GWAS SNPs and 138 were within 100kb of differentially expressed disease genes, which is similar to expectation given the library design (17% GWAS and 83% eQTL). Consistent with being QTLs, DAVs are not enriched for low-frequency variants (OR=0.8, p=0.34) nor do they have elevated conservation (OR=0.88, p=0.52). Separating DAVs based on cell type showed enrichment in Astro/Oligo (OR=2.39, p=0.14, Fig. 3C), which is particularly striking given the relatively low proportion of this cell type in our cultures and indicates the importance of astrocytes and oligodendrocytes in psychiatric disorders.
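The step from per-variant p-values to a set of significant variants at a 10% false discovery rate can be sketched with the Benjamini-Hochberg procedure. The text above does not name the correction method, so Benjamini-Hochberg is an assumption made for illustration, as are the function name and the toy p-values.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    NOTE: the choice of Benjamini-Hochberg is an assumption for this
    sketch; the source only states that a 10% FDR threshold was used.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy allelic-effect p-values for four hypothetical variants.
toy_pvalues = [0.01, 0.04, 0.03, 0.5]
toy_adjusted = benjamini_hochberg(toy_pvalues)
significant = [i for i, q in enumerate(toy_adjusted) if q < 0.10]
```

Variants whose adjusted p-value falls below 0.10 would then be carried forward as differentially active.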

Fig. 3. Identification of differentially active variants associated with psychiatric disorders.

(A) Volcano plot showing log2 Fold Change and −log10 adjusted p-value for variants that have enhancer activity from at least one allele. Significant variants (FDR < 0.1) were annotated with PLAC-seq predicted target gene name and color-coded based on target gene expression in mid-gestation telencephalon. Two vertical dashed lines indicate the absolute log2FC of 1. The horizontal dashed line indicates FDR at 10%. (B) Upset plot showing the number of variants (bar) passing combinations of different thresholds (dots and lines below bar). The number of DAVs is highlighted in red. (C) Enrichment log2 odds ratio of DAVs overlapping different features, including combined or separate cell-type-specific DA regions, adult brain eQTL, GWAS of various psychiatric disorders and low-frequency variants with minor allele frequency (MAF) less than 0.01. (D) TFBSs predicted to be altered by DAVs using motifbreakR. Dot color represents TF expression in primary cortical cells, size represents predicted magnitude of binding affinity alteration. TFs were ranked by TFBS alteration significance (−log10 p-value, y-axis). (E-F) Genomic browser tracks showing examples of causal regulatory variants and their predicted target genes. The top track shows PLAC-seq chromatin loop in EN (32), the second track shows bulk RNA-seq in primary cortical cells, the third track shows bulk H3K27ac ChIP-seq (31), followed by a track of bulk ATAC-seq in deep-layer cortex (31). The bottom ten tracks show scATAC-seq in human cortex (11). DAV rs2193495 (E), located in a dlEN-specific accessible region, potentially down-regulates TBR1 expression due to the introduction of EOMES and MAZ binding sites. DAV rs2154984 (F) is predicted to regulate MARK2 expression and disrupt MTF1 and ZNF148 and introduce PPARD and NR2F6 binding sites. (G) Manhattan plot of SCZ-associated chromosome band 6p21.2 showing the 38 variants tested. The y-axis shows −log10 of adjusted p-value from MPRA. DAVs are highlighted in red and annotated with their predicted target gene. Arrows indicate the direction of allele effect observed in MPRA. (H) DAV rs9368977 located in 6p21.2 is predicted to disrupt binding of SP3, SP1, KLF4 and EGR4. (I) Manhattan plot of ASD-associated chromosome band 16p11.2 showing the 25 variants tested. The y-axis shows −log10 of adjusted p-value from MPRA. DAVs are highlighted in red and annotated with predicted target genes. The arrow indicates the direction of allele effect observed in MPRA. (J) TFBS altered in rs145650870. The alternative allele favors the binding of EHF and ELK3.

Next, we compared our DAVs to prior studies. A recent MPRA for dementia-associated variants in human embryonic kidney cells (HEK293T) (51) included 96 variants that were also in our library. We found that 89 variants show no significant allelic effect in either study, and 7 altered enhancer activity in HEK293T cells but not in our primary cortical cell data. This difference could be due to the cell types and/or the thresholds used to assign differential activity. Comparing our DAVs to eQTL data from psychENCODE (34), we found that 55% of DAVs (n = 77) had effects in the same direction as the eQTL and effect sizes were weakly correlated (Pearson’s correlation = 0.14, p = 0.102) (fig. S5A). Despite this, the correlation between MPRA and eQTL in non-DAVs (Pearson’s r = 0.008, p = 0.366) was notably lower than that in DAVs (fig. S5A). This corroborates that our lentiMPRA is efficient in identifying functional variants while underscoring differences between reporter activity and endogenous gene expression.

To decode the mechanisms through which the 164 DAVs exhibit differential activity, we predicted losses and gains of TFBS motifs using motifbreakR (54) (threshold = 1e-5), identifying 34 DAVs (21%) in which the alternative allele alters at least one TFBS (Fig. 3D). We also found that there is significantly more TFBS disruption compared with non-DAVs (OR = 1.49, p = 0.047, Fisher’s exact test). We then analyzed whether these predicted disrupted TFs functionally or physically interact with each other using the STRING database (55) and found a significant TF network centering on SOX2 and STAT3 (PPI enrichment p < 1e-16, fig. S5C).
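An enrichment comparison of this kind (TFBS disruption in DAVs versus non-DAVs) reduces to a 2x2 contingency table tested with Fisher's exact test. The sketch below is a pure-Python illustration; the function name is hypothetical and the example table is a toy, since the non-DAV counts are not given in the text.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    For the analysis described above the cells would be, for example:
      a = DAVs with an altered TFBS,     b = DAVs without
      c = non-DAVs with an altered TFBS, d = non-DAVs without
    Returns (sample odds ratio, two-sided p-value).
    """
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def table_probability(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = table_probability(a)
    # Sum the probabilities of all tables at least as extreme as observed
    # (a small tolerance guards against floating-point ties).
    p_value = sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-7)
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Toy table, not data from the study.
toy_or, toy_p = fisher_exact_2x2(3, 1, 1, 3)
```

With the study's counts in place of the toy table, the returned odds ratio and p-value would correspond to the kind of values reported above (OR and Fisher's exact p).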

We next predicted the putative target gene(s) of DAVs using chromatin interaction data in various brain cell types (32, 33, 56) and adult brain eQTLs (34, 36) (Fig. 3A), finding 48 DAVs (29.3%) to have chromatin loops with gene promoters and 8 of these (17%) to be eQTLs for the interacting gene. As regulatory activities vary over development, target genes predicted using adult brain eQTLs may not reflect genes regulated in early brain development, and thus we prioritized target genes predicted from chromatin interaction data. Many target genes are known risk genes or within susceptibility loci for psychiatric disorders and neural diseases. For example, we found that variant rs2193495, located in a dlEN-specific DA region that is thought to regulate the expression of T-box brain transcription factor 1 (TBR1), a haploinsufficient ASD-associated gene and neurodevelopmental regulator (57), leads to reduced MPRA activity, possibly due to the creation of EOMES and MAZ binding sites (Fig. 3E). Another down-regulating variant, rs2154984, resides in a putative enhancer of the microtubule affinity regulating kinase 2 (MARK2), a risk gene whose loss-of-function variants have been associated with ASD (58) (Fig. 3F). This variant decreases the affinity of MTF1 and ZNF148 binding sites while increasing the affinity of PPARD and NR2F6 binding sites (Fig. 3F). Another example is the SCZ-associated variant rs10786689, which is thought to regulate nuclear factor kappa B subunit 2 (NFKB2) and suppressor of fused homolog (SUFU). This variant decreases enhancer activity, possibly due to the disruption of a SOX2 and/or SOX4 TFBS (fig. S5D). Since both genes were found to be up-regulated in SCZ patients (59, 60), our results suggest that the alternative allele of rs10786689 could be protective. Finally, rs73392121 resides within a microglia scATAC peak and is thought to regulate NPC intracellular cholesterol transporter 1 (NPC1), a known cause of Niemann-Pick disease type C, with the alternative allele leading to reduced MPRA activity. Mutations in this gene lead to impaired cholesterol and lipid cellular transport, including microgliosis (61). These findings demonstrate that lentiMPRA can nominate candidate causal variants for known disease genes.

As a second strategy for linking DAVs to psychiatric disorders, we focused on known risk loci with multiple variants tested in our lentiMPRA. For example, in the SCZ-associated region 6p21.2 (62) (Fig. 3G), we tested 38 variants and found 2 DAVs: rs6912602 and rs9368977. DAV rs6912602 is one of the most differentially active in our lentiMPRA (3.3-fold decrease) and is an eQTL associated with reduced expression of peptidylprolyl isomerase like 1 (PPIL1). Partial loss-of-function variants in PPIL1 cause neurodegenerative pontocerebellar hypoplasia in humans and mice (63). DAV rs9368977 increases activity in lentiMPRA, resides in open chromatin in RG and IPC, and is an eQTL for chromosome 6 open reading frame 89 (C6orf89). The alternative allele of rs9368977 disrupts the motifs of USFs SP3 and KLF4 (Fig. 3H). In the SCZ-associated locus 6p21.1 (64), we tested 48 variants and identified one DAV: rs1343025. The alternative allele of rs1343025 is associated with increased expression of vascular endothelial growth factor A (VEGFA). VEGFA regulates cerebral blood volume and is associated with SCZ, though the exact impact of VEGFA remains controversial (65). Another example is the ASD risk locus 16p11.2 (66), where we tested 25 variants and discovered one activity-increasing DAV, rs145650870 (Fig. 3I). This variant is located in an RG-specific chromatin loop for three nearby genes: Tu translation elongation factor, mitochondrial (TUFM), ataxin 2 like (ATXN2L) and SH2B adaptor protein 1 (SH2B1). The alternative allele of rs145650870 creates a TFBS for EHF (Fig. 3J). Combined, these results show that our lentiMPRA approach could be used to prioritize variants that have an effect on regulatory activity in disease-associated loci.

Fig. 4. Comparison of lentiMPRA results in cerebral organoids and primary cortical cells.

(A) Schematic of the experimental workflow. (B) Microscopic images of 10-week-old organoid slices immunostained for SOX2 (Cyan), FOXG1 (Red), and DAPI. Scale bar, 200 μm. (C) Normalized transcript count of marker genes in organoids derived from 3 hiPSC lines (1 = 21792A, 2 = 1323_4, 3 = 20961B). (D) Correlation of log2(RNA/DNA) between replicates in organoids and primary cortical cells for library 1 and 2, respectively. (E-F) Venn diagrams showing the overlap between organoids and primary cells. (E) Left: overlap of active DA regions; Right: overlap of active variants. (F) Overlap of DAVs. ‘Top DAVs’ were identified using shuffled sequences to define active and applying a cutoff for absolute log2FC of 0.3. (G) The proportion of active DAs in organoids that are also active in primary cells. (H) lentiMPRA log2FC in organoid (x-axis) and primary cells (y-axis). The scatter plot includes variants identified as DAVs in both organoids and primary cells (grey), variants detected as DAVs only in organoids (red), and variants detected as DAVs only in primary cells (blue). (I) Left: Protein-protein interactions (PPI) of enriched TFBS motifs in active DAs specific to organoids or primary cells. PPI network generated using the STRING (73) database. Right: heatmap showing the normalized transcript count of enriched TFs from bulk RNA-seq data. TFs not expressed (TPM < 1) in all replicates were removed from the heatmap. (J-K) A DAV (chr15:72984155-72984425, hg38) that contains a BCL6 motif showed increased activity in organoids versus primary cells (J) and its reference sequence contains a BCL6 motif (K). (L-M) A DAV (chr2:209451505-209451775, hg38) with a GLIS3 binding motif showing increased MPRA activity in primary cells versus organoids (L) and the location of the GLIS3 motif in its reference sequence shown below (M). (N) TFBSs altered by DAVs that show an opposite direction of allelic effect between organoids and primary cells. Dot sizes represent normalized TF expression; color represents log2FC.

Organoids show comparable lentiMPRA activity to primary cells

Previous single-cell transcriptomic and epigenomic data along with immunohistochemical analyses suggest that cortical organoids recapitulate many of the cell types in the developing human forebrain (11, 46, 67–69). To explore the suitability of organoids as a tissue source for MPRAs, we tested both our lentiMPRA libraries in 10-week-old cortical organoids (Fig. 4A), which were validated for the expression of relevant cell type markers, such as FOXG1, PAX6, EOMES, LHX2, via immunostaining (Fig. 4B) and bulk RNA-seq (Fig. 4C). Following nine weeks of directed differentiation towards a dorsal forebrain fate, organoids were sectioned into 300-μm-thick slices and infected with the lentiMPRA libraries at 10 weeks. This slicing approach allowed diffusion of lentivirus into the majority of cells, providing high integration rates per cell (Multiplicity of infection (MOI) = 100). Slicing is also known to attenuate hypoxia, leading to better organoid cell health (70). For each library, we infected organoids derived from 2–3 iPSC lines with 2–4 technical replicates each and analyzed the data as described above for primary cells. We observed a high correlation between lentiMPRA replicates (average Pearson correlation over all tested sequences in each library, DA: 0.89, variant: 0.90) and positive controls consistently showed higher enhancer activity compared to negative controls (DA: p = 6.6e-4; variant: p = 0.027, Wilcoxon test, fig. S2A), confirming the high quality of our organoid data.
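The replicate-concordance check described above can be illustrated with a minimal sketch. It assumes activity is summarized as log2(RNA/DNA) per sequence and per replicate, and that the reported value is the mean Pearson correlation over all replicate pairs; the counts and function names below are hypothetical and are not taken from the authors' pipeline.

```python
import math
from itertools import combinations

def log2_ratio(rna, dna):
    """log2(RNA/DNA) per sequence for one replicate."""
    return [math.log2(r / d) for r, d in zip(rna, dna)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def mean_pairwise_pearson(replicates):
    """Average Pearson r over every pair of replicates."""
    pairs = list(combinations(replicates, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

# Hypothetical toy counts: 3 replicates x 5 sequences.
dna = [[100, 120, 80, 90, 110],
       [95, 130, 85, 100, 105],
       [110, 115, 75, 95, 120]]
rna = [[150, 60, 160, 90, 330],
       [140, 70, 180, 95, 300],
       [170, 55, 140, 100, 350]]

activity = [log2_ratio(r, d) for r, d in zip(rna, dna)]
avg_r = mean_pairwise_pearson(activity)
print(f"average replicate Pearson r = {avg_r:.3f}")
```

A real analysis would also need to handle sequences with zero counts and normalize for sequencing depth before taking ratios, which this sketch omits.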

We compared RNA/DNA ratios between organoids and primary cortical cells and observed high correlation for both libraries (average Pearson correlation DA: 0.89, variant: 0.87, Fig. 4D). Similar to primary cells, roughly half of tested sequences were active (total: 31,954, DA: 23,832, variant: 8,122). The vast majority of organoid active sequences were also active in primary cells (Fig. 4E). To put this high level of concordance in the context of gene regulation, we performed bulk RNA-seq on three primary and three organoid samples (average replicate Pearson correlation, primary: 0.98, organoid: 0.99) and observed similar transcript levels between primary and organoid samples (average Pearson correlation 0.88), with some notable exceptions that we discuss below. Finally, we compared the activity of DA sequences stratified by the cell types in which they are accessible and found that active DA sequences were highly concordant in organoids versus primary cells for each cell type (Fig. 4G). The two cell types in which the lowest proportion of primary cell active DAs replicated in organoids were microglia (86.1%) and endothelial cells (86.4%), which is expected since these cell types are thought to be absent in cerebral organoids and our ability to assay activity of these DAs relies upon the permissiveness of MPRAs. These results suggest that cerebral organoids are a reasonable in vitro model of mid-gestation enhancer activity and gene expression, despite some differences in cell type composition.

Next, we examined the concordance of differential allelic activity between organoids and primary cells. In organoids, we observed a median absolute log2AR of 0.066, similar to that in primary cells (0.069), and detected 420 DAVs (FDR<10%), of which 74 (18% of organoid DAVs, 45% of primary DAVs) were also DAVs in primary cells (Fig. 4F). The larger number of DAVs identified in organoids is due to additional replicates and smaller batch effects. Consistent with this, the overlap of DAV sets was higher (53% of organoid DAVs, 54% of primary DAVs) when considering only the most differentially active organoid variants (absolute log2AR > 0.3 and activity above the median of shuffled controls, Fig. 4F). Despite this modest concordance in which variants were statistically significant, we observed a high correlation in DAV effect sizes in organoids versus primary cells (r = 0.91, p = 2.2e-16, Fig. 4H). We conclude that cerebral organoids and primary cells produce comparable lentiMPRA measurements of differential allelic activity for variants with the largest effects, with noise and cell type differences affecting measurements at and below the significance threshold for identifying DAVs.
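As a worked check of the overlap percentages, the sketch below rebuilds sets of the reported sizes (420 organoid DAVs, 164 primary-cell DAVs, 74 shared) from placeholder identifiers and computes the shared fraction relative to each set. The identifiers are invented for illustration; only the set sizes come from the text.

```python
def overlap_summary(organoid_davs, primary_davs):
    """Size of the intersection and its share of each input set, in percent."""
    shared = organoid_davs & primary_davs
    return {
        "shared": len(shared),
        "pct_of_organoid": 100 * len(shared) / len(organoid_davs),
        "pct_of_primary": 100 * len(shared) / len(primary_davs),
    }

# Placeholder variant IDs sized to match the reported counts.
shared_ids = {f"shared_{i}" for i in range(74)}
organoid = shared_ids | {f"organoid_only_{i}" for i in range(420 - 74)}
primary = shared_ids | {f"primary_only_{i}" for i in range(164 - 74)}

summary = overlap_summary(organoid, primary)
print(summary["shared"],
      round(summary["pct_of_organoid"]),
      round(summary["pct_of_primary"]))
```

The two percentages differ because the same 74 shared variants are divided by set sizes of 420 and 164, respectively.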

Beyond evaluating the suitability of organoids for modeling primary cells, we were also curious about what differences in lentiMPRA results between the two settings would reveal about the biology of early neurodevelopment. Focusing first on the 2,298 DA sequences that were active only in organoids and 2,684 only in primary cells, we performed motif enrichment analysis and examined the expression level of enriched TFs (Fig. 4I). Organoid-specific active DA sequences were enriched for binding sites of NKX2.1, RUNX, BCL6, and ASCL2. BCL6 is a transcriptional repressor with significantly lower expression in organoids compared to primary cells (q-value = 8.86e-7), consistent with our observation that sequences harboring BCL6 motifs tend to have higher lentiMPRA activity in organoids. One such example is a dlEN-specific accessible region containing a BCL6 motif that had significantly higher enhancer activity in organoids (Fig. 4JK). In addition, overexpression of BCL6 is known to inhibit apoptosis (71) and therefore could reflect elevated cell stress in organoids (72). For the primary-specific active DA peaks, we observed enrichment for GLIS3, STAT6, EHF, and HNF1B motifs. Compared to primary cells, organoids showed higher GLIS3 expression (q-value = 9.24e-7) and we observed higher lentiMPRA activity in primary cells versus organoids for an Astro/Oligo and IN-MGE DA region containing a GLIS3 motif, suggesting that it may be functioning as a repressor in these primary-specific active sequences (Fig. 4LM). Thus, motif analysis helped us identify TFs whose differential expression between primary cells and organoids is associated with significant shifts in enhancer activity, suggesting repressor versus activator roles for these TFs and underscoring their importance in regulating neurodevelopment.

Next, we examined variants identified as DAVs only in organoids or primary cells. As most variants still showed highly comparable effect sizes in both, we focused on 61 variants showing an opposite direction of effect. We predicted TFBS losses and gains using motifbreakR and compared TF expression using bulk RNA-seq (Fig. 4N). Twenty-eight variants were predicted to alter TFBS and 50% of altered TFs showed differential expression between organoids and primary cells. For example, we found that variant rs112049982 increased enhancer activity in primary cells but decreased activity in organoids. rs112049982 was predicted to improve the binding affinity of OLIG2, a marker gene for oligodendrocytes (Oligo) and oligodendrocyte precursor cells (OPC), which showed significantly lower expression in organoids. This agrees with prior knowledge that Oligo and OPC are extremely rare populations in cerebral organoids (11). Thus, we speculate that the discordant effect of the alternative allele of rs112049982 in organoids versus primary cells could be due to OLIG2’s differential expression. Together, these results indicate that despite organoids being a suitable in vitro model, differences in the trans-regulating environment should be carefully examined when interpreting lentiMPRA results.

A sequence-based deep learning model of lentiMPRA activity reveals neurodevelopmental enhancer motif grammar

Our large dataset of lentiMPRA measurements provides an opportunity to characterize the mid-gestation enhancer code by modeling enhancer activity and then decoding the model’s understanding of how sequence variants modulate activity. To do this, we first designed a deep learning regression model that combines a single convolutional layer to learn motif-like sequence features, followed by two recurrent layers to learn the position, spacing, and orientation of motifs (Methods). Sequences were one-hot encoded into matrices (270bp × 4 nucleotides per sequence), and the mean RNA/DNA ratio across replicates was used as the regression target variable. For each library, we trained a model on sequences from all chromosomes except chromosome 3 (used as a validation set to prevent overfitting during training) and chromosome 4 (held out completely for an independent measure of predictive performance). Controls were included in model training. The variant library also included 15,000 sequences that represent a range of expected activity levels due to varying epigenetic similarity to validated brain enhancers in the VISTA database (74). On chromosome 4, the DA and variant models achieved 0.82 and 0.78 Pearson correlation, respectively (Fig. 5A; 0.81 and 0.7 Spearman correlation). Other similar sequence-to-activity models include MPRA-DragoNN (75) trained on human HepG2 and K562 Sharpr-MPRA data (0.28 Spearman correlation), and DeepSTARR (18) trained on fruit fly STARR-seq data (0.68 Pearson correlation). Though direct comparisons are not possible due to vast differences in assay type and dataset quality, our held-out predictive performance suggests our model is learning relevant sequence features for predicting MPRA activity.
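The input encoding and the chromosome-based data split described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the record layout (`chrom`, `seq`, `activity`), the column order A, C, G, T, and the all-zero encoding of ambiguous bases are all hypothetical choices.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a len(seq) x 4 matrix (columns: A, C, G, T).

    Characters outside ACGT (e.g. N) become all-zero rows; this is an
    assumption of the sketch, not a detail stated in the text.
    """
    index = {base: i for i, base in enumerate(BASES)}
    matrix = [[0, 0, 0, 0] for _ in seq]
    for pos, base in enumerate(seq.upper()):
        if base in index:
            matrix[pos][index[base]] = 1
    return matrix

def split_by_chromosome(records, val_chrom="chr3", test_chrom="chr4"):
    """Hold out one chromosome for validation and one for testing."""
    splits = {"train": [], "val": [], "test": []}
    for rec in records:
        if rec["chrom"] == val_chrom:
            splits["val"].append(rec)
        elif rec["chrom"] == test_chrom:
            splits["test"].append(rec)
        else:
            splits["train"].append(rec)
    return splits

# Hypothetical 270-bp oligos with made-up activity values.
records = [
    {"chrom": "chr1", "seq": "ACGT" * 67 + "AC", "activity": 1.2},
    {"chrom": "chr3", "seq": "TTGA" * 67 + "TT", "activity": 0.9},
    {"chrom": "chr4", "seq": "GGCA" * 67 + "GN", "activity": 1.0},
]
encoded = one_hot(records[0]["seq"])
splits = split_by_chromosome(records)
print(len(encoded), len(encoded[0]), {k: len(v) for k, v in splits.items()})
```

Splitting by whole chromosome rather than at random keeps overlapping or neighboring genomic sequences out of the held-out set, which is the usual motivation for this design.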

Fig. 5. Sequence determinants of lentiMPRA activity can be modeled with deep learning.

For each library, we trained a deep learning model to predict lentiMPRA activity in primary cells from sequence alone. (A) Sequences on chromosome 4 were held out from model training and used to evaluate model performance. Predicted and measured activity have high Pearson correlation for the DA library (left) and variant library (right). (B) The model learned motifs of neurodevelopmental TFs and used them for accurate predictions. Predictive importance of convolutional filters (change in sum of squared errors when fixing filter output to zero) is plotted against significance of matches to HOCOMOCO motifs (TOMTOM q-value < 0.1) for TFs expressed in mid-gestation telencephalon (mean CPM > 1). (C) Applying ISM to the variant library, we found that the activity of most enhancers can be tuned up and down through introduction of alternative alleles. The largest activity-increasing and activity-decreasing alleles for each sequence (purple) tend to have bigger effects than the lentiMPRA measured effects for QTLs (yellow). (D) We combined ISM with motifbreakR TFBS disruption scores to screen TFs for repressor versus activator function in neurodevelopment, using the most activity-increasing alternative allele for each sequence in the variant library. TFs where predicted activity is anti-correlated with motif score tend to repress expression (top) and those with a positive correlation tend to be known activators (bottom). This relationship can be used to decode if the model has learned an activator versus repressor role for TFs that function in both ways. (E) The reference T allele of eQTL rs2883420 (lentiMPRA RNA/DNA = 0.8) matches motifs of repressors SRY and SOX2, while the alternate C allele disrupts a high information content position in both motifs, resulting in a large activity increase (lentiMPRA RNA/DNA = 0.97, predicted RNA/DNA = 0.96). (F) ISM predicts that the other two possible alleles at rs2883420 also increase activity (middle, sequence logo indicates magnitude and direction: up=increasing, down=decreasing). Alternative alleles at adjacent nucleotides overlapping TF motifs (top, positive strand = black, negative strand = gray) have even larger predicted effects on activity. Region shown is chr10:86,851,230–86,851,500 (hg38).

Convolutional neural networks learn de novo filters from DNA sequences that represent position-specific nucleotide frequencies, similar to TFBS motifs. We therefore used the set of sequences that strongly activate each filter to construct a position weight matrix (PWM) and compared these against the HOCOMOCO (v11) database (76) to identify significant matches to known binding site motifs (Methods). As many filters have significant matches to motifs (q-value < 0.1) for TFs that are expressed in mid-gestation telencephalon (mean TPM > 1), we estimated each filter’s importance for predicting lentiMPRA activity by setting its output to zero and quantifying how much model performance decreases (deltaSSE; Methods). Top-ranked filters that match TFs with high mid-gestation telencephalon expression included TEAD1, NFATC1, STAT3, FOXJ3, POU2F1, and BCL11A (Fig. 5B). In addition to these TFs, our method also highlighted several USFs that function as cofactors to improve chromatin accessibility (47), consistent with our finding that motifs for these TFs are enriched in active versus inactive sequences (Fig. 1D). Thus, our model learned that both universal co-factors and TFs specific to the physiological context of our lentiMPRA experiments contribute to enhancer activity.
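The filter-ablation score (deltaSSE) can be illustrated on a toy model. In the sketch below, a linear readout over per-sequence filter activations stands in for the recurrent layers of the actual network; this substitution, along with all numbers and function names, is an assumption made for illustration. Ablation fixes one filter's output to zero and records how much the sum of squared errors increases.

```python
def sse(pred, target):
    """Sum of squared errors between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def predict(activations, weights, bias, ablate=None):
    """Toy linear readout over filter activations (one row per sequence).

    `ablate` is the index of a filter whose output is fixed to zero.
    """
    preds = []
    for row in activations:
        total = bias
        for j, (act, w) in enumerate(zip(row, weights)):
            if j != ablate:
                total += act * w
        preds.append(total)
    return preds

def delta_sse(activations, weights, bias, targets):
    """Increase in SSE when each filter is ablated in turn."""
    baseline = sse(predict(activations, weights, bias), targets)
    return [
        sse(predict(activations, weights, bias, ablate=j), targets) - baseline
        for j in range(len(weights))
    ]

# Hypothetical activations for 4 sequences x 2 filters.
activations = [[1.0, 0.2], [0.0, 0.9], [2.0, 0.1], [0.5, 0.5]]
weights, bias = [1.0, 0.1], 0.5
# Targets equal the full model's predictions, so the baseline SSE is zero.
targets = predict(activations, weights, bias)

scores = delta_sse(activations, weights, bias, targets)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(scores, ranking)
```

In this toy setting the first filter carries most of the signal, so ablating it produces the larger error increase and it ranks first.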

To complement this analysis, we performed a large-scale in silico mutagenesis (ISM) study. This method enabled us to quantify how individual nucleotide variants affect model predictions and did not directly rely upon a PWM database, though we did use PWM similarity to interpret high-scoring variants. Specifically, we constructed sequences with each possible alternate base at each of the 270 positions in each of the 17,069 variant-containing oligos for a total of 18.4 million alleles. We then predicted the activity of each alternate allele. Examining the distribution of the largest predicted activity change (up or down) per oligo, we found that while the QTLs tested in our lentiMPRA generally have moderate effects on activity, many of the adjacent synthetic variants tested with ISM have larger effects (Fig. 5C). For 11.6% of oligos, predicted activity can be increased by 50% or more through a single nucleotide change. Conversely, 19.7% of oligos can be reduced by 50% or more through a single change. As expected, activity-increasing variants frequently create binding sites for transcriptional activators (e.g., CEBPD) or mutate binding sites for repressors (e.g., FOXK1) that are expressed at mid-gestation, while activity-decreasing variants do the opposite (Fig. 5DE). All sequences contained both increasing and decreasing alleles, and in most cases the two variants with the largest absolute ISM scores have opposing effects on activity. At nucleotides with large absolute ISM scores, the three alternative alleles tend to all be increasing or decreasing as expected if the reference base is a high information content position in a TFBS (Fig. 5F).
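The enumeration behind ISM can be sketched in a few lines. The `toy_predict` function below is a hypothetical stand-in for the trained model (it simply counts occurrences of an arbitrary 4-mer), so the scores are illustrative only. Each 270-bp oligo has 270 × 3 = 810 single-nucleotide substitutions; the 18.4 million total quoted above is consistent with 270 × 4 × 17,069 = 18,434,520, that is, with counting all four bases at each position.

```python
BASES = "ACGT"

def ism_variants(seq):
    """Yield (position, alt_base, mutated_sequence) for every substitution."""
    for pos, ref in enumerate(seq):
        for alt in BASES:
            if alt != ref:
                yield pos, alt, seq[:pos] + alt + seq[pos + 1:]

def ism_scores(seq, predict):
    """Predicted activity change of each substitution relative to the reference."""
    ref_pred = predict(seq)
    return {(pos, alt): predict(mut) - ref_pred
            for pos, alt, mut in ism_variants(seq)}

def toy_predict(seq):
    """Hypothetical stand-in for the trained model: counts 'GATA' occurrences."""
    return float(sum(seq[i:i + 4] == "GATA" for i in range(len(seq) - 3)))

seq = "CCGATACC"
scores = ism_scores(seq, toy_predict)
largest_decrease = min(scores.values())
largest_increase = max(scores.values())
print(len(scores), largest_decrease, largest_increase)

# Allele counts for a 270-bp library of 17,069 oligos.
n_alt_alleles = 270 * 3 * 17069
n_all_bases = 270 * 4 * 17069
print(n_alt_alleles, n_all_bases)
```

With a real model, `predict` would typically be applied to batches of one-hot encoded sequences rather than one string at a time, since the number of mutated sequences grows quickly with library size.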

As an example, we highlight the region around eQTL rs2883420 (Fig. 5EF) that has strong matches for SRY-like motifs. ISM predicts that all three alternative alleles at rs2883420 increase activity (predicted RNA/DNA ~ 0.97). In lentiMPRA, the reference allele was inactive (RNA/DNA ~ 0.8), while the alternative allele made the sequence nearly active (RNA/DNA ~ 0.96), fitting with our prediction. Further examination of the sequence effects of this eQTL (Fig. 5F) found a strong disruption of motifs for repression-capable TFs, such as SOX2 (77) and SRY (78) (p-value < 1e-4). ISM also predicts increased activity for non-reference alleles in a TFBS-sized region surrounding rs2883420, and most of these have larger effects than the eQTL, consistent with our genome-wide observations (Fig. 5C). These findings indicate that our model is learning de novo PWM-like representations of TF motifs which together form a neurodevelopmental regulatory grammar. Such a model can be leveraged to perform ISM, revealing how variants not present in an MPRA library alter enhancer activity and transcription factor binding. This strategy could be extended to discover and design cell-type specific enhancers with precisely tuned activity levels.

Discussion

Gene regulatory elements have a major impact on human brain development and neurodevelopmental disease. Here, we combined lentiMPRA and deep learning to annotate thousands of regulatory elements in the developing cortex and cerebral organoids. This work provides a large catalog of functional human brain developmental enhancers and disease-associated variants, along with deep learning models that can accurately predict cell-type-specific regulatory regions and variant effects. In addition, it showcases the usability of cerebral organoids as an in vitro model for testing regulatory activity in mid-gestation, but also highlights several differences in the trans-regulating environment that should be taken into account when interpreting these results.

One caveat of our lentiMPRA is that it is a bulk assay with limited capability to detect cell-type specific signals. For example, we found that abundant cell types, such as neurons and radial glia, had higher percentages of active cell-type specific DA sequences compared to rarer cell populations. For microglia in particular, this could also be due to it being a difficult cell type to infect with lentivirus (79), leading to its lower active DA percentage (43.9%). Our validation of eleven regions for cell-type-specific activities in developing brain tissues identified three excitatory neuron-specific enhancers showing expected cell-type specificity, while the rest were active but non-specific. Conversely, we observed lentiMPRA activity for some microglia- and endothelial-specific DA sequences in organoids, despite these cell types being absent or very rare (11). We hypothesize this is due to sequences being activated by TFs present in other cell types that do not activate the endogenous sequence due to repressive chromatin. Beyond enhancer assays being permissive and testing sequences outside their endogenous context, other factors that may contribute to unexpected findings for sequences with cell-type specific chromatin accessibility include the limited resolution of scATAC-seq and differences in developmental stages. Nonetheless, nearly half of the 48,861 cell-type specific open chromatin regions that we tested were active enhancers in primary cells and/or organoids.

The cerebral organoids we generated from three hiPSC lines produced lentiMPRA measurements of enhancer activity that were highly correlated with measurements for the same sequences in primary cortical cells. While differential allelic activity was highly correlated for the variants with the largest effects, at least half of the DAVs identified in organoids or primary cells were not statistically significant in the other context, with some having opposite allelic effects. We mostly attribute this to differences in sample sizes and cell type proportions. However, we also found that discordant results between organoids and primary cells could shed light on differences in the cellular environment between these two contexts. We identified BCL6 and GLIS3 as TFs whose differential expression in primary cells versus organoids can explain why sequences containing their TFBS motifs show significantly differential activity in lentiMPRAs. By looking at whether motifs are positively or negatively correlated with activity, both this analysis and our deep learning-based ISM analysis showed how lentiMPRA data can be used to infer if TFs are acting as repressors versus activators. These computational inferences are needed, because many TFs have both repressive and activating functions (e.g., 80–82), and neurodevelopmental enhancers can play both activating and repressing roles depending on the bound TFs (46).

Another clear signal in our computational analyses was the importance of USFs for enhancer activity in early neurodevelopment. Stripe TFs are known to function as transcriptional co-factors improving chromatin accessibility (47). Motif enrichment in our lentiMPRA active sequences identified universal stripe factors alongside many known neurodevelopmental TFs. Corroborating this finding, analysis of our machine learning models identified many predictive features matching stripe motifs. These results suggest that stripe TF motifs may be useful for boosting the activity of designed enhancers, while they might reduce or overshadow signals from cell-type specific TFs. Investigations that study the role of universal stripe factors in MPRAs and in vivo will be an important direction for future studies.

We evaluated the regulatory effect of 17,069 brain QTLs and psychiatric disorder associated variants, identifying 164 differentially active variants. This small number is in line with other MPRAs that tested the effect of single-nucleotide variants (14, 51, 53, 83, 84) and observed relatively small effects of single nucleotide substitutions, especially common alleles, on regulatory activity. Our deep learning model supports this conclusion; it predicts that many nucleotide changes in the open chromatin regions we tested, including alleles never or rarely seen in people, would show greater differential activity than the brain QTLs did. Another contributing factor is that regulatory variants, unlike coding variants, impact different layers of transcriptional regulation. MPRAs detect variants affecting enhancer activity or perhaps TF binding (85) but not those that modulate chromatin states, genome folding, splicing, or other aspects of gene expression. Finally, since about half of the DAVs we detected are in cell-type specific open chromatin regions, we expect that performing lentiMPRA on mixed cell populations limits detection of allelic effects that vary across cellular contexts.

Despite detecting only 164 high-confidence DAVs, integrative analysis of our data with publicly available chromatin interaction data linked many of these DAVs to one or more target genes expressed in neurodevelopment. Predicted target genes of many DAVs are known risk genes or within susceptibility loci, such as TBR1 and MARK2 for ASD or NFKB2 and SUFU for SCZ. In particular, for large psychiatric disorder associated loci, our results for 6p21.1, 6p21.2 and 16p11.2 showcase the utility of lentiMPRA to identify potential disorder-associated regulatory variants in a high-throughput manner. In summary, we nominated several differentially active QTLs as potential causal variants of known disease genes/loci, paving the way for developing novel genetic diagnostic and therapeutic tools.

Overall, our work strengthens the utility of using primary cell culture, organoids, and MPRAs to investigate regulatory elements and variants involved in human brain development. In future work, it would be interesting to utilize an organoid lentiMPRA approach to test libraries from various psychiatric disorder-derived iPSCs to identify donor-specific trans effects on regulatory activity. iPSC derived organoids are also a promising avenue for investigating brain development in non-human primates, including comparative studies. Another technological development that could be used to expand upon this study is single-cell MPRA (86, 87). While currently limited to a small number of sequences, this approach could eventually overcome some limitations we faced testing cell-type specific DAs in a bulk assay. It will also be critical to leverage CRISPR screens to assess the endogenous activity of candidate enhancer sequences, including those validated for activity with MPRAs. While CRISPR-based tools will likely improve our understanding of regulatory element cell-type specificity, they have their own caveats such as the need for high effect sizes on target gene mRNA levels. Deciphering the regulatory code of human brain development will require integration of all these strategies. The datasets and models generated in this work are a step in that direction.

Supplementary Material

Supplement 1
media-1.pdf (2.9MB, pdf)

Acknowledgments

General: We would like to thank members of the Ahituv, Nowakowski, Pollen and Pollard labs for assistance with this manuscript. We would like to thank Kevin White and Sophia Gaynor for sharing control sets of MPRA. Data were generated as part of the PsychENCODE Consortium, supported by: U01DA048279, U01MH103339, U01MH103340, U01MH103346, U01MH103365, U01MH103392, U01MH116438, U01MH116441, U01MH116442, U01MH116488, U01MH116489, U01MH116492, U01MH122590, U01MH122591, U01MH122592, U01MH122849, U01MH122678, U01MH122681, U01MH116487, U01MH122509, R01MH094714, R01MH105472, R01MH105898, R01MH109677, R01MH109715, R01MH110905, R01MH110920, R01MH110921, R01MH110926, R01MH110927, R01MH110928, R01MH111721, R01MH117291, R01MH117292, R01MH117293, R21MH102791, R21MH103877, R21MH105853, R21MH105881, R21MH109956, R56MH114899, R56MH114901, R56MH114911, R01MH125516, R01MH126459, R01MH129301, R01MH126393, R01MH121521, R01MH116529, R01MH129817, R01MH117406, and P50MH106934 awarded to: Alexej Abyzov, Nadav Ahituv, Schahram Akbarian, Kristin Brennand, Andrew Chess, Gregory Cooper, Gregory Crawford, Stella Dracheva, Peggy Farnham, Michael Gandal, Mark Gerstein, Daniel Geschwind, Fernando Goes, Joachim F. Hallmayer, Vahram Haroutunian, Thomas M. Hyde, Andrew Jaffe, Peng Jin, Manolis Kellis, Joel Kleinman, James A. Knowles, Arnold Kriegstein, Chunyu Liu, Christopher E. Mason, Keri Martinowich, Eran Mukamel, Richard Myers, Charles Nemeroff, Mette Peters, Dalila Pinto, Katherine Pollard, Kerry Ressler, Panos Roussos, Stephan Sanders, Nenad Sestan, Pamela Sklar, Michael P. Snyder, Matthew State, Jason Stein, Patrick Sullivan, Alexander E. Urban, Flora M. Vaccarino, Stephen Warren, Daniel Weinberger, Sherman Weissman, Zhiping Weng, Kevin White, A. Jeremy Willsey, Hyejung Won, and Peter Zandi. Sequencing was partially carried out by the DNA Technologies and Expression Analysis Core at the UC Davis Genome Center, supported by NIH Shared Instrumentation Grant 1S10OD010786-01.

Funding:

This work was funded in part by the National Institute of Mental Health (NIMH) grant numbers U01MH116438 (NA and KSP), R01MH109907 (NA and KSP), R01MH123179 (KSP), UF1MH130700 (TN), R01NS123263 (TN), DP2MH122400 (AAP), New York Stem Cell Foundation (TN, AAP), and R01MH125246 (NA), R56MH114911 (FMV), the National Human Genome Research Institute grant number UM1HG011966 (NA), and Coordination for the Improvement of Higher Education Personnel (CAPES/Brazil) -Finance Code 001.

Footnotes

Competing interests:

NA is the cofounder and on the scientific advisory board of Regel Therapeutics and receives funding from BioMarin Pharmaceutical Incorporated.

Data and materials availability:

The source data described in this manuscript are available via the PsychENCODE Knowledge Portal (https://psychencode.synapse.org/). The PsychENCODE Knowledge Portal is a platform for accessing data, analyses, and tools generated through grants funded by the National Institute of Mental Health (NIMH) PsychENCODE Consortium. Data is available for general research use according to the following requirements for data access and data attribution: (https://psychencode.synapse.org/DataAccess). For access to content described in this manuscript see: DOI will be provided prior to publication.

References

  • 1. Merikangas K. R., He J.-P., Burstein M., Swanson S. A., Avenevoli S., Cui L., Benjet C., Georgiades K., Swendsen J., Lifetime prevalence of mental disorders in U.S. adolescents: results from the National Comorbidity Survey Replication--Adolescent Supplement (NCS-A). J. Am. Acad. Child Adolesc. Psychiatry. 49, 980–989 (2010).
  • 2. Sullivan P. F., Geschwind D. H., Defining the Genetic, Genomic, Cellular, and Diagnostic Architectures of Psychiatric Disorders. Cell. 177, 162–183 (2019).
  • 3. Bakken T. E., Miller J. A., Ding S.-L., Sunkin S. M., Smith K. A., Ng L., Szafer A., Dalley R. A., Royall J. J., Lemon T., Shapouri S., Aiona K., Arnold J., Bennett J. L., Bertagnolli D., Bickley K., Boe A., Brouner K., Butler S., Byrnes E., Caldejon S., Carey A., Cate S., Chapin M., Chen J., Dee N., Desta T., Dolbeare T. A., Dotson N., Ebbert A., Fulfs E., Gee G., Gilbert T. L., Goldy J., Gourley L., Gregor B., Gu G., Hall J., Haradon Z., Haynor D. R., Hejazinia N., Hoerder-Suabedissen A., Howard R., Jochim J., Kinnunen M., Kriedberg A., Kuan C. L., Lau C., Lee C.-K., Lee F., Luong L., Mastan N., May R., Melchor J., Mosqueda N., Mott E., Ngo K., Nyhus J., Oldre A., Olson E., Parente J., Parker P. D., Parry S., Pendergraft J., Potekhina L., Reding M., Riley Z. L., Roberts T., Rogers B., Roll K., Rosen D., Sandman D., Sarreal M., Shapovalova N., Shi S., Sjoquist N., Sodt A. J., Townsend R., Velasquez L., Wagley U., Wakeman W. B., White C., Bennett C., Wu J., Young R., Youngstrom B. L., Wohnoutka P., Gibbs R. A., Rogers J., Hohmann J. G., Hawrylycz M. J., Hevner R. F., Molnár Z., Phillips J. W., Dang C., Jones A. R., Amaral D. G., Bernard A., Lein E. S., A comprehensive transcriptional map of primate brain development. Nature. 535, 367–375 (2016).
  • 4. Iossifov I., O’Roak B. J., Sanders S. J., Ronemus M., Krumm N., Levy D., Stessman H. A., Witherspoon K. T., Vives L., Patterson K. E., Smith J. D., Paeper B., Nickerson D. A., Dea J., Dong S., Gonzalez L. E., Mandell J. D., Mane S. M., Murtha M. T., Sullivan C. A., Walker M. F., Waqar Z., Wei L., Willsey A. J., Yamrom B., Lee Y.-H., Grabowska E., Dalkic E., Wang Z., Marks S., Andrews P., Leotta A., Kendall J., Hakker I., Rosenbaum J., Ma B., Rodgers L., Troge J., Narzisi G., Yoon S., Schatz M. C., Ye K., McCombie W. R., Shendure J., Eichler E. E., State M. W., Wigler M., The contribution of de novo coding mutations to autism spectrum disorder. Nature. 515, 216–221 (2014).
  • 5. Satterstrom F. K., Kosmicki J. A., Wang J., Breen M. S., De Rubeis S., An J.-Y., Peng M., Collins R., Grove J., Klei L., Stevens C., Reichert J., Mulhern M. S., Artomov M., Gerges S., Sheppard B., Xu X., Bhaduri A., Norman U., Brand H., Schwartz G., Nguyen R., Guerrero E. E., Dias C., Autism Sequencing Consortium, iPSYCH-Broad Consortium, Betancur C., Cook E. H., Gallagher L., Gill M., Sutcliffe J. S., Thurm A., Zwick M. E., Børglum A. D., State M. W., Cicek A. E., Talkowski M. E., Cutler D. J., Devlin B., Sanders S. J., Roeder K., Daly M. J., Buxbaum J. D., Large-Scale Exome Sequencing Study Implicates Both Developmental and Functional Changes in the Neurobiology of Autism. Cell. 180, 568–584.e23 (2020).
  • 6. Clifton N. E., Hannon E., Harwood J. C., Di Florio A., Thomas K. L., Holmans P. A., Walters J. T. R., O’Donovan M. C., Owen M. J., Pocklington A. J., Hall J., Dynamic expression of genes associated with schizophrenia and bipolar disorder across development. Transl. Psychiatry. 9, 74 (2019).
  • 7. Cross-Disorder Group of the Psychiatric Genomics Consortium, Identification of risk loci with shared effects on five major psychiatric disorders: a genome-wide analysis. Lancet. 381, 1371–1379 (2013).
  • 8. Cross-Disorder Group of the Psychiatric Genomics Consortium, Genomic Relationships, Novel Loci, and Pleiotropic Mechanisms across Eight Psychiatric Disorders. Cell. 179, 1469–1482.e11 (2019).
  • 9. Gandal M. J., Zhang P., Hadjimichael E., Walker R. L., Chen C., Liu S., Won H., van Bakel H., Varghese M., Wang Y., Shieh A. W., Haney J., Parhami S., Belmont J., Kim M., Moran Losada P., Khan Z., Mleczko J., Xia Y., Dai R., Wang D., Yang Y. T., Xu M., Fish K., Hof P. R., Warrell J., Fitzgerald D., White K., Jaffe A. E., PsychENCODE Consortium, Peters M. A., Gerstein M., Liu C., Iakoucheva L. M., Pinto D., Geschwind D. H., Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder. Science. 362 (2018), doi: 10.1126/science.aat8127.
  • 10. Chatterjee S., Ahituv N., Gene Regulatory Elements, Major Drivers of Human Disease. Annu. Rev. Genomics Hum. Genet. 18, 45–63 (2017).
  • 11. Ziffra R. S., Kim C. N., Ross J. M., Wilfert A., Turner T. N., Haeussler M., Casella A. M., Przytycki P. F., Keough K. C., Shin D., Bogdanoff D., Kreimer A., Pollard K. S., Ament S. A., Eichler E. E., Ahituv N., Nowakowski T. J., Single-cell epigenomics reveals mechanisms of human cortical development. Nature. 598, 205–213 (2021).
  • 12.Trevino A. E., Müller F., Andersen J., Sundaram L., Kathiria A., Shcherbina A., Farh K., Chang H. Y., Pașca A. M., Kundaje A., Pașca S. P., Greenleaf W. J., Chromatin and gene-regulatory dynamics of the developing human cerebral cortex at single-cell resolution. Cell. 184, 5053–5069.e23 (2021). [DOI] [PubMed] [Google Scholar]
  • 13.Inoue F., Ahituv N., Decoding enhancers using massively parallel reporter assays. Genomics. 106, 159–164 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Tewhey R., Kotliar D., Park D. S., Liu B., Winnicki S., Reilly S. K., Andersen K. G., Mikkelsen T. S., Lander E. S., Schaffner S. F., Sabeti P. C., Direct Identification of Hundreds of Expression-Modulating Variants using a Multiplexed Reporter Assay. Cell. 165, 1519–1529 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Abell N. S., DeGorter M. K., Gloudemans M. J., Greenwald E., Smith K. S., He Z., Montgomery S. B., Multiple causal variants underlie genetic associations in humans. Science. 375, 1247–1254 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Whalen S., Inoue F., Ryu H., Fair T., Markenscoff-Papadimitriou E., Keough K., Kircher M., Martin B., Alvarado B., Elor O., Laboy Cintron D., Williams A., Hassan Samee M. A., Thomas S., Krencik R., Ullian E. M., Kriegstein A., Rubenstein J. L., Shendure J., Pollen A. A., Ahituv N., Pollard K. S., Machine learning dissection of human accelerated regions in primate neurodevelopment. Neuron (2023), doi: 10.1016/j.neuron.2022.12.026. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Taskiran I. I., Spanier K. I., Christiaens V., Mauduit D., Aerts S., Cell type directed design of synthetic enhancers. bioRxiv (2022), p. 2022.07.26.501466. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.de Almeida B. P., Reiter F., Pagani M., Stark A., DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nat. Genet. 54, 613–624 (2022). [DOI] [PubMed] [Google Scholar]
  • 19.Shlyueva D., Stampfel G., Stark A., Transcriptional enhancers: from properties to genome-wide predictions. Nat. Rev. Genet. 15, 272–286 (2014). [DOI] [PubMed] [Google Scholar]
  • 20.Lu F., Sossin A., Abell N., Montgomery S. B., He Z., Deep learning-assisted genome-wide characterization of massively parallel reporter assays. Nucleic Acids Res. 50, 11442–11454 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Vaishnav E. D., de Boer C. G., Molinet J., Yassour M., Fan L., Adiconis X., Thompson D. A., Levin J. Z., Cubillos F. A., Regev A., The evolution, evolvability and engineering of gene regulatory DNA. Nature. 603, 455–463 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Zhou J., Theesfeld C. L., Yao K., Chen K. M., Wong A. K., Troyanskaya O. G., Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat. Genet. 50, 1171–1179 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Avsec Ž., Agarwal V., Visentin D., Ledsam J. R., Grabska-Barwinska A., Taylor K. R., Assael Y., Jumper J., Kohli P., Kelley D. R., Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods. 18, 1196–1203 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Fudenberg G., Kelley D. R., Pollard K. S., Predicting 3D genome folding from DNA sequence with Akita. Nat. Methods. 17, 1111–1117 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Zhou J., Sequence-based modeling of three-dimensional genome architecture from kilobase to chromosome scale. Nat. Genet. 54, 725–734 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Kelley D. R., Snoek J., Rinn J. L., Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 26, 990–999 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Yin Q., Wu M., Liu Q., Lv H., Jiang R., DeepHistone: a deep learning approach to predicting histone modifications. BMC Genomics. 20, 193 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Zhou J., Troyanskaya O. G., Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods. 12, 931–934 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Chen K. M., Wong A. K., Troyanskaya O. G., Zhou J., A sequence-based global map of regulatory activity for deciphering human genetics. Nat. Genet. 54, 940–949 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Chen C., Hou J., Shi X., Yang H., Birchler J. A., Cheng J., DeepGRN: prediction of transcription factor binding site across cell-types using attention-based deep neural networks. BMC Bioinformatics. 22, 38 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Markenscoff-Papadimitriou E., Whalen S., Przytycki P., Thomas R., Binyameen F., Nowakowski T. J., Kriegstein A. R., Sanders S. J., State M. W., Pollard K. S., Rubenstein J. L., A Chromatin Accessibility Atlas of the Developing Human Telencephalon. Cell. 182, 754–769.e18 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Song M., Pebworth M.-P., Yang X., Abnousi A., Fan C., Wen J., Rosen J. D., Choudhary M. N. K., Cui X., Jones I. R., Bergenholtz S., Eze U. C., Juric I., Li B., Maliskova L., Lee J., Liu W., Pollen A. A., Li Y., Wang T., Hu M., Kriegstein A. R., Shen Y., Cell-type-specific 3D epigenomes in the developing human cortex. Nature. 587, 644–649 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Song M., Yang X., Ren X., Maliskova L., Li B., Jones I. R., Wang C., Jacob F., Wu K., Traglia M., Tam T. W., Jamieson K., Lu S.-Y., Ming G.-L., Li Y., Yao J., Weiss L. A., Dixon J. R., Judge L. M., Conklin B. R., Song H., Gan L., Shen Y., Mapping cis-regulatory chromatin contacts in neural cells links neuropsychiatric disorder risk variants to target genes. Nat. Genet. 51, 1252–1262 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Wang D., Liu S., Warrell J., Won H., Shi X., Navarro F. C. P., Clarke D., Gu M., Emani P., Yang Y. T., Xu M., Gandal M. J., Lou S., Zhang J., Park J. J., Yan C., Rhie S. K., Manakongtreecheep K., Zhou H., Nathan A., Peters M., Mattei E., Fitzgerald D., Brunetti T., Moore J., Jiang Y., Girdhar K., Hoffman G. E., Kalayci S., Gümüş Z. H., Crawford G. E., PsychENCODE Consortium, Roussos P., Akbarian S., Jaffe A. E., White K. P., Weng Z., Sestan N., Geschwind D. H., Knowles J. A., Gerstein M. B., Comprehensive functional genomic resource and integrative model for the human brain. Science. 362 (2018), doi: 10.1126/science.aat8464. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Liang D., Elwell A. L., Aygün N., Krupa O., Wolter J. M., Kyere F. A., Lafferty M. J., Cheek K. E., Courtney K. P., Yusupova M., Garrett M. E., Ashley-Koch A., Crawford G. E., Love M. I., de la Torre-Ubieta L., Geschwind D. H., Stein J. L., Cell-type-specific effects of genetic variation on chromatin accessibility during human neuronal differentiation. Nat. Neurosci. 24, 941–953 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Werling D. M., Pochareddy S., Choi J., An J.-Y., Sheppard B., Peng M., Li Z., Dastmalchi C., Santpere G., Sousa A. M. M., Tebbenkamp A. T. N., Kaur N., Gulden F. O., Breen M. S., Liang L., Gilson M. C., Zhao X., Dong S., Klei L., Cicek A. E., Buxbaum J. D., Adle-Biassette H., Thomas J.-L., Aldinger K. A., O’Day D. R., Glass I. A., Zaitlen N. A., Talkowski M. E., Roeder K., State M. W., Devlin B., Sanders S. J., Sestan N., Whole-Genome and RNA Sequencing Reveal Variation and Transcriptomic Coordination in the Developing Human Prefrontal Cortex. Cell Rep. 31, 107489 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Demontis D., Walters R. K., Martin J., Mattheisen M., Als T. D., Agerbo E., Baldursson G., Belliveau R., Bybjerg-Grauholm J., Bækvad-Hansen M., Cerrato F., Chambert K., Churchhouse C., Dumont A., Eriksson N., Gandal M., Goldstein J. I., Grasby K. L., Grove J., Gudmundsson O. O., Hansen C. S., Hauberg M. E., Hollegaard M. V., Howrigan D. P., Huang H., Maller J. B., Martin A. R., Martin N. G., Moran J., Pallesen J., Palmer D. S., Pedersen C. B., Pedersen M. G., Poterba T., Poulsen J. B., Ripke S., Robinson E. B., Satterstrom F. K., Stefansson H., Stevens C., Turley P., Walters G. B., Won H., Wright M. J., ADHD Working Group of the Psychiatric Genomics Consortium (PGC), Early Lifecourse & Genetic Epidemiology (EAGLE) Consortium, 23andMe Research Team, Andreassen O. A., Asherson P., Burton C. L., Boomsma D. I., Cormand B., Dalsgaard S., Franke B., Gelernter J., Geschwind D., Hakonarson H., Haavik J., Kranzler H. R., Kuntsi J., Langley K., Lesch K.-P., Middeldorp C., Reif A., Rohde L. A., Roussos P., Schachar R., Sklar P., Sonuga-Barke E. J. S., Sullivan P. F., Thapar A., Tung J. Y., Waldman I. D., Medland S. E., Stefansson K., Nordentoft M., Hougaard D. M., Werge T., Mors O., Mortensen P. B., Daly M. J., Faraone S. V., Børglum A. D., Neale B. M., Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nat. Genet. 51, 63–75 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Jansen I. E., Savage J. E., Watanabe K., Bryois J., Williams D. M., Steinberg S., Sealock J., Karlsson I. K., Hägg S., Athanasiu L., Voyle N., Proitsi P., Witoelar A., Stringer S., Aarsland D., Almdahl I. S., Andersen F., Bergh S., Bettella F., Bjornsson S., Brækhus A., Bråthen G., de Leeuw C., Desikan R. S., Djurovic S., Dumitrescu L., Fladby T., Hohman T. J., Jonsson P. V., Kiddle S. J., Rongve A., Saltvedt I., Sando S. B., Selbæk G., Shoai M., Skene N. G., Snaedal J., Stordal E., Ulstein I. D., Wang Y., White L. R., Hardy J., Hjerling-Leffler J., Sullivan P. F., van der Flier W. M., Dobson R., Davis L. K., Stefansson H., Stefansson K., Pedersen N. L., Ripke S., Andreassen O. A., Posthuma D., Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat. Genet. 51, 404–413 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Autism Spectrum Disorders Working Group of The Psychiatric Genomics Consortium, Meta-analysis of GWAS of over 16,000 individuals with autism spectrum disorder highlights a novel locus at 10q24.32 and a significant overlap with schizophrenia. Mol. Autism. 8, 21 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Mullins N., Forstner A. J., O’Connell K. S., Coombes B., Coleman J. R. I., Qiao Z., Als T. D., Bigdeli T. B., Børte S., Bryois J., Charney A. W., Drange O. K., Gandal M. J., Hagenaars S. P., Ikeda M., Kamitaki N., Kim M., Krebs K., Panagiotaropoulou G., Schilder B. M., Sloofman L. G., Steinberg S., Trubetskoy V., Winsvold B. S., Won H.-H., Abramova L., Adorjan K., Agerbo E., Al Eissa M., Albani D., Alliey-Rodriguez N., Anjorin A., Antilla V., Antoniou A., Awasthi S., Baek J. H., Bækvad-Hansen M., Bass N., Bauer M., Beins E. C., Bergen S. E., Birner A., Bøcker Pedersen C., Bøen E., Boks M. P., Bosch R., Brum M., Brumpton B. M., Brunkhorst-Kanaan N., Budde M., Bybjerg-Grauholm J., Byerley W., Cairns M., Casas M., Cervantes P., Clarke T.-K., Cruceanu C., Cuellar-Barboza A., Cunningham J., Curtis D., Czerski P. M., Dale A. M., Dalkner N., David F. S., Degenhardt F., Djurovic S., Dobbyn A. L., Douzenis A., Elvsåshagen T., Escott-Price V., Ferrier I. N., Fiorentino A., Foroud T. M., Forty L., Frank J., Frei O., Freimer N. B., Frisén L., Gade K., Garnham J., Gelernter J., Giørtz Pedersen M., Gizer I. R., Gordon S. D., Gordon-Smith K., Greenwood T. A., Grove J., Guzman-Parra J., Ha K., Haraldsson M., Hautzinger M., Heilbronner U., Hellgren D., Herms S., Hoffmann P., Holmans P. A., Huckins L., Jamain S., Johnson J. S., Kalman J. L., Kamatani Y., Kennedy J. L., Kittel-Schneider S., Knowles J. A., Kogevinas M., Koromina M., Kranz T. M., Kranzler H. R., Kubo M., Kupka R., Kushner S. A., Lavebratt C., Lawrence J., Leber M., Lee H.-J., Lee P. H., Levy S. E., Lewis C., Liao C., Lucae S., Lundberg M., MacIntyre D. J., Magnusson S. H., Maier W., Maihofer A., Malaspina D., Maratou E., Martinsson L., Mattheisen M., McCarroll S. A., McGregor N. W., McGuffin P., McKay J. D., Medeiros H., Medland S. E., Millischer V., Montgomery G. W., Moran J. L., Morris D. W., Mühleisen T. W., O’Brien N., O’Donovan C., Olde Loohuis L. M., Oruc L., Papiol S., Pardiñas A. F., Perry A., Pfennig A., Porichi E., Potash J. B., Quested D., Raj T., Rapaport M. H., DePaulo J. R., Regeer E. J., Rice J. P., Rivas F., Rivera M., Roth J., Roussos P., Ruderfer D. M., Sánchez-Mora C., Schulte E. C., Senner F., Sharp S., Shilling P. D., Sigurdsson E., Sirignano L., Slaney C., Smeland O. B., Smith D. J., Sobell J. L., Søholm Hansen C., Soler Artigas M., Spijker A. T., Stein D. J., Strauss J. S., Świątkowska B., Terao C., Thorgeirsson T. E., Toma C., Tooney P., Tsermpini E.-E., Vawter M. P., Vedder H., Walters J. T. R., Witt S. H., Xi S., Xu W., Yang J. M. K., Young A. H., Young H., Zandi P. P., Zhou H., Zillich L., HUNT All-In Psychiatry, Adolfsson R., Agartz I., Alda M., Alfredsson L., Babadjanova G., Backlund L., Baune B. T., Bellivier F., Bengesser S., Berrettini W. H., Blackwood D. H. R., Boehnke M., Børglum A. D., Breen G., Carr V. J., Catts S., Corvin A., Craddock N., Dannlowski U., Dikeos D., Esko T., Etain B., Ferentinos P., Frye M., Fullerton J. M., Gawlik M., Gershon E. S., Goes F. S., Green M. J., Grigoroiu-Serbanescu M., Hauser J., Henskens F., Hillert J., Hong K. S., Hougaard D. M., Hultman C. M., Hveem K., Iwata N., Jablensky A. V., Jones I., Jones L. A., Kahn R. S., Kelsoe J. R., Kirov G., Landén M., Leboyer M., Lewis C. M., Li Q. S., Lissowska J., Lochner C., Loughland C., Martin N. G., Mathews C. A., Mayoral F., McElroy S. L., McIntosh A. M., McMahon F. J., Melle I., Michie P., Milani L., Mitchell P. B., Morken G., Mors O., Mortensen P. B., Mowry B., Müller-Myhsok B., Myers R. M., Neale B. M., Nievergelt C. M., Nordentoft M., Nöthen M. M., O’Donovan M. C., Oedegaard K. J., Olsson T., Owen M. J., Paciga S. A., Pantelis C., Pato C., Pato M. T., Patrinos G. P., Perlis R. H., Posthuma D., Ramos-Quiroga J. A., Reif A., Reininghaus E. Z., Ribasés M., Rietschel M., Ripke S., Rouleau G. A., Saito T., Schall U., Schalling M., Schofield P. R., Schulze T. G., Scott L. J., Scott R. J., Serretti A., Shannon Weickert C., Smoller J. W., Stefansson H., Stefansson K., Stordal E., Streit F., Sullivan P. F., Turecki G., Vaaler A. E., Vieta E., Vincent J. B., Waldman I. D., Weickert T. W., Werge T., Wray N. R., Zwart J.-A., Biernacka J. M., Nurnberger J. I., Cichon S., Edenberg H. J., Stahl E. A., McQuillin A., Di Florio A., Ophoff R. A., Andreassen O. A., Genome-wide association study of more than 40,000 bipolar disorder cases provides new insights into the underlying biology. Nat. Genet. 53, 817–829 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium, Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat. Genet. 50, 668–681 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.International Obsessive Compulsive Disorder Foundation Genetics Collaborative and OCD Collaborative Genetics Association Studies, Revealing the complex genetic architecture of obsessive-compulsive disorder using meta-analysis. Mol. Psychiatry. 23, 1181–1188 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Schizophrenia Working Group of the Psychiatric Genomics Consortium, Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature. 604, 502–508 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Tourette Association of America International Consortium for Genetics, the Gilles de la Tourette GWAS Replication Initiative, the Tourette International Collaborative Genetics Study, and the Psychiatric Genomics Consortium Tourette Syndrome Working Group, Interrogating the Genetic Determinants of Tourette’s Syndrome and Other Tic Disorders Through Genome-Wide Association Studies. Am. J. Psychiatry. 176, 217–227 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Mostafavi H., Spence J. P., Naqvi S., Pritchard J. K., Limited overlap of eQTLs and GWAS hits due to systematic differences in discovery. bioRxiv (2022), p. 2022.05.07.491045. [Google Scholar]
  • 46.Amiri A., Coppola G., Scuderi S., Wu F., Roychowdhury T., Liu F., Pochareddy S., Shin Y., Safi A., Song L., Zhu Y., Sousa A. M. M., PsychENCODE Consortium, Gerstein M., Crawford G. E., Sestan N., Abyzov A., Vaccarino F. M., Transcriptome and epigenome landscape of human cortical development modeled in organoids. Science. 362 (2018), doi: 10.1126/science.aat6720. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Zhao Y., Vartak S. V., Conte A., Wang X., Garcia D. A., Stevens E., Kyoung Jung S., Kieffer-Kwon K.-R., Vian L., Stodola T., Moris F., Chopp L., Preite S., Schwartzberg P. L., Kulinski J. M., Olivera A., Harly C., Bhandoola A., Heuston E. F., Bodine D. M., Urrutia R., Upadhyaya A., Weirauch M. T., Hager G., Casellas R., “Stripe” transcription factors provide accessibility to co-binding partners in mammalian genomes. Mol. Cell (2022), doi: 10.1016/j.molcel.2022.06.029. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Blow M. J., McCulley D. J., Li Z., Zhang T., Akiyama J. A., Holt A., Plajzer-Frick I., Shoukry M., Wright C., Chen F., Afzal V., Bristow J., Ren B., Black B. L., Rubin E. M., Visel A., Pennacchio L. A., ChIP-Seq identification of weakly conserved heart enhancers. Nat. Genet. 42, 806–810 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Bandler R. C., Mayer C., Fishell G., Cortical interneuron specification: the juncture of genes, time and geometry. Curr. Opin. Neurobiol. 42, 17–24 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Weiss C. V., Harshman L., Inoue F., Fraser H. B., Petrov D. A., Ahituv N., Gokhman D., The cis-regulatory effects of modern human-specific variants. Elife. 10 (2021), doi: 10.7554/eLife.63713. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Cooper Y. A., Teyssier N., Dräger N. M., Guo Q., Davis J. E., Sattler S. M., Yang Z., Patel A., Wu S., Kosuri S., Coppola G., Kampmann M., Geschwind D. H., Functional regulatory variants implicate distinct transcriptional networks in dementia. Science. 377, eabi8654 (2022). [DOI] [PubMed] [Google Scholar]
  • 52.GTEx Consortium, Laboratory, Data Analysis & Coordinating Center (LDACC)—Analysis Working Group, Statistical Methods groups—Analysis Working Group, Enhancing GTEx (eGTEx) groups, NIH Common Fund, NIH/NCI, NIH/NHGRI, NIH/NIMH, NIH/NIDA, Biospecimen Collection Source Site—NDRI, Biospecimen Collection Source Site—RPCI, Biospecimen Core Resource—VARI, Brain Bank Repository—University of Miami Brain Endowment Bank, Leidos Biomedical—Project Management, ELSI Study, Genome Browser Data Integration & Visualization—EBI, Genome Browser Data Integration & Visualization—UCSC Genomics Institute, University of California Santa Cruz, Lead analysts:, Laboratory, Data Analysis & Coordinating Center (LDACC):, NIH program management:, Biospecimen collection:, Pathology:, eQTL manuscript working group:, Battle A., Brown C. D., Engelhardt B. E., Montgomery S. B., Genetic effects on gene expression across human tissues. Nature. 550, 204–213 (2017). [Google Scholar]
  • 53.Kircher M., Xiong C., Martin B., Schubach M., Inoue F., Bell R. J. A., Costello J. F., Shendure J., Ahituv N., Saturation mutagenesis of twenty disease-associated regulatory elements at single base-pair resolution. Nat. Commun. 10, 3583 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Coetzee S. G., Coetzee G. A., Hazelett D. J., motifbreakR: an R/Bioconductor package for predicting variant effects at transcription factor binding sites. Bioinformatics. 31, 3847–3849 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Szklarczyk D., Gable A. L., Lyon D., Junge A., Wyder S., Huerta-Cepas J., Simonovic M., Doncheva N. T., Morris J. H., Bork P., Jensen L. J., von Mering C., STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, D607–D613 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Nott A., Holtman I. R., Coufal N. G., Schlachetzki J. C. M., Yu M., Hu R., Han C. Z., Pena M., Xiao J., Wu Y., Keulen Z., Pasillas M. P., O’Connor C., Nickl C. K., Schafer S. T., Shen Z., Rissman R. A., Brewer J. B., Gosselin D., Gonda D. D., Levy M. L., Rosenfeld M. G., McVicker G., Gage F. H., Ren B., Glass C. K., Brain cell type-specific enhancer-promoter interactome maps and disease-risk association. Science. 366, 1134–1139 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.McDermott J. H., DDD Study, Clayton-Smith J., Briggs T. A., The TBR1-related autistic-spectrum-disorder phenotype and its clinical spectrum. Eur. J. Med. Genet. 61, 253–256 (2018). [DOI] [PubMed] [Google Scholar]
  • 58.Zhou X., Feliciano P., Shu C., Wang T., Astrovskaya I., Hall J. B., Obiajulu J. U., Wright J. R., Murali S. C., Xu S. X., Brueggeman L., Thomas T. R., Marchenko O., Fleisch C., Barns S. D., Snyder L. G., Han B., Chang T. S., Turner T. N., Harvey W. T., Nishida A., O’Roak B. J., Geschwind D. H., SPARK Consortium, Michaelson J. J., Volfovsky N., Eichler E. E., Shen Y., Chung W. K., Integrating de novo and inherited variants in 42,607 autism cases identifies mutations in new moderate-risk genes. Nat. Genet. 54, 1305–1319 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Volk D. W., Moroco A. E., Roman K. M., Edelson J. R., Lewis D. A., The Role of the Nuclear Factor-κB Transcriptional Complex in Cortical Immune Activation in Schizophrenia. Biol. Psychiatry. 85, 25–34 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Wang J., Liu J., Li S., Li X., Yang J., Dang X., Mu C., Li Y., Li K., Li J., Chen R., Liu Y., Huang D., Zhang Z., Luo X.-J., Genetic regulatory and biological implications of the 10q24.32 schizophrenia risk locus. Brain (2022), doi: 10.1093/brain/awac352. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Baudry M., Yao Y., Simmons D., Liu J., Bi X., Postnatal development of inflammation in a murine model of Niemann-Pick type C disease: immunohistochemical observations of microglia and astroglia. Exp. Neurol. 184, 887–903 (2003). [DOI] [PubMed] [Google Scholar]
  • 62.Zhang Y., Lu T., Yan H., Ruan Y., Wang L., Zhang D., Yue W., Lu L., Replication of association between schizophrenia and chromosome 6p21–6p22.1 polymorphisms in Chinese Han population. PLoS One. 8, e56732 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Chai G., Webb A., Li C., Antaki D., Lee S., Breuss M. W., Lang N., Stanley V., Anzenberg P., Yang X., Marshall T., Gaffney P., Wierenga K. J., Chung B. H.-Y., Tsang M. H.-Y., Pais L. S., Lovgren A. K., VanNoy G. E., Rehm H. L., Mirzaa G., Leon E., Diaz J., Neumann A., Kalverda A. P., Manfield I. W., Parry D. A., Logan C. V., Johnson C. A., Bonthron D. T., Valleley E. M. A., Issa M. Y., Abdel-Ghafar S. F., Abdel-Hamid M. S., Jennings P., Zaki M. S., Sheridan E., Gleeson J. G., Mutations in Spliceosomal Genes PPIL1 and PRP17 Cause Neurodegenerative Pontocerebellar Hypoplasia with Microcephaly. Neuron. 109, 241–256.e9 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Shi J., Levinson D. F., Duan J., Sanders A. R., Zheng Y., Pe’er I., Dudbridge F., Holmans P. A., Whittemore A. S., Mowry B. J., Olincy A., Amin F., Cloninger C. R., Silverman J. M., Buccola N. G., Byerley W. F., Black D. W., Crowe R. R., Oksenberg J. R., Mirel D. B., Kendler K. S., Freedman R., Gejman P. V., Common variants on chromosome 6p22.1 are associated with schizophrenia. Nature. 460, 753–757 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Rampino A., Annese T., Torretta S., Tamma R., Falcone R. M., Ribatti D., Involvement of vascular endothelial growth factor in schizophrenia. Neurosci. Lett. 760, 136093 (2021). [DOI] [PubMed] [Google Scholar]
  • 66.Sanders S. J., He X., Willsey A. J., Ercan-Sencicek A. G., Samocha K. E., Cicek A. E., Murtha M. T., Bal V. H., Bishop S. L., Dong S., Goldberg A. P., Jinlu C., Keaney J. F. 3rd, Klei L., Mandell J. D., Moreno-De-Luca D., Poultney C. S., Robinson E. B., Smith L., Solli-Nowlan T., Su M. Y., Teran N. A., Walker M. F., Werling D. M., Beaudet A. L., Cantor R. M., Fombonne E., Geschwind D. H., Grice D. E., Lord C., Lowe J. K., Mane S. M., Martin D. M., Morrow E. M., Talkowski M. E., Sutcliffe J. S., Walsh C. A., Yu T. W., Autism Sequencing Consortium, Ledbetter D. H., Martin C. L., Cook E. H., Buxbaum J. D., Daly M. J., Devlin B., Roeder K., State M. W., Insights into Autism Spectrum Disorder Genomic Architecture and Biology from 71 Risk Loci. Neuron. 87, 1215–1233 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Pollen A. A., Bhaduri A., Andrews M. G., Nowakowski T. J., Meyerson O. S., Mostajo-Radji M. A., Di Lullo E., Alvarado B., Bedolli M., Dougherty M. L., Fiddes I. T., Kronenberg Z. N., Shuga J., Leyrat A. A., West J. A., Bershteyn M., Lowe C. B., Pavlovic B. J., Salama S. R., Haussler D., Eichler E. E., Kriegstein A. R., Establishing Cerebral Organoids as Models of Human-Specific Brain Evolution. Cell. 176, 743–756.e17 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Di Lullo E., Kriegstein A. R., The use of brain organoids to investigate neural development and disease. Nat. Rev. Neurosci. 18, 573–584 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Fiorenzano A., Sozzi E., Birtele M., Kajtez J., Giacomoni J., Nilsson F., Bruzelius A., Sharma Y., Zhang Y., Mattsson B., Emnéus J., Ottosson D. R., Storm P., Parmar M., Single-cell transcriptomics captures features of human midbrain development and dopamine neuron diversity in brain organoids. Nat. Commun. 12, 7302 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Qian X., Su Y., Adam C. D., Deutschmann A. U., Pather S. R., Goldberg E. M., Su K., Li S., Lu L., Jacob F., Nguyen P. T. T., Huh S., Hoke A., Swinford-Jackson S. E., Wen Z., Gu X., Pierce R. C., Wu H., Briand L. A., Chen H. I., Wolf J. A., Song H., Ming G.-L., Sliced Human Cortical Organoids for Modeling Distinct Cortical Layer Formation. Cell Stem Cell. 26, 766–781.e9 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Kurosu T., Fukuda T., Miki T., Miura O., BCL6 overexpression prevents increase in reactive oxygen species and inhibits apoptosis induced by chemotherapeutic reagents in B-cell lymphoma cells. Oncogene. 22, 4459–4468 (2003). [DOI] [PubMed] [Google Scholar]
  • 72.Bhaduri A., Andrews M. G., Mancia Leon W., Jung D., Shin D., Allen D., Jung D., Schmunk G., Haeussler M., Salma J., Pollen A. A., Nowakowski T. J., Kriegstein A. R., Cell stress in cortical organoids impairs molecular subtype specification. Nature. 578, 142–148 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Szklarczyk D., Gable A. L., Nastou K. C., Lyon D., Kirsch R., Pyysalo S., Doncheva N. T., Legeay M., Fang T., Bork P., Jensen L. J., von Mering C., The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 49, D605–D612 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Visel A., Minovitsky S., Dubchak I., Pennacchio L. A., VISTA Enhancer Browser--a database of tissue-specific human enhancers. Nucleic Acids Res. 35, D88–92 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Movva R., Greenside P., Marinov G. K., Nair S., Shrikumar A., Kundaje A., Deciphering regulatory DNA sequences and noncoding genetic variants using neural network models of massively parallel reporter assays. PLoS One. 14, e0218073 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Kulakovskiy I. V., Vorontsov I. E., Yevshin I. S., Sharipov R. N., Fedorova A. D., Rumynskiy E. I., Medvedeva Y. A., Magana-Mora A., Bajic V. B., Papatsenko D. A., Kolpakov F. A., Makeev V. J., HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis. Nucleic Acids Res. 46, D252–D259 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Liu Y.-R., Laghari Z. A., Novoa C. A., Hughes J., Webster J. R. M., Goodwin P. E., Wheatley S. P., Scotting P. J., Sox2 acts as a transcriptional repressor in neural stem cells. BMC Neurosci. 15, 95 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Desclozeaux M., Poulat F., de Santa Barbara P., Capony J. P., Turowski P., Jay P., Méjean C., Moniot B., Boizet B., Berta P., Phosphorylation of an N-terminal motif enhances DNA-binding activity of the human SRY protein. J. Biol. Chem. 273, 7988–7995 (1998). [DOI] [PubMed] [Google Scholar]
  • 79.Maes M. E., Colombo G., Schulz R., Siegert S., Targeting microglia with lentivirus and AAV: Recent advances and remaining challenges. Neurosci. Lett. 707, 134310 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Bienz M., TCF: transcriptional activator or repressor? Curr. Opin. Cell Biol. 10, 366–372 (1998). [DOI] [PubMed] [Google Scholar]
  • 81.Westendorf J. J., Transcriptional co-repressors of Runx2. J. Cell. Biochem. 98, 54–64 (2006). [DOI] [PubMed] [Google Scholar]
  • 82.Kim S., Yu N.-K., Kaang B.-K., CTCF as a multifunctional protein in genome regulation and gene expression. Exp. Mol. Med. 47, e166 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Choi J., Zhang T., Vu A., Ablain J., Makowski M. M., Colli L. M., Xu M., Hennessey R. C., Yin J., Rothschild H., Gräwe C., Kovacs M. A., Funderburk K. M., Brossard M., Taylor J., Pasaniuc B., Chari R., Chanock S. J., Hoggart C. J., Demenais F., Barrett J. H., Law M. H., Iles M. M., Yu K., Vermeulen M., Zon L. I., Brown K. M., Massively parallel reporter assays of melanoma risk variants identify MX2 as a gene promoting melanoma. Nat. Commun. 11, 2718 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84. McAfee J. C., Lee S., Lee J., Bell J. L., Krupa O., Davis J., Insigne K., Bond M. L., Phanstiel D. H., Love M. I., Stein J. L., Kosuri S., Won H., Systematic investigation of allelic regulatory activity of schizophrenia-associated common variants. bioRxiv (2022), doi: 10.1101/2022.09.15.22279954.
  • 85. Hamilton W. B., Mosesson Y., Monteiro R. S., Emdal K. B., Knudsen T. E., Francavilla C., Barkai N., Olsen J. V., Brickman J. M., Dynamic lineage priming is driven via direct enhancer regulation by ERK. Nature. 575, 355–360 (2019).
  • 86. Zhao S., Hong C. K. Y., Myers C. A., Granas D. M., White M. A., Corbo J. C., Cohen B. A., A single-cell massively parallel reporter assay detects cell-type-specific gene regulation. Nat. Genet., 1–9 (2023).
  • 87. Lalanne J.-B., Regalado S. G., Domcke S., Calderon D., Martin B., Li T., Suiter C. C., Lee C., Trapnell C., Shendure J., Multiplex profiling of developmental enhancers with quantitative, single-cell expression reporters. bioRxiv (2022), p. 2022.12.10.519236.
  • 88. Abugessaisa I., Noguchi S., Hasegawa A., Kondo A., Kawaji H., Carninci P., Kasukawa T., refTSS: A Reference Data Set for Human and Mouse Transcription Start Sites. J. Mol. Biol. 431, 2407–2422 (2019).
  • 89. Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M. A. R., Bender D., Maller J., Sklar P., de Bakker P. I. W., Daly M. J., Sham P. C., PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
  • 90. Sherry S. T., Ward M. H., Kholodov M., Baker J., Phan L., Smigielski E. M., Sirotkin K., dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).
  • 91. Hubisz M. J., Pollard K. S., Exploring the genesis and functions of Human Accelerated Regions sheds light on their role in human evolution. Curr. Opin. Genet. Dev. 29, 15–21 (2014).
  • 92. Keough K. C., Whalen S., Inoue F., Przytycki P. F., Fair T., Deng C., Steyert M., Ryu H., Lindblad-Toh K., Karlsson E., Zoonomia Consortium, Nowakowski T., Ahituv N., Pollen A., Pollard K. S., Three-dimensional genome re-wiring in loci with Human Accelerated Regions. bioRxiv (2022), p. 2022.10.04.510859.
  • 93. Sloan S. A., Andersen J., Pașca A. M., Birey F., Pașca S. P., Generation and assembly of human brain region-specific three-dimensional cultures. Nat. Protoc. 13, 2062–2085 (2018).
  • 94. Chen Y., Tristan C. A., Chen L., Jovanovic V. M., Malley C., Chu P.-H., Ryu S., Deng T., Ormanoglu P., Tao D., Fang Y., Slamecka J., Hong H., LeClair C. A., Michael S., Austin C. P., Simeonov A., Singeç I., A versatile polypharmacology platform promotes cytoprotection and viability of human pluripotent and differentiated cells. Nat. Methods. 18, 528–541 (2021).
  • 95. Gordon M. G., Inoue F., Martin B., Schubach M., Agarwal V., Whalen S., Feng S., Zhao J., Ashuach T., Ziffra R., Kreimer A., Georgakopoulos-Soares I., Yosef N., Ye C. J., Pollard K. S., Shendure J., Kircher M., Ahituv N., lentiMPRA and MPRAflow for high-throughput functional characterization of gene regulatory elements. Nat. Protoc. 15, 2387–2412 (2020).
  • 96. Gaspar J. M., NGmerge: merging paired-end reads via novel empirically-derived models of sequencing errors. BMC Bioinformatics. 19, 536 (2018).
  • 97. Li H., Durbin R., Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 25, 1754–1760 (2009).
  • 98. Ritchie M. E., Phipson B., Wu D., Hu Y., Law C. W., Shi W., Smyth G. K., limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
  • 99. Heinz S., Benner C., Spann N., Bertolino E., Lin Y. C., Laslo P., Cheng J. X., Murre C., Singh H., Glass C. K., Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell. 38, 576–589 (2010).
  • 100. Reimand J., Kull M., Peterson H., Hansen J., Vilo J., g:Profiler--a web-based toolset for functional profiling of gene lists from large-scale experiments. Nucleic Acids Res. 35, W193–200 (2007).
  • 101. Quinlan A. R., Hall I. M., BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 26, 841–842 (2010).
  • 102. Smedley D., Haider S., Ballester B., Holland R., London D., Thorisson G., Kasprzyk A., BioMart--biological queries made easy. BMC Genomics. 10, 22 (2009).
  • 103. Abadi M., Barham P., Chen J., Chen Z., Davis A., Dean J., Devin M., Ghemawat S., Irving G., Isard M., et al., “TensorFlow: A System for Large-Scale Machine Learning” in 12th USENIX symposium on operating systems design and implementation (OSDI 16) (2016), pp. 265–283.
  • 104. Bray N. L., Pimentel H., Melsted P., Pachter L., Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
  • 105. Soneson C., Love M. I., Robinson M. D., Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res. 4, 1521 (2015).
  • 106. Love M. I., Huber W., Anders S., Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
  • 107. Mesman S., Bakker R., Smidt M. P., Tcf4 is required for correct brain development during embryogenesis. Mol. Cell. Neurosci. 106, 103502 (2020).

Associated Data

Supplementary Materials

Supplement 1
media-1.pdf (2.9MB, pdf)

Data Availability Statement

The source data described in this manuscript are available via the PsychENCODE Knowledge Portal (https://psychencode.synapse.org/). The PsychENCODE Knowledge Portal is a platform for accessing data, analyses, and tools generated through grants funded by the National Institute of Mental Health (NIMH) PsychENCODE Consortium. Data are available for general research use, subject to the portal's requirements for data access and data attribution (https://psychencode.synapse.org/DataAccess). For access to content described in this manuscript see: DOI will be provided prior to publication.


Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints