Abstract
Regulation of gene expression through enhancers is one of the major processes shaping the structure and function of the human brain during development. High-throughput assays have predicted thousands of enhancers involved in neurodevelopment, and confirming their activity through orthogonal functional assays is crucial. Here, we utilized Massively Parallel Reporter Assays (MPRAs) in stem cells and forebrain organoids to evaluate the activity of ~7,000 gene-linked enhancers previously identified in human fetal tissues and brain organoids. Using a Gaussian mixture model to account for the contribution of background noise to the measured activity signal, we confirmed the activity of ~35% of the tested enhancers, with most showing temporal-specific activity, suggesting an evolving role in neurodevelopment. The temporal specificity was further supported by the correlation of activity with gene expression. Our findings provide a valuable gene regulatory resource to the scientific community.
AUTHOR SUMMARY
Enhancers are non-coding elements that play a crucial role in the regulation of gene expression during brain development. Despite the availability of various techniques to identify enhancers, their functional activity is less well understood, leaving a gap in our understanding of how enhancer behavior might regulate complex transitions of neurodevelopment. To address this, we utilized forebrain organoids, a 3D model system which closely mimics the complex cellular environment of the developing human brain, and employed a Massively Parallel Reporter Assay (MPRA) to validate enhancer activity at various stages of forebrain differentiation, from induced pluripotent stem cells (iPSCs) to neuronal progenitors and cortical neurons. Our study provides a comprehensive catalog of over 2,300 enhancers, showcasing their temporal activity profiles during early neuronal development and offering valuable insights into their likely biological functions. This research advances our understanding of enhancer dynamics in brain development and offers new avenues for further investigations in this field.
Introduction
It has been over 40 years since the discovery of the first DNA sequence capable of enhancing the transcription of a reporter gene [1]. Since then, many cis-regulatory DNA elements known as enhancers have been identified, and their biochemical and functional properties have been extensively investigated. A central role of enhancers is to regulate gene expression by binding transcription factors (TFs) and other regulators able to modulate the transcription of target genes [2]. Enhancers can act independently of the distance and orientation to their target genes [3], and they have specific chromatin features that aid in their genome-wide identification through high-throughput methods. Common techniques used for enhancer identification are DNase I hypersensitivity sequencing (DNase-seq) [4] and the Assay for Transposase-Accessible Chromatin sequencing (ATAC-seq) [5], which exploit the fact that enhancers are free from nucleosomes and are more sensitive to enzymatic treatment. One powerful tool for identifying active enhancers is chromatin immunoprecipitation sequencing (ChIP-seq), which utilizes antibodies that recognize specific histone modifications, such as H3K27Ac and H3K4me1, in enhancer-flanking nucleosomes [3, 6, 7]. Additionally, active enhancers were found to be actively transcribed into enhancer RNAs (eRNAs) [8, 9], and to form loops with target promoters to exert their regulatory effects on gene expression. In recent years, several high-throughput 3D chromatin conformation techniques, such as Hi-C [10, 11], ChIA-PET [12, 13], and Micro-C [14], have been used to identify physical interactions between potential enhancers and promoters, elucidating the spatial organization of the genome.
While these approaches are highly informative and allow the location of potential enhancers to be identified, none of them provides definitive proof of enhancer activity. To address this, reporter assays have been widely used to confirm the activity of predicted enhancers by testing their ability to drive expression of a heterologous reporter gene. Previously, this method was limited to testing one sequence at a time using transgenic technologies [15, 16], but the development of high-throughput massively parallel reporter assays such as STARR-seq [17] and MPRA [18–21] has revolutionized the field, enabling the simultaneous testing of thousands of DNA sequences for enhancer activity in a single experiment.
Here, we used a genome-integrated MPRA, the lentiviral-based massively parallel reporter assay (lentiMPRA) [22], to evaluate the activity of approximately 7,000 putative gene-linked enhancers. These enhancers were previously identified in human induced pluripotent stem cell (iPSC)-derived forebrain organoids and human fetal cortex by combining ChIP-seq and Hi-C data [23] and are potentially involved in early human neurodevelopment.
Results
Enhancer selection and lentiMPRA experimental design
To characterize the activity of a set of enhancers putatively involved in early human neurodevelopment, we used lentiMPRA in forebrain organoids to test the activity of a subset of the 96,375 gene-linked enhancers identified by ChIP-seq in our previous study conducted in human fetal cortex and human forebrain organoids [23]. In the original dataset, enhancers were discovered by ChIP-seq, annotated by ChromHMM and linked to putative target genes by proximity and/or external DNA conformation (Hi-C) datasets from stem cells and fetal brains [23]. We generated an MPRA library by selecting the 6,989 most active enhancers in organoids (i.e., those with the highest H3K27Ac peak signal) as measured by ChIP-seq (Fig 1A). Additionally, we included a total of 122 positive control sequences from three different datasets: i) 87 enhancers from human embryonic stem cells (hESCs) validated using the ChIP-STARR-seq approach [24]; ii) 21 MPRA-validated enhancers from hESC-derived neuronal progenitor cells (NPCs) [25]; iii) 35 human brain enhancers from the Vista Enhancer Browser validated by mouse transgenesis [26]. We also included 150 negative controls generated by shuffling the nucleotides of 150 randomly selected candidate enhancer regions. To address the length limitation of oligonucleotide synthesis, we selected a minimal enhancer region of 270 bp by intersecting our enhancers with corroborating information from other datasets, such as p300 ChIP-seq peaks from neuronal cell lines and fetal brain [27, 28], DNase hypersensitivity peaks [29] and CAGE analysis from fetal brain and neuronal cell types [28, 30–32]. The goal was to identify regions with the highest overlap across these datasets, which likely correspond to the core active enhancer region. If the resulting core regions still exceeded the length of 270 bp, we used FIMO [33] to refine them by identifying a subregion with the highest number of known transcription factor binding sites (TFBSs) (see S1 Fig and Methods).
In total, the library included 7,261 sequences synthesized along with 15 bp adapters on either side. The lentiMPRA library was amplified, and a minimal promoter and a 15 bp random barcode were placed downstream of each synthesized sequence and cloned into a lentiMPRA vector upstream of the GFP coding sequence (Fig 1A).
To investigate whether our candidate enhancers can elicit a time-dependent transcriptional response during early neurodevelopment, we infected three induced pluripotent stem cell (iPSC) lines with our lentiMPRA library and measured enhancer activity at the iPSC stage and at two terminal differentiation (TD) points during forebrain organoid differentiation: an earlier stage (TD0) when organoids were almost exclusively composed of proliferating progenitors and a more mature stage (TD30) when organoids were still harboring progenitors and actively generating layer 5–6 cortical postmitotic neurons (S2 Fig; Methods).
LentiMPRA identifies active enhancers during forebrain organoids differentiation
Using DNA sequencing, we found that 95.1% (6,907/7,261) of the tested enhancer sequences in the original library were successfully recovered, with an average of 39.9 unique barcodes associated with each enhancer sequence (S3 Fig). Out of the total 7,261 sequences, 94.9% passed stringent quality control (Methods). In the MPRA experiment, enhancer activity is measured as the ratio of transcribed barcode reads (obtained by RNA-seq) to integrated genomic barcode reads (evaluated by DNA-seq). To identify active enhancers in the tested set, we first defined the background distribution for the ratio of RNA/DNA barcodes. We reasoned that the presence of a multitude of TFs in each cell, combined with the non-deterministic (i.e., probabilistic) nature of TF binding motifs and the largely unstudied effects of TF cooperativity, could result in enhancer activity even in randomly shuffled negative control sequences. Consequently, the distribution of activity for the negative control sequences was approximated as a sum of two Gaussian distributions, one representing the true background signal and one representing an actual signal from potentially active sequences (Figs 1B and S4; Methods). Such a bimodal approximation precisely described the observed experimental distribution, with the average signal from likely active sequences being roughly 55% stronger than the average background (Fig 1B). Such an approximation also described the signal distribution for positive controls, suggesting that some of the positive controls are not active, although the positive controls are enriched for active enhancers compared to negative controls (S4 Fig).
MPRA-active enhancers were defined as those having a signal significantly above the background distribution (Fig 1B). Altogether, 34.8% of the tested enhancers were active in at least one time point (S1 Table). Of the core validated enhancers, 1,755 (74.6%) exhibited activity at a specific time point, reflecting stage-specific epigenomic regulation: 193 were active only in iPSCs, 543 at TD0, and 1,010 at TD30. More than 25% of active enhancers were shared between time points, with a large proportion being active at both TD0 and TD30 (449), indicating that some regulatory elements may play a role in both early and late stages of neurodevelopment (Fig 1C).
The background and the actual signal distributions were wide and overlapped significantly, limiting a clear discrimination between active and inactive enhancers and reducing the power of the MPRA approach (Fig 1B). Based on this overlap and the selected p-value threshold, we estimated that we were only powered to validate about 35% of truly active enhancers. To explore the missing fraction of true enhancers, we clustered all tested enhancers by their activity profile (RNA/DNA ratio) across samples in the MPRA experiment. This clustering approach revealed that almost all enhancers fell into two large clusters, cluster 1 and cluster 2 (Fig 2A). Cluster 1 had just a few MPRA-active enhancers, while cluster 2 encompassed almost all MPRA-active enhancers. We interpreted the data as suggesting that the formally inactive enhancers in cluster 2 were likely active, but did not pass the significance test due to the low sensitivity of the MPRA assay. This result would also explain the limited reproducibility of MPRA-active enhancers across samples at TD0 and TD30 (S5 Fig).
We next intersected MPRA-tested enhancers with external ATAC-seq-derived peaks obtained from bulk and single cell data from fetal brains and forebrain-directed organoids [34, 35] as well as from whole-brain organoids [36]. A large proportion (79.1%) of the tested enhancers were present in these external datasets. Consistent with our expectations, we observed that MPRA-active enhancers were enriched in fetal brains and forebrain-directed organoid datasets from Trevino et al. and Ziffra et al. (Figs 2B inset, S6 and S2 Table).
To further qualify the nature of MPRA-active enhancers, we compared the numbers of TFBSs within active and inactive enhancers. Our analysis revealed that while positive control enhancers had significantly more TFBSs compared to both scrambled negative controls and tested enhancers (Fig 2C), we did not observe any significant difference in the number of TFBSs or in the expression of cognate TFs between MPRA-active and -inactive tested enhancers.
We next took advantage of the fact that in the original dataset of Amiri et al. we identified putative target genes for the tested enhancers in their native DNA location, by either proximity in linear DNA and/or 3D DNA conformation datasets [23]. We then compared the MPRA-active and -inactive enhancers with regard to the expression of their linked genes. To increase the statistical power of our analysis, we incorporated bulk RNA-seq data from 88 additional samples from a parallel experiment using genetically distinct iPSC lines (collected at TD0 and TD30, using the same forebrain differentiation protocol) in addition to the samples used in the MPRA experiment. Our analysis revealed that the genes linked to MPRA-active enhancers exhibited significantly higher expression levels than those associated with inactive enhancers (Fig 2D) [23]. Genes linked to enhancers in both categories had higher expression than “background”, i.e., all gene-linked enhancers from Amiri et al. [23]. This likely reflects the selection of the most active enhancers for the MPRA experiment. The fact that MPRA-inactive enhancers had higher target gene expression than the “background” expression was consistent with the low sensitivity of MPRA experiment estimated above, implying that a significant fraction of MPRA-inactive enhancers could actually be active in a different experimental setting.
We then correlated the difference in enhancer activity (measured by lentiMPRA) across time points with the difference in expression of the gene(s) linked to the enhancer in the endogenous genomic context. Comparing enhancer activity and gene expression at TD0 versus the iPSC stage, we found that MPRA-active enhancers upregulated in iPSCs or at TD0 were typically correlated (positively or negatively) with the difference in expression of their endogenous linked genes (Figs 3A, 3B and S7). In contrast, MPRA-inactive enhancers rarely showed a correlation with the expression of their corresponding genes when comparing either TD0 (Figs 3A and 3B) or TD30 with iPSCs (S8 and S9 Figs). These observations demonstrate that the MPRA-active enhancers, compared to the inactive ones, were those that, in their appropriate genomic context, were linked to highly expressed genes and that differential MPRA activity predicted differential gene expression during differentiation. Given the nature of the MPRA assay, it can be inferred that the activity of these MPRA enhancers is less dependent on the genomic context. Among the enhancers exclusively active in iPSCs, we found that the most active one (Table S3) is located upstream of the GTPase-activating protein (SH3 domain)-binding protein 2 (G3BP2) gene, which encodes an RNA binding protein involved in maintaining pluripotency by regulating the transcription factors Oct-4 and Nanog [37] (Fig 3C). The G3BP2 gene was also upregulated in iPSCs compared to both TD0 and TD30 stages (Figs 3A and S8), consistent with the MPRA results and with its biological role in maintaining pluripotency. The 270 bp tested enhancer region is potentially bound by seven TFs (S3 Table), including ZNF263, a TF expressed in pluripotent stem cells [38] and computationally predicted to have the highest affinity for the binding motif (Fig 3C).
One of the most active enhancers, validated both at TD0 and TD30 (S3 Table), partially overlapped with an exon located ~400 kb downstream of the transcription start site (TSS) of the Cut Homeobox 1 (CUX1) gene, a TF playing a critical role in upper layer cortical neuron differentiation, dendrite branching, and synapse formation [39, 40]. Among the four TFs predicted to bind this active enhancer (S3 Table) is ASCL1, which has previously been shown to promote CUX1 expression [41]. Additionally, a second intronic enhancer linked to CUX1, bound by ZNF436, showed mild MPRA activity only at TD0 (Fig 3D).
Discussion
Spatial and temporal regulation of gene expression through regulatory regions, specifically enhancers, is essential for shaping the structure and function of the human brain during development. A number of biochemical techniques, such as ChIP-seq, ATAC-seq, and DNase-seq, have been used for identifying enhancers and characterizing their activity in neurodevelopment. In addition, MPRAs have been adopted to test an enhancer’s ability to activate a synthetic reporter in a high-throughput and context-independent manner, providing an orthogonal readout of enhancer activity. In this study, we used lentiMPRA in forebrain organoids to evaluate, over the course of brain development, the activity of ~7,000 gene-linked enhancers previously identified in human fetal tissues and brain organoids from a combination of histone-mark ChIP-seq and DNA conformational studies. Besides validating 2,352 enhancers, our analysis of the MPRA-active enhancers in relation to upstream binding TFs and downstream targets suggested important implications for the developmental biology of the human brain.
By analyzing RNA-seq data from forebrain organoids collected at the same time points, we found that while there was no relation with the number of potential TFBSs or with TF expression levels, MPRA-active enhancers are associated with highly expressed genes in the endogenous genomic context. Furthermore, differential enhancer activity, as determined by the lentiMPRA assay, tended to be correlated, in a positive or negative fashion, with differential gene expression across time points of neural differentiation. This provides an interesting insight into the capability of MPRA to detect enhancers associated with genes dynamically regulated during different stages of neurodevelopment.
On the technical side, our study highlights a few essential limitations of MPRA techniques. Perhaps the major limitation of MPRA is its low sensitivity (poor discrimination of signal from noise), estimated to be about 35% in our experiments, which likely explains the limited reproducibility across replicates. For the later differentiation time points, this could also be due to silencing of the lentivirus-delivered transgene and its regulatory element, which was previously reported to be an issue during differentiation of stem cells into other cell types [42]. More data are required to precisely assess the variability of this assay across technical replicates and genetically different iPSC lines. In addition, there may be some false negative results in the MPRA assay, since the assay tests a shorter version of the original enhancer, which may not include all the elements required for transcription initiation. For example, MPRA-tested enhancers may lack crucial co-activators that are provided by DNA loop conformation, or that bind the flanking sequences of the tested region. This may explain why, although we detected a significant enrichment of MPRA-active enhancers in external ATAC-seq datasets obtained in fetal brain or forebrain organoids, a considerable percentage of MPRA-inactive enhancers also overlapped with those external datasets.
As with any bulk assay, using MPRA in a heterogeneous system such as organoids, which are characterized by cellular diversity, reveals another caveat of the technique. Given the absence of cell-type information coupled to enhancer activity, it is likely that MPRA is strongly biased towards revealing those enhancers present in the most abundant cell types. Indeed, when intersecting with scATAC-seq data obtained from the same organoid preparations at TD0 and TD30, there was a significant enrichment of MPRA-active enhancers only in the most abundant cell population, radial glia cells (Table S4) [43]. Future studies using single-cell MPRA-seq approaches [44, 45] in combination with existing single-cell biochemical assays, such as scATAC-seq, could open new avenues to gain a comprehensive understanding of the fine regulatory dynamics occurring within each cell type in complex neuronal differentiation systems.
Materials and Methods
lentiMPRA library design
Candidate enhancers were identified in Amiri et al. 2018 [23] by chromatin segmentation analyses using H3K27Ac, H3K27me3 and H3K4me3 ChIP-seq peak datasets obtained in cortical organoids and cerebral cortical tissue from postmortem fetal human brains. Enhancers were linked to putative target genes by proximity and/or by association with promoters using fetal brain 3D chromatin conformation data. From an initial dataset of >300,000 enhancers, 96,375 were found to be associated with genes and termed gene-linked enhancers (GLE). From the GLE dataset, approximately 7,000 enhancers were selected for MPRA based on top activity, as determined by H3K27Ac peak signal. As ChIP-seq enhancers can encompass hundreds to thousands of bases, a core region of 270 bp was identified for the purpose of oligonucleotide synthesis for the MPRA assay. To achieve this, our selected enhancers were intersected with enhancers from other datasets to select the region with the highest number of overlaps, which is therefore potentially more active. In detail, we used: i) p300 ChIP-seq peaks from human neuronal cell lines [32] and human fetal cortex [46], ii) CAGE analysis from brain tissues [30], and iii) DNase hypersensitivity peaks from neuronal progenitor cells and brain tissue [29]. Finally, if there was no overlap or if the overlapping enhancer region was still too large, we used FIMO [33] to further reduce the size to 270 bp by selecting the region with the highest number of TFBSs, as TF binding is a reliable predictor of enhancer activity. The list of tested enhancers is provided in Table S1.
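The final refinement step, choosing the 270 bp subregion with the most TFBSs, can be illustrated with a short sketch. It assumes that motif hits are available as (start, end) intervals relative to the enhancer, that a hit counts only when it lies entirely inside the window, and that ties are resolved by taking the leftmost window. These choices and the function name best_window are illustrative assumptions, not the procedure that was applied to the FIMO output.

```python
def best_window(region_len, hits, width=270):
    """Return (start, count) of the width-bp window containing the most TFBS hits.

    hits: list of (start, end) half-open intervals, 0-based, relative to the region.
    A hit is counted only if it lies entirely inside the window (an assumption).
    """
    if region_len <= width:
        # Region already fits the synthesis limit: keep it whole.
        inside = sum(1 for s, e in hits if s >= 0 and e <= region_len)
        return 0, inside
    best_start, best_count = 0, -1
    for start in range(region_len - width + 1):
        end = start + width
        count = sum(1 for s, e in hits if s >= start and e <= end)
        if count > best_count:  # strict ">" keeps the leftmost window on ties
            best_start, best_count = start, count
    return best_start, best_count


# Hypothetical 600 bp enhancer with five motif hits
example_hits = [(10, 22), (300, 312), (350, 361), (420, 435), (560, 570)]
start, count = best_window(600, example_hits)
print(f"window {start}-{start + 270} contains {count} hits")
```

The exhaustive scan is quadratic in the worst case, which is acceptable for a few thousand regions of at most a few kilobases each.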
lentiMPRA library generation
The lentiMPRA plasmid library was constructed as previously described in Gordon et al., 2020 [22]. Briefly, the oligonucleotide pool of the 7,261 enhancers was synthesized by Twist Bioscience and amplified via two rounds of PCR, first to add the minimal promoter and then to add the barcode, using two sets of adaptor primers, 5BC-AG-f01/r01 and 5BC-AG-f02/r02 respectively (Table S5). The amplified fragments were cloned via Gibson assembly (using NEBuilder HiFi DNA Assembly Master Mix; New England BioLabs, cat. No. E2621L) into the SbfI/AgeI site of the pLS-SceI vector (Addgene, cat. No. 137725, a gift from Ahituv’s lab) to construct the library. The resulting library was digested with I-SceI (New England BioLabs, cat. No. R0694S) to remove any vector that did not receive an insert. The recombination products were then electroporated into electrocompetent cells (NEB 10-beta, New England BioLabs, cat. No. C3020K) and plated onto carbenicillin plates. Sanger sequencing of 32 colonies, using n40.dn.F and EGFP.up.R primers (Table S5), was then used to confirm the proper assembly of the library. The library was purified from the number of colonies needed to achieve the desired number of barcodes (n = 50) associated with each sequence. Barcode-associated fragments were amplified using P7-pLSmp-ass-gfp (100 μM) and P5-pLSmP-ass-i741 primers (Table S5), purified using the Plasmid Plus Midi Kit (QIAGEN) and tested for quality via sequencing on a MiSeq (see below section).
MiSeq
The association between enhancer sequences and barcodes was ascertained using Illumina MiSeq v2.0 sequencing with paired-end reads of 150 bp. The two reads overlap by 30 bp in the middle of the enhancer sequence, so that together they span the full 270 bp insert. Three MiSeq libraries were sequenced to obtain a sufficient number of barcodes covering each tested enhancer sequence, for a total of 168 million reads. The 15 bp barcodes were sequenced in the same MiSeq runs and share read names with the corresponding enhancer reads.
Lentivirus packaging and MOI
The lentiMPRA library was packaged into lentivirus using the plasmid library, psPAX2 (RRID: Addgene_12260) and pMD2.G (RRID: Addgene_12259), and its titer was determined as previously described [22]. In brief, iPSCs were plated at 0.15 million cells/well in 24-well plates and incubated for 24 hours. Serial volumes (0, 2, 4, 8, 16, 32 µl) of the lentivirus were added. The infected cells were cultured for 3 days and washed with PBS three times before genomic DNA extraction. Genomic DNA was extracted using the Wizard SV genomic DNA purification kit (Promega). Virus titer and copy number of viral particles per cell were measured by qPCR as previously described [22].
iPSCs reprogramming and maintenance
Two iPSC lines were used for the lentiMPRA experiment: ACE1815 (two technical replicates, named ACE_1 and ACE_2 in the figures) and 11251. Lines were generated from human skin fibroblasts obtained from a skin biopsy of normal individuals using a viral-free episomal reprogramming method [43, 47] at the Yale Stem Cell Reprogramming Core and passaged for 18 or 20 passages before differentiation. Informed consent was obtained from each donor according to the regulations of the Institutional Review Board and Yale Center for Clinical Investigation at Yale University. The participants agreed to the sharing of de-identified genomic data under controlled data access. All iPSC lines used in this study fulfilled standard reprogramming criteria, including (i) immunocytochemical expression of pluripotency markers (NANOG; SSEA4; TRA1-60); (ii) expression of known hESC/iPSC markers (SOX2, NANOG, LIN-28, GDF3, OCT4, DNMT3B) by semi-quantitative RT-PCR; (iii) downregulation of exogenous reprogramming factors. The iPSC lines derived from fibroblasts were cultured on Matrigel (Corning Matrigel Matrix Basement Membrane Growth Factor Reduced)-coated dishes with mTeSR1 media (StemCell Technologies) and propagated using Dispase (StemCell Technologies).
Lentiviral infection
The lentiMPRA library was transduced into iPSCs at the undifferentiated stage. To provide 75%−80% confluency the next day, iPSCs were seeded on Matrigel-coated 10 cm dishes in mTeSR1 media supplemented with 10 µM Y-27632. After 24 hours, the cells were infected with the lentivirus library at an average multiplicity of infection (MOI) of 4–5 and incubated for 3 days with daily media changes to remove non-integrated virus. The experiment involved three independent replicate cultures, including two lines from different individuals (ACE1815 and 11251), and one technical replicate (ACE1815-1 and ACE1815-2). All cultures were infected at the same time and using the same lentiviral library.
Forebrain Organoid differentiation
iPSC lines were differentiated into forebrain organoids as described in Jourdon et al. 2023 [43]. Briefly, undifferentiated iPSC colonies were treated with 5 µM of the Y-27632 compound and dissociated to single cells with Accutase (Millipore, 1:2 dilution in PBS 1X). Four million dissociated cells were seeded in each well of a 6-well plate and cultured in mTeSR1 with 10 µM Y-27632 compound on an orbital shaker at a speed of 95 rpm. Forebrain neural induction was triggered by dual SMAD inhibition in mTeSR1 media supplemented with 10 µM SB431542, 1 µM LDN193189 and 5 µM Y-27632 (day 1). At day 2, embryoid bodies were cultured in KSR medium (DMEM supplemented with 15% Knockout Serum Replacement, 1% L-Glutamine, 1% NEAA, 1% Pen/Strep and 55 µM of 2-mercaptoethanol, 2-ME) with the addition of SB431542, LDN193189, XAV939 and Y-27632. The neural induction with dual SMAD inhibition was maintained until day 7, after which organoids were gradually adapted to NIM medium (DMEM/F12, 1% N2 supplement, 2% B27 without vitamin A, 1% NEAA, 1% Pen/Strep, 0.15% Glucose and 1% Glutamax) through a dilution series of KSR and NIM. Neural progenitor proliferation was induced at day 9 in 75% NIM and 25% KSR supplemented with FGF2 (10 ng/ml) and EGF (10 ng/ml), and organoids were maintained in proliferative medium until day 16 in 100% NIM. Terminal differentiation was initiated at day 17 (TD0) in terminal differentiation medium (Neurobasal medium supplemented with 1% N2, 2% B27 without vitamin A, 15 mM HEPES, 1% Glutamax, 1% NEAA and 55 µM 2-ME) with the addition of the neurotrophic factors BDNF (10 ng/ml) and GDNF (10 ng/ml) until TD30. In the differentiation phase, half of the medium was changed twice a week. Organoids were transferred from wells of a 6-well plate to a 10 cm dish between TD5 and TD10 and the speed of the orbital shaker was decreased to 80 rpm.
Cell harvesting, library preparation and DNA/RNA extraction and sequencing for barcodes count
The infected cells were harvested at three different time points: in iPSCs 3 days after infection and before starting organoid differentiation (iPSC stage), and in iPSC-derived forebrain organoids at earlier (TD0) and more mature (TD30) terminal differentiation points. Genomic DNA and total RNA were simultaneously extracted using the AllPrep DNA/RNA Mini Kit (Qiagen, cat. No. 80204) following the manufacturer’s protocol. RNA samples were treated with Turbo DNase (Life Technologies, cat. No. AM1907) to remove any residual DNA contamination. Sequencing libraries were prepared as previously described [22]. Briefly, at least 60 μg total RNA per sample was used for reverse transcription with SuperScript II (Life Technologies, cat. No. 18064-071) using the primer P7-pLSmP-ass16UMI-gfp (Table S5) to add a 16-bp UMI and a P7 flowcell sequence downstream of the barcode. PCR steps were performed on the DNA and RNA samples in order to amplify barcodes, adding the P5 flowcell sequence and sample index upstream of the barcode, and the P7 flowcell sequence and UMI downstream of it. Finally, the sequencing libraries were pooled and subjected to paired-end sequencing with UMI and sample index reads.
Immunostaining
Organoids were randomly selected and fixed in 4% PFA in PBS for 2–4 hours. The organoids were then cryopreserved in 25% sucrose overnight, embedded in O.C.T. (Sakura), and frozen on dry ice before being stored at −80°C. Serial cryosections were obtained with a thickness of 12–16 µm. Immunostaining was performed by incubating the sections in blocking solution (PBS, 10% Donkey Serum, 1% Triton X-100) for 1 hour, followed by incubation with primary antibodies (overnight, 4°C) and secondary antibodies (1–2 hours, from Jackson ImmunoResearch or ThermoFisher Scientific). The slides were then mounted with coverslips using VECTASHIELD (Vector Labs) and imaged on a Zeiss microscope equipped with an apotome module and ZEN 3.3 (ZEN pro) software. Three cell lines were used for immunocytochemical analyses, and a minimum of four organoids per line were analyzed. Images were acquired randomly to cover the entire extent of the organoid. Antibody list: FOXG1 (rabbit, 1:200, Takara), PAX6 (mouse, 1:200, BD Bioscience), EOMES (rabbit, 1:1000, Abcam), FOXP2 (goat, 1:200, Santa Cruz), GAD1 (mouse, 1:200, Chemicon), HuC/D (mouse, 1:200, Invitrogen), SOX1 (goat, 1:100, R&D Systems), CTIP2 (rat, 1:500, Abcam).
LentiMPRA computational pipeline
Pre-processing using MPRAflow and MPRAnalyze
The association between enhancer sequences and barcodes was identified using MPRAflow association package [22] with the following command:
nextflow run association.nf --fastq-insert "R1.fastq.gz" --fastq-bc "barcode.fastq.gz" --design "design.fa" --name "MPRAflow" --fastq-insertPE "R2.fastq.gz" -w {workdir} --labels "labels.txt" --cigar 270M
Read counts for each RNA and DNA barcode in all 9 samples and for all enhancers were calculated using the MPRAflow count package [22] with the following command:
nextflow run count.nf --dir "DNA_RNA" --e "experiments.csv" --design "design.fa" --association "MPRAflow_filtered_coords_to_barcodes.pickle" --labels "labels.txt" --umi-length 15 --name "countUMI" --outdir {workdir} --mpranalyze
MPRAnalyze [48] was then used, following its standard pipeline, to normalize the RNA and DNA barcode counts in all 9 samples and to generate the enhancer activity (RNA/DNA ratio).
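To make the quantity concrete, the following sketch computes a simple depth-normalized RNA/DNA ratio per enhancer from barcode counts. It is a deliberately naive illustration and not the MPRAnalyze model, which fits the RNA and DNA counts statistically; the function name activity_ratios, the input layout and the toy counts are all assumptions made for this example.

```python
from collections import defaultdict


def activity_ratios(barcode_counts, barcode_to_enhancer):
    """Depth-normalized RNA/DNA ratio per enhancer (naive illustration).

    barcode_counts: {barcode: (dna_count, rna_count)} for one sample
    barcode_to_enhancer: {barcode: enhancer_id} from the association step
    """
    dna_total = sum(d for d, _ in barcode_counts.values())
    rna_total = sum(r for _, r in barcode_counts.values())
    if dna_total == 0 or rna_total == 0:
        raise ValueError("sample has no DNA or no RNA reads")
    dna = defaultdict(int)
    rna = defaultdict(int)
    for barcode, (d, r) in barcode_counts.items():
        enhancer = barcode_to_enhancer.get(barcode)
        if enhancer is None:  # barcode not assigned to any enhancer
            continue
        dna[enhancer] += d
        rna[enhancer] += r
    ratios = {}
    for enhancer, dna_count in dna.items():
        if dna_count == 0:  # no integrated copies observed, ratio undefined
            continue
        ratios[enhancer] = (rna[enhancer] / rna_total) / (dna_count / dna_total)
    return ratios


# Toy counts: two barcodes for enhancer e1, one for e2
toy_counts = {"AAA": (10, 20), "CCC": (10, 10), "GGG": (20, 10)}
toy_assoc = {"AAA": "e1", "CCC": "e1", "GGG": "e2"}
toy_ratios = activity_ratios(toy_counts, toy_assoc)
print(toy_ratios)
```

Summing counts over all barcodes of an enhancer before taking the ratio reduces the influence of individual low-count barcodes, which is one reason a large number of barcodes per sequence is desirable.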
Identification of negative control distribution using negative references
A Gaussian mixture model was used to estimate the distribution of negative controls from the activity of the 135 negative control sequences in the 9 samples. The curve_fit function in the SciPy package [49] was used to estimate the mean and standard deviation of the negative control distribution. The fitted distribution was consistent with the negative control data (Kolmogorov–Smirnov test p-value = 0.33; Anderson–Darling test p-value = 0.12).
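As a minimal sketch of this step, the Python snippet below estimates the mean and standard deviation of a Gaussian from a list of negative control activities and computes a one-sample Kolmogorov–Smirnov statistic against that fit. It is an illustration under stated assumptions rather than the pipeline used here: it substitutes simple moment estimates from the Python standard library for the SciPy curve fit and the mixture model, the activity values are invented, and the function names are hypothetical.

```python
from statistics import NormalDist


def fit_negative_controls(activities):
    """Estimate a Gaussian (sample mean and sample SD) from RNA/DNA ratios.

    Assumption: moment estimates stand in for the curve fit described in the text.
    """
    return NormalDist.from_samples(activities)


def ks_statistic(samples, dist):
    """One-sample Kolmogorov-Smirnov statistic: max |empirical CDF - model CDF|."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        model = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - model, model - i / n)
    return d


# Hypothetical negative control activities (not data from this study).
neg_activity = [0.82, 0.95, 1.01, 1.07, 0.90, 1.12, 0.98, 1.03, 0.88, 1.15]

neg_dist = fit_negative_controls(neg_activity)
print(f"mean = {neg_dist.mean:.3f}, sd = {neg_dist.stdev:.3f}")
print(f"KS statistic = {ks_statistic(neg_activity, neg_dist):.3f}")
```

A small KS statistic indicates that the empirical distribution of the controls stays close to the fitted Gaussian; converting it to a p-value would require the KS distribution, which is omitted from this sketch.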
Candidate regulatory sequences activity quantification and validation
For each enhancer, the p-value for its activity (RNA/DNA) exceeding that expected under the negative control distribution was calculated. Enhancers were called significantly active at a p-value threshold of 0.05 after Bonferroni correction for the total number of enhancers tested.
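The calculation can be sketched in Python as an upper-tail test against a Gaussian null, followed by a Bonferroni-adjusted threshold. This is a hedged illustration, not the code used in the study: the negative control parameters, the enhancer names and activities, and the function names are all hypothetical, and the number of tests is passed in explicitly.

```python
from statistics import NormalDist


def upper_tail_pvalue(activity, neg_dist):
    """P(X >= activity) under the negative control Gaussian."""
    return 1.0 - neg_dist.cdf(activity)


def call_active(activities, neg_dist, n_tests, alpha=0.05):
    """Flag enhancers whose p-value passes the Bonferroni threshold alpha / n_tests."""
    threshold = alpha / n_tests
    return {
        name: upper_tail_pvalue(value, neg_dist) <= threshold
        for name, value in activities.items()
    }


# Hypothetical negative control distribution and enhancer activities.
neg_dist = NormalDist(mu=1.0, sigma=0.1)
activities = {"enh_A": 1.05, "enh_B": 1.60, "enh_C": 0.70}

# Here the number of tests equals the number of enhancers in the toy example.
calls = call_active(activities, neg_dist, n_tests=len(activities))
print(calls)
```

Only the enhancer whose activity lies far in the upper tail of the control distribution is flagged; activities near or below the control mean are not.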
Estimating sensitivity of MPRA experiment to define an enhancer as an active one
Using the negative control Gaussian distribution, we defined the RNA/DNA ratio cutoff that corresponds to the significance threshold. We then calculated the area under the positive control Gaussian distribution that lies above this cutoff. Sensitivity was estimated as the ratio of this area to the total area under the positive control Gaussian distribution.
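The sensitivity estimate can be illustrated with two Gaussians: the cutoff is the quantile of the negative control distribution at the corrected significance level, and the sensitivity is the upper-tail mass of the positive control distribution beyond that cutoff. The sketch below assumes made-up parameters for both distributions and a hypothetical number of tests; the function names are not from the original pipeline.

```python
from statistics import NormalDist


def activity_cutoff(neg_dist, n_tests, alpha=0.05):
    """RNA/DNA ratio above which the Bonferroni-corrected p-value is below alpha."""
    return neg_dist.inv_cdf(1.0 - alpha / n_tests)


def sensitivity(pos_dist, cutoff):
    """Fraction of the positive control Gaussian lying above the cutoff.

    The total area under a normalized Gaussian is 1, so the ratio
    (area above cutoff) / (total area) is simply the upper-tail probability.
    """
    return 1.0 - pos_dist.cdf(cutoff)


# Hypothetical parameters (not values estimated in this study).
neg_dist = NormalDist(mu=1.0, sigma=0.1)
pos_dist = NormalDist(mu=2.0, sigma=0.5)

cutoff = activity_cutoff(neg_dist, n_tests=7000)
sens = sensitivity(pos_dist, cutoff)
print(f"cutoff = {cutoff:.3f}, sensitivity = {sens:.3f}")
```

Under this formulation, a stricter multiple-testing correction pushes the cutoff to the right and lowers the estimated sensitivity.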
Clustering of enhancer activity for candidate regulatory sequences
Enhancer activity (RNA/DNA) of candidate regulatory sequences and positive reference sequences was calculated by MPRAnalyze. The activity values were then clustered with the clustermap function of Seaborn 0.11.0 [50], using Ward’s minimum variance method, and displayed as a heatmap. Activity values larger than 4 were shown in the same color as the value of 4.
Transcription factor binding site identification and expression of the transcription factors
FIMO [33] was employed to identify transcription factor binding sites (TFBSs) in all 7,289 sequences, including candidate regulatory sequences, positive reference sequences, and negative reference sequences. The transcription factor binding site annotation was downloaded from JASPAR2022 [51] and comprised 1,252 Homo sapiens annotations. The command for running FIMO [33] is as follows:
fimo --o {out_dir} JASPAR2022_tfbs input.fa
TFBSs identified by FIMO with FDR <= 0.05 were taken for further analyses. We quantified the number of TFBSs in each sequence and compared the distributions between validated and non-validated candidate regulatory sequences using the Kolmogorov–Smirnov test. We further calculated the expression (RPKM) of the TFs that bind to these TFBSs in 97 TD0/TD30 organoid samples. The expression of TFs with binding sites in validated and in non-validated candidate regulatory sequences was compared using the Kolmogorov–Smirnov test.
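The filtering and counting step can be sketched in Python as follows. The snippet assumes FIMO's tab-separated output with `sequence_name` and `q-value` columns, treats the q-value as the FDR, and computes a two-sample Kolmogorov–Smirnov statistic by hand. The motif identifiers, sequence names, and function names are hypothetical placeholders, and the example table is invented rather than taken from the study.

```python
import csv
import io
from bisect import bisect_right
from collections import Counter


def count_tfbs(fimo_tsv, q_max=0.05):
    """Count FIMO hits per sequence, keeping hits with q-value <= q_max."""
    # Skip blank lines and '#' comment lines that FIMO may append to the file.
    rows = (ln for ln in fimo_tsv if ln.strip() and not ln.startswith("#"))
    counts = Counter()
    for row in csv.DictReader(rows, delimiter="\t"):
        if float(row["q-value"]) <= q_max:
            counts[row["sequence_name"]] += 1
    return counts


def ks_2samp_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


# Invented FIMO-style table; motif IDs and sequence names are placeholders.
example = io.StringIO(
    "motif_id\tmotif_alt_id\tsequence_name\tstart\tstop\tstrand\tscore\tp-value\tq-value\tmatched_sequence\n"
    "MA0000.1\tTF_A\tenh_1\t10\t19\t+\t12.3\t1e-5\t0.01\tACGTACGTAC\n"
    "MA0000.2\tTF_B\tenh_1\t40\t49\t-\t11.0\t2e-5\t0.04\tTTGCATGCAA\n"
    "MA0000.1\tTF_A\tenh_2\t5\t14\t+\t8.1\t9e-4\t0.30\tACGTTCGTAC\n"
    "MA0000.3\tTF_C\tenh_3\t7\t16\t+\t13.5\t5e-6\t0.02\tGGGCGGGGCC\n"
)

counts = count_tfbs(example)
print(dict(counts))

# Compare TFBS counts between two hypothetical groups of sequences.
validated = [counts[name] for name in ("enh_1", "enh_3")]
non_validated = [counts[name] for name in ("enh_2",)]
print(ks_2samp_statistic(validated, non_validated))
```

Sequences without any hit passing the threshold are absent from the FIMO table, so the `Counter` returns 0 for them when the two groups are assembled.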
Genes regulated by candidate regulatory sequences and expression
Genes linked to candidate enhancers were taken from Amiri et al. [23], using the confident_set1, confident_set2, and proximity gene sets. The expression (RPKM) of these genes was calculated from 97 TD0/TD30 organoid samples.
Intersection with external datasets
Intersections with external datasets were performed using BedTools [52], and the resulting data were plotted using the ‘UpSetR’ package in R [53].
Single cell ATAC-seq analysis
For each 10X scATACseq sample, fastq files were first processed by cellranger-atac v2.0.0 with default parameters and the 10X prebuilt reference arc-GRCh38-2020-A-2.0.0. The resulting cell-by-peak count matrix was then processed and filtered with Signac following the online vignette (https://stuartlab.org/signac/articles/pbmc_vignette.html). Cell types were annotated using our annotated scRNAseq dataset and the label transfer method implemented in Seurat, following the online vignette (https://satijalab.org/seurat/articles/atacseq_integration_vignette.html). scATACseq peaks were then called using all reads in a sample or subsets of reads from each annotated cell type by running the Signac function CallPeaks with default parameters.
Acknowledgements
We thank the members of the Vaccarino lab for extensive discussions, technical help and contributions to methods. We thank Guilin Wang, Christopher Castaldi, and the Yale Center for Genome Analysis for library preparation, deep sequencing, and Cell Ranger analysis. We thank Caihong Qiu and Jason Thomson at the Yale Stem Cell Center for the generation of the iPSC lines. Some of the illustrations were created using BioRender.com.
Funding Statement
We acknowledge the following grant support: National Institute of Mental Health R01 MH109648 (FMV), R56 MH114911 (FMV), R56 MH114899 (AA), R56 MH114901 (GC), and U01 MH116438 (NA), and Simons Foundation award No. 632742 (FMV, AA).
Footnotes
Declaration of Interest
NA is the cofounder and on the scientific advisory board of Regel Therapeutics and receives funding from BioMarin Pharmaceutical Incorporated.
Data sharing
The source MPRA data described in this manuscript are available via the PsychENCODE Knowledge Portal (https://psychencode.synapse.org/). The PsychENCODE Knowledge Portal is a platform for accessing data, analyses, and tools generated through grants funded by the National Institute of Mental Health (NIMH) PsychENCODE Consortium. Data is available for general research use according to the following requirements for data access and data attribution: (https://psychencode.synapse.org/DataAccess). For access to content described in this manuscript see: https://doi.org/10.7303/syn51072171.1. The source bulk RNA-seq data are available at the NIMH Data Archive (NDA) under collection #C2424, at url: https://nda.nih.gov/edit_collection.html?id=2424.
Supporting information
S1 Fig. Minimal enhancer region selection.
S2 Fig. Immunocytochemical characterization of forebrain organoids.
S3 Fig. Statistics of MiSeq barcodes for tested enhancers.
S4 Fig. Gaussian mixture model with positive and negative controls.
S5 Fig. RNA/DNA ratios in all tested enhancers at iPSC, TD0 and TD30 comparing between different samples.
S6 Fig. Intersection with external datasets.
S7 Fig. MPRA-active enhancers positively or negatively correlate with expression of the corresponding linked genes.
S8 Fig. MPRA-active enhancers correlate with differences in expression of linked genes across timepoints.
S9 Fig. MPRA-active enhancers positively or negatively correlate with expression of the corresponding linked genes.
S1 Table. Activity of MPRA-tested enhancers across samples.
S2 Table. Intersection of MPRA-tested enhancers with external datasets.
S3 Table. Enhancer activity, TFs and linked-gene expression.
S4 Table. Intersection of MPRA-tested enhancers with scATAC-seq.
S5 Table. Primer sequences for library amplification, virus titration and sequencing.
References
- 1. Banerji J, Rusconi S, Schaffner W. Expression of a beta-globin gene is enhanced by remote SV40 DNA sequences. Cell. 1981;27(2 Pt 1):299–308. doi: 10.1016/0092-8674(81)90413-x.
- 2. Dogan N, Wu W, Morrissey CS, Chen KB, Stonestrom A, Long M, et al. Occupancy by key transcription factors is a more accurate predictor of enhancer activity than histone modifications or chromatin accessibility. Epigenetics Chromatin. 2015;8:16. Epub 20150423. doi: 10.1186/s13072-015-0009-5.
- 3. Shlyueva D, Stampfel G, Stark A. Transcriptional enhancers: from properties to genome-wide predictions. Nat Rev Genet. 2014;15(4):272–86. Epub 20140311. doi: 10.1038/nrg3682.
- 4. Crawford GE, Davis S, Scacheri PC, Renaud G, Halawi MJ, Erdos MR, et al. DNase-chip: a high-resolution method to identify DNase I hypersensitive sites using tiled microarrays. Nat Methods. 2006;3(7):503–9. doi: 10.1038/nmeth888.
- 5. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods. 2013;10(12):1213–8. Epub 2013/10/08. doi: 10.1038/nmeth.2688.
- 6. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011;473(7345):43–9. Epub 20110323. doi: 10.1038/nature09906.
- 7. Ernst J, Kellis M. ChromHMM: automating chromatin-state discovery and characterization. Nat Methods. 2012;9(3):215–6. Epub 20120228. doi: 10.1038/nmeth.1906.
- 8. Kim TK, Hemberg M, Gray JM, Costa AM, Bear DM, Wu J, et al. Widespread transcription at neuronal activity-regulated enhancers. Nature. 2010;465(7295):182–7. Epub 20100414. doi: 10.1038/nature09033.
- 9. De Santa F, Barozzi I, Mietton F, Ghisletti S, Polletti S, Tusi BK, et al. A large fraction of extragenic RNA pol II transcription sites overlap enhancers. PLoS Biol. 2010;8(5):e1000384. Epub 20100511. doi: 10.1371/journal.pbio.1000384.
- 10. Lieberman-Aiden E, van Berkum NL, Williams L, Imakaev M, Ragoczy T, Telling A, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326(5950):289–93. doi: 10.1126/science.1181369.
- 11. Rao SS, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159(7):1665–80. Epub 20141211. doi: 10.1016/j.cell.2014.11.021.
- 12. Li G, Ruan X, Auerbach RK, Sandhu KS, Zheng M, Wang P, et al. Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell. 2012;148(1–2):84–98. doi: 10.1016/j.cell.2011.12.014.
- 13. Fullwood MJ, Liu MH, Pan YF, Liu J, Xu H, Mohamed YB, et al. An oestrogen-receptor-alpha-bound human chromatin interactome. Nature. 2009;462(7269):58–64. doi: 10.1038/nature08497.
- 14. Krietenstein N, Abraham S, Venev SV, Abdennur N, Gibcus J, Hsieh TS, et al. Ultrastructural Details of Mammalian Chromosome Architecture. Mol Cell. 2020;78(3):554–65 e7. Epub 20200325. doi: 10.1016/j.molcel.2020.03.003.
- 15. Pennacchio LA, Ahituv N, Moses AM, Prabhakar S, Nobrega MA, Shoukry M, et al. In vivo enhancer analysis of human conserved non-coding sequences. Nature. 2006;444(7118):499–502. Epub 20061105. doi: 10.1038/nature05295.
- 16. Kvon EZ. Using transgenic reporter assays to functionally characterize enhancers in animals. Genomics. 2015;106(3):185–92. Epub 20150611. doi: 10.1016/j.ygeno.2015.06.007.
- 17. Arnold CD, Gerlach D, Stelzer C, Boryn LM, Rath M, Stark A. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science. 2013;339(6123):1074–7. doi: 10.1126/science.1232542.
- 18. Inoue F, Ahituv N. Decoding enhancers using massively parallel reporter assays. Genomics. 2015;106(3):159–64. Epub 20150610. doi: 10.1016/j.ygeno.2015.06.005.
- 19. Melnikov A, Murugan A, Zhang X, Tesileanu T, Wang L, Rogov P, et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat Biotechnol. 2012;30(3):271–7. Epub 2012/03/01. doi: 10.1038/nbt.2137.
- 20. Mulvey B, Lagunas T Jr., Dougherty JD. Massively Parallel Reporter Assays: Defining Functional Psychiatric Genetic Variants Across Biological Contexts. Biol Psychiatry. 2021;89(1):76–89. Epub 20200618. doi: 10.1016/j.biopsych.2020.06.011.
- 21. Tewhey R, Kotliar D, Park DS, Liu B, Winnicki S, Reilly SK, et al. Direct Identification of Hundreds of Expression-Modulating Variants using a Multiplexed Reporter Assay. Cell. 2018;172(5):1132–4. doi: 10.1016/j.cell.2018.02.021.
- 22. Gordon MG, Inoue F, Martin B, Schubach M, Agarwal V, Whalen S, et al. lentiMPRA and MPRAflow for high-throughput functional characterization of gene regulatory elements. Nat Protoc. 2020;15(8):2387–412. Epub 20200708. doi: 10.1038/s41596-020-0333-5.
- 23. Amiri A, Coppola G, Scuderi S, Wu F, Roychowdhury T, Liu F, et al. Transcriptome and epigenome landscape of human cortical development modeled in organoids. Science. 2018;362(6420). doi: 10.1126/science.aat6720.
- 24. Barakat TS, Halbritter F, Zhang M, Rendeiro AF, Perenthaler E, Bock C, et al. Functional Dissection of the Enhancer Repertoire in Human Embryonic Stem Cells. Cell stem cell. 2018;23(2):276–88 e8. Epub 2018/07/24. doi: 10.1016/j.stem.2018.06.014.
- 25. Inoue F, Kreimer A, Ashuach T, Ahituv N, Yosef N. Identification and Massively Parallel Characterization of Regulatory Elements Driving Neural Induction. Cell stem cell. 2019;25(5):713–27 e10. Epub 2019/10/22. doi: 10.1016/j.stem.2019.09.010.
- 26. Visel A, Minovitsky S, Dubchak I, Pennacchio LA. VISTA Enhancer Browser--a database of tissue-specific human enhancers. Nucleic Acids Res. 2007;35(Database issue):D88–92. Epub 20061127. doi: 10.1093/nar/gkl822.
- 27. Visel A, Blow MJ, Li Z, Zhang T, Akiyama JA, Holt A, et al. ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature. 2009;457(7231):854–8. doi: 10.1038/nature07730.
- 28. Roadmap Epigenomics Consortium, Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518(7539):317–30. doi: 10.1038/nature14248.
- 29. Meuleman W, Muratov A, Rynes E, Halow J, Lee K, Bates D, et al. Index and biological spectrum of human DNase I hypersensitive sites. Nature. 2020;584(7820):244–51. Epub 20200729. doi: 10.1038/s41586-020-2559-3.
- 30. Andersson R, Gebhard C, Miguel-Escalada I, Hoof I, Bornholdt J, Boyd M, et al. An atlas of active enhancers across human cell types and tissues. Nature. 2014;507(7493):455–61. doi: 10.1038/nature12787.
- 31. Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, et al. The accessible chromatin landscape of the human genome. Nature. 2012;489(7414):75–82. Epub 2012/09/08. doi: 10.1038/nature11232.
- 32. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489(7414):57–74. Epub 2012/09/08. doi: 10.1038/nature11247.
- 33. Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011;27(7):1017–8. Epub 20110216. doi: 10.1093/bioinformatics/btr064.
- 34. Trevino AE, Sinnott-Armstrong N, Andersen J, Yoon SJ, Huber N, Pritchard JK, et al. Chromatin accessibility dynamics in a model of human forebrain development. Science. 2020;367(6476). doi: 10.1126/science.aay1645.
- 35. Ziffra RS, Kim CN, Ross JM, Wilfert A, Turner TN, Haeussler M, et al. Single-cell epigenomics reveals mechanisms of human cortical development. Nature. 2021;598(7879):205–13. Epub 2021/10/08. doi: 10.1038/s41586-021-03209-8.
- 36. Fleck JS, Jansen SMJ, Wollny D, Zenk F, Seimiya M, Jain A, et al. Inferring and perturbing cell fate regulomes in human brain organoids. Nature. 2022. Epub 20221005. doi: 10.1038/s41586-022-05279-8.
- 37. Gupta N, Badeaux M, Liu Y, Naxerova K, Sgroi D, Munn LL, et al. Stress granule-associated protein G3BP2 regulates breast tumor initiation. Proc Natl Acad Sci U S A. 2017;114(5):1033–8. Epub 20170117. doi: 10.1073/pnas.1525387114.
- 38. Singh G, Mullany S, Moorthy SD, Zhang R, Mehdi T, Tian R, et al. A flexible repertoire of transcription factor binding sites and a diversity threshold determines enhancer activity in embryonic stem cells. Genome Res. 2021;31(4):564–75. Epub 20210312. doi: 10.1101/gr.272468.120.
- 39. Platzer K, Cogne B, Hague J, Marcelis CL, Mitter D, Oberndorff K, et al. Haploinsufficiency of CUX1 Causes Nonsyndromic Global Developmental Delay With Possible Catch-up Development. Ann Neurol. 2018;84(2):200–7. Epub 20180831. doi: 10.1002/ana.25278.
- 40. Cubelos B, Sebastian-Serrano A, Beccari L, Calcagnotto ME, Cisneros E, Kim S, et al. Cux1 and Cux2 regulate dendritic branching, spine morphology, and synapses of the upper layer neurons of the cortex. Neuron. 2010;66(4):523–35. doi: 10.1016/j.neuron.2010.04.038.
- 41. Dennis DJ, Wilkinson G, Li S, Dixit R, Adnani L, Balakrishnan A, et al. Neurog2 and Ascl1 together regulate a postmitotic derepression circuit to govern laminar fate specification in the murine neocortex. Proc Natl Acad Sci U S A. 2017;114(25):E4934–E43. Epub 20170605. doi: 10.1073/pnas.1701495114.
- 42. Herbst F, Ball CR, Tuorto F, Nowrouzi A, Wang W, Zavidij O, et al. Extensive methylation of promoter sequences silences lentiviral transgene expression during stem cell differentiation in vivo. Mol Ther. 2012;20(5):1014–21. Epub 20120320. doi: 10.1038/mt.2012.46.
- 43. Jourdon A, Wu F, Mariani J, Capauto D, Norton S, Tomasini L, et al. Modelling idiopathic autism in forebrain organoids reveals an imbalance of excitatory cortical neuron subtypes during early neurogenesis. Nature Neuroscience, in press; bioRxiv. 2023. Epub 2023.01.12. doi: 10.1101/2022.03.19.484988.
- 44. Lalanne J-B, Regalado SG, Domcke S, Calderon D, Martin B, Li T, et al. Multiplex profiling of developmental enhancers with quantitative, single-cell expression reporters. [Pre-print]. In press 2023.
- 45. Zhao S, Hong CKY, Myers CA, Granas DM, White MA, Corbo JC, et al. A single-cell massively parallel reporter assay detects cell-type-specific gene regulation. Nat Genet. 2023;55(2):346–54. Epub 20230112. doi: 10.1038/s41588-022-01278-7.
- 46. Visel A, Taher L, Girgis H, May D, Golonzhka O, Hoch RV, et al. A high-resolution enhancer atlas of the developing telencephalon. Cell. 2013;152(4):895–908. doi: 10.1016/j.cell.2012.12.041.
- 47. Okita K, Matsumura Y, Sato Y, Okada A, Morizane A, Okamoto S, et al. A more efficient method to generate integration-free human iPS cells. Nat Methods. 2011;8(5):409–12. Epub 20110403. doi: 10.1038/nmeth.1591.
- 48. Ashuach T, Fischer DS, Kreimer A, Ahituv N, Theis FJ, Yosef N. MPRAnalyze: statistical framework for massively parallel reporter assays. Genome Biol. 2019;20(1):183. Epub 20190902. doi: 10.1186/s13059-019-1787-z.
- 49. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020;17(3):261–72. Epub 20200203. doi: 10.1038/s41592-019-0686-2.
- 50. Waskom ML. seaborn: statistical data visualization. Journal of Open Source Software. 2021;6(60). doi: 10.21105/joss.03021.
- 51. Castro-Mondragon JA, Riudavets-Puig R, Rauluseviciute I, Lemma RB, Turchi L, Blanc-Mathieu R, et al. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2022;50(D1):D165–D73. doi: 10.1093/nar/gkab1113.
- 52. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2. Epub 20100128. doi: 10.1093/bioinformatics/btq033.
- 53. Conway JR, Lex A, Gehlenborg N. UpSetR: an R package for the visualization of intersecting sets and their properties. Bioinformatics. 2017;33(18):2938–40. doi: 10.1093/bioinformatics/btx364.