SUMMARY
Sox2 expression in mouse embryonic stem cells (mESCs) depends on a distal cluster of DNase I hypersensitive sites (DHSs), but their individual contributions and degree of interdependence remain unclear. We analyzed the endogenous Sox2 locus using Big-IN to scarlessly integrate large DNA payloads incorporating deletions, rearrangements, and inversions affecting single or multiple DHSs, as well as surgical alterations to transcription factor (TF) recognition sequences. Multiple mESC clones were derived for each payload, sequence-verified, and analyzed for Sox2 expression. We found that two DHSs comprising a handful of key TF recognition sequences were each sufficient for long-range activation of Sox2 expression. In contrast, three nearby DHSs were entirely context-dependent, showing no activity alone but dramatically augmenting activity of the autonomous DHSs. Our results highlight the role of context in modulating genomic regulatory element function, and our synthetic regulatory genomics approach provides a roadmap for dissection of other genomic loci.
Keywords: genome writing, genetic engineering, gene regulation, synthetic regulatory genomics, enhancers, CTCF, stem cells
eTOC blurb
Enhancer clusters are hallmarks of tissue-specific regulation, but the contributions of individual enhancers and their degree of interdependence remain unclear. Brosh et al. use synthetic regulatory genomics to repeatedly rewrite the Sox2 locus, dissecting its overall architecture and precisely delineating distinct roles for individual regulatory elements.
INTRODUCTION
Many developmentally important genes lie near clusters of highly cell-type specific DNase I hypersensitive sites (DHSs), typified by the β-globin locus control region (LCR)1-3. More recently, 'super-enhancers' have been proposed to exhibit exceptionally cooperative binding, acting as developmental switches that regulate key transcription factors (TFs)4-6. Exploration of the broad assumptions involved has been impeded by the challenges of large-scale genomic engineering, leaving fundamental questions unanswered. Individual constituent DHSs demonstrate enhancer activity in transient expression assays4,7, but it is less clear whether they act independently or synergistically in composite elements8-11. Furthermore, nearby enhancers may provide redundancy rather than augmenting gene expression per se12-15. Therefore, there is a need for new approaches to investigate the architecture of complex regulatory elements.
Sox2 encodes a key TF that regulates mouse embryonic stem cell (mESC) self-renewal and pluripotency16. The Sox2 gene is surrounded by a proximal enhancer cluster, but its expression in mESCs also relies on an LCR comprising multiple DHSs located 100 kb downstream17,18. The effect of the LCR on Sox2 expression is influenced by distance and is sensitive to insertion of intervening CTCF sites19-21. Surgical inversion of a single CTCF recognition sequence within the Sox2 LCR affects chromatin architecture but not Sox2 expression in mESCs22, suggesting that proximity and expression might be separate functions23,24. The Sox2 locus therefore provides a natural model for dissecting how the composition and configuration of complex regulatory elements determine transcriptional activity. However, previous studies have focused on transient reporter assays which may not recapitulate function at the endogenous locus, or simple deletion analyses which do not address sufficiency or interaction of individual elements.
We have recently developed the Big-IN platform to enable scarless genome rewriting with payloads exceeding 100 kb25. Big-IN includes two engineering steps: first, CRISPR/Cas9 is used to target and replace an allele of interest with a landing pad (Figure 1A). Second, the landing pad is replaced by a transfected DNA payload using recombinase-mediated cassette exchange (RMCE), and payload-harboring cells are isolated using a positive/negative selection strategy. A comprehensive sequencing verification pipeline accompanies each step to ensure on-target single copy integration and lack of unexpected changes (Figure 1B). Finally, repeated delivery to the same landing pad cell line enables direct comparison in isogenic mESC lines differing only by the payload sequence.
Here we use Big-IN to rewrite the mouse Sox2 locus with designer payloads incorporating deletions, inversions, translocations, and mutations of single and multiple DHSs. We show that the LCR is a complex element whose constituent DHSs are only partially interchangeable and exhibit unexpected dependencies not previously predicted by reporter assays or deletion analyses. We show that the contributions of individual TF sites depend critically on their local context. Future application of this synthetic regulatory genomics approach to assess regulatory architecture promises to cast new light on the function of key loci genome-wide.
RESULTS
Synthetic regulatory genomics of the Sox2 locus
To enable interrogation of regulatory element function at the murine Sox2 locus, we integrated Big-IN landing pads into the BL6 allele of C57BL/6J x CAST/EiJ (BL6xCAST) F1 hybrid mESCs. We targeted independent landing pads to replace both the 143-kb Sox2 locus (LP-Sox2) and the 41-kb region surrounding the Sox2 LCR (LP-LCR) (Figure 1A). We performed comprehensive quality control (QC) using PCR genotyping and sequencing, which identified clones with single-copy, on-target landing pad integration (Figure S1 and Table 1). Using real-time quantitative reverse transcription PCR (qRT-PCR) with allele-specific primers, we confirmed biallelic Sox2 expression in parental cells, ablation of BL6 Sox2 expression in LP-Sox2 mESCs, and near total loss of BL6 Sox2 expression in LP-LCR mESCs, recapitulating previous reports (Figure 1C)17,18. Total loss of the BL6 Sox2 allele was well-tolerated, with no discernible impact on cell morphology or growth rate.
Table 1. Engineered mESC clones quality control summary.
QC step | Landing pad integration: Passed | Landing pad integration: Failed | Payload delivery: Passed | Payload delivery: Failed | Payload delivery: No call |
---|---|---|---|---|---|
All clones (landing pad integration, n=308; payload delivery, n=1565) | |||||
PCR genotyping | 6 (2%) | 302 (98%) | 791 (51%) | 774 (49%) | 0 |
Sequenced clones (landing pad integration, n=6; payload delivery, n=441) | |||||
Capture-seq | 5 (83%) | 1 (17%) | 385 (87%) | 40 (9%) | 16 (3%) |
Landing pad clones used for payload delivery (n=3); payload clones selected for characterization (n=350) | |||||
Sequencing coverage QC | 3 (100%) | 0 | 248 (71%) | 1 (0.3%)(a) | 101 (28%)(c) |
Allelic ratio QC | 3 (100%) | 0 | 220 (63%) | 0 | 130 (37%)(c) |
bamintersect QC | 3 (100%) | 0 | 343 (98%) | 7 (2%)(b) | 0 |
(a) gDNA sample with landing pad backbone coverage attributed to unrelated plasmid contamination.
(b) Only one of two junctions was verified. No off-target junctions were reported.
(c) Small regions, regions with small number of variants, and human orthologous sites were not analyzed (see Table S15 and Methods for details).
We assembled DNA payloads for Big-IN delivery (size = 177 to 142,667 bp) from a BL6 mouse BAC (bacterial artificial chromosome) encompassing the full Sox2 locus or from chemically synthesized DNA (Table S1). We verified the resulting payloads through sequencing (Table S2), delivered them to landing pad mESCs, isolated multiple mESC clones for each payload, and comprehensively verified single-copy on-target payload integration using targeted capture sequencing (Capture-seq) (Table 1, Table S3). Approximately 50% of delivery clones isolated passed PCR genotyping, and the majority of those (87%) passed subsequent Capture-seq screening. All 350 mESC clones characterized in this study were further confirmed by systematic analyses of sequence coverage depth, allelic ratios, and integration site (Table 1). The high proportion of on-target delivery attests to the fidelity of RMCE and the efficiency of the positive and negative selection for Big-IN delivery.
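The pass rates quoted for payload delivery follow directly from the counts in Table 1. As a minimal sketch of that arithmetic (the dictionary layout and the `pass_rate` helper are illustrative names, not part of the study's verification pipeline):

```python
# Counts copied from the payload delivery columns of Table 1.
# The data structure and function name are hypothetical, for illustration only.
payload_qc = {
    "PCR genotyping": {"passed": 791, "failed": 774, "no_call": 0},
    "Capture-seq": {"passed": 385, "failed": 40, "no_call": 16},
}

def pass_rate(counts):
    """Return the percentage of clones that passed a QC step."""
    total = counts["passed"] + counts["failed"] + counts["no_call"]
    return 100.0 * counts["passed"] / total

rates = {step: pass_rate(counts) for step, counts in payload_qc.items()}
for step, rate in rates.items():
    print(f"{step}: {rate:.1f}% passed")
```

Note that the two steps use different denominators: PCR genotyping is computed over all isolated clones, whereas Capture-seq is computed only over the clones that were sequenced.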
To dissect the architecture of the Sox2 locus, we delineated 28 distinct regulatory sites through inspection of genomic annotations including DNA accessibility and TF occupancy data (Figure 2A). We delivered payloads including single and multiple deletions of these regions to LP-Sox2 mESCs and measured Sox2 expression from the engineered allele relative to the unedited CAST allele as an internal control (Figure S2). Delivery of the wild-type (WT) locus restored Sox2 expression nearly to the WT level (93%), whereas payloads lacking the LCR (ΔLCR) showed minimal expression (Figure 2B, Figure S2, Table S4). Deletion of DHSs 1-8 in the proximal upstream region of Sox2 showed a 17% reduction in expression. Similarly, deletion of DHSs 10-16 downstream of Sox2 led to a 22% reduction in expression. A larger deletion of the entire 81-kb intervening region between Sox2 and its LCR (including DHSs 10-18) showed expression comparable to WT, suggesting that loss of the proximal regulatory elements might be compensated by reduced distance to the LCR20. Combined deletion of both the upstream and downstream proximal DHSs led to even lower Sox2 expression than either single deletion. A more restricted deletion of DHSs 1-8 with just DHSs 10 and 15 showed a similar reduction in activity, while a combined deletion of DHSs 1-8 with CTCF17 alone or with CTCF sites 13-14 showed little effect. We note that both DHSs 10 and 15 were found active in STARR-seq and luciferase reporter assays17,26 (Figure 2A). We conclude that both upstream and downstream proximal regions contribute to Sox2 expression, and that the effect of the downstream deletion is largely attributable to DHSs 10 and 15.
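Expression of the engineered BL6 allele is reported relative to the unedited CAST allele as an internal control. One common way to derive such a ratio from allele-specific qRT-PCR is the 2^-ΔCt method. The sketch below assumes that method, assumes equal and perfect amplification efficiency for both allele-specific primer pairs, and uses made-up Ct values; none of these details are stated in this section, so the Methods should be consulted for the calculation actually used.

```python
def allele_ratio(ct_bl6, ct_cast):
    """BL6/CAST expression ratio, assuming 100% efficiency for both primer pairs."""
    return 2.0 ** (ct_cast - ct_bl6)

def relative_expression(ct_bl6, ct_cast, wt_ct_bl6, wt_ct_cast):
    """Engineered-clone allele ratio, scaled so that the parental (WT) ratio equals 1."""
    return allele_ratio(ct_bl6, ct_cast) / allele_ratio(wt_ct_bl6, wt_ct_cast)

# Hypothetical Ct values: in the engineered clone the BL6 allele crosses
# threshold one cycle later than CAST; in parental cells both alleles match.
example = relative_expression(ct_bl6=22.0, ct_cast=21.0,
                              wt_ct_bl6=21.0, wt_ct_cast=21.0)
print(f"Engineered allele expression relative to WT: {example:.2f}")
```

Normalizing to the CAST allele within the same cells cancels clone-to-clone differences in RNA input, which is the practical benefit of using the second allele as an internal control.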
Dissection of the Sox2 LCR
We next focused on a 41-kb region surrounding the Sox2 LCR containing a total of 10 DHSs and CTCF sites (Figure 3A). To quantify the necessity of each DHS for Sox2 expression, we delivered a series of LCR payloads to LP-LCR mESCs and analyzed expression of the BL6 Sox2 allele (Figure 3B). Delivery of the WT LCR completely rescued Sox2 expression. Analysis of single and multiple DHS deletions delineated a core LCR region comprising DHSs 23-26. Within this region, deletion of DHS24 alone reduced Sox2 expression to just 42% of WT, while deletion of DHS23 or DHS26 showed a lesser but significant impact. Within this core LCR, only deletion of CTCF25 was tolerated with little effect. DHS deletions outside of the core LCR showed little or no effect.
To investigate whether individual DHSs within the LCR function in isolation or rely on their surrounding DHSs, we delivered minimal payloads containing subsets of LCR DHSs to LP-LCR mESCs and measured their activities (Figure 4A and Figure S3). We observed that a minimal payload containing the four core LCR DHSs (23-26) recapitulated 88% of the activity of the full WT LCR. Activities of single DHSs varied substantially: DHS24 and DHS26 were able to restore 30% and 14% of Sox2 expression, respectively. Linking 2 copies of DHS24 (2x24) nearly doubled expression relative to a single copy, demonstrating that increased dosage of DHS24 without addition of any new TF activity can increase expression.
However, not all core LCR DHSs were sufficient to activate Sox2 expression. DHS23, despite being required for full Sox2 expression in the context of the entire LCR, was completely inactive on its own. But when DHS23 was linked to either DHS24 or DHS26, it augmented their overall activity nearly twofold. Deletion of DHS19 and DHS20, which were previously reported as inactive in a reporter assay17,26, showed no effect on expression (Figure 3). Delivery of DHS19 or DHS20 alone or together failed to drive Sox2 expression (Figure 4B). But in a similar fashion to DHS23, linking DHS20 with CTCF25 and DHS26 (20 & 25-26) doubled expression relative to CTCF25 and DHS26 alone (25-26). While DHS19 alone added little activity to CTCF25 and DHS26, it further doubled activity when combined with DHS20, CTCF25, and DHS26 (19 & 20 & 25-26). Thus while DHS19, DHS20, and DHS23 can have a potent effect when linked to other enhancers, their activity is entirely context-dependent.
Given the high degree of context dependence we observed, we developed an approach to visualize the extent to which DHSs depend on their context. We identified pairs of payloads differing solely by the presence of key LCR DHSs. We calculated the contextual contribution of each of these focus DHSs as the difference in Sox2 expression between each payload pair averaged across mESC clones (Figure 4C). To illustrate, the contribution of DHS23 was almost fully context-dependent: WT LCR and Δ23 payloads differed in activity by 0.33 (bottom row in the DHS23 group), while it contributed only 0.018 activity when DHS23 alone is compared to ΔLCR (top row in the DHS23 group). While each DHS exhibited a range of contributions across different contexts, only DHS24 and DHS26 demonstrated significant contribution when present alone. We thus partitioned the contribution of each DHS into context-dependent and autonomous components (Figure 4D). Based on Figure 4C, we defined the context-dependent contribution for each DHS as the range of ΔExpression that varies across contexts, and the autonomous contribution as the minimum ΔExpression across all contexts. This clearly highlighted the distinction between the entirely context-dependent DHS19, DHS20, and DHS23 and the more autonomous DHS24 and DHS26. Thus we conclude that the DHSs comprising the LCR are not interchangeable and can exhibit unexpected function when placed in novel contexts.
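The partition described above can be sketched in a few lines. The example below is a minimal illustration, assuming each payload is summarized by its mean expression across clones and identified by the set of DHSs it contains. The expression values and the function names `contributions` and `partition` are hypothetical and are not the study's data or code.

```python
# Hypothetical mean Sox2 expression (fraction of WT) per payload, keyed by the
# set of DHSs present. Values are illustrative, not the paper's measurements.
expression = {
    frozenset(): 0.00,                  # no LCR DHSs
    frozenset({"23"}): 0.02,
    frozenset({"24"}): 0.30,
    frozenset({"26"}): 0.14,
    frozenset({"23", "24"}): 0.55,
    frozenset({"23", "26"}): 0.27,
}

def contributions(expression, focus):
    """Delta-expression for every payload pair that differs only by `focus`."""
    deltas = []
    for dhs_set, value in expression.items():
        if focus in dhs_set:
            background = dhs_set - {focus}
            if background in expression:
                deltas.append(value - expression[background])
    return deltas

def partition(deltas):
    """Autonomous part = minimum across contexts; context-dependent part = range."""
    autonomous = min(deltas)
    context_dependent = max(deltas) - autonomous
    return autonomous, context_dependent

for focus in ("23", "24", "26"):
    autonomous, context_dependent = partition(contributions(expression, focus))
    print(f"DHS{focus}: autonomous={autonomous:.2f}, "
          f"context-dependent={context_dependent:.2f}")
```

With these toy numbers, DHS23 has a near-zero autonomous component because its smallest contribution occurs when it is delivered alone, while DHS24 retains a substantial autonomous component in every context.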
Context dependence might further manifest as a difference in expression for different configurations of the same set of DHSs. Since certain genomic features have an intrinsic polarity, such as transcription or the interaction between cohesin and CTCF, we investigated the effect of DHS orientation on LCR activity by comparing pairs of payloads that differ in the orientation of single or multiple DHSs (Figure 4E). While inversion of the full LCR resulted in a 24% decrease in its activity, inversions of smaller portions of the LCR had a lesser effect. We thus assessed a payload in which DHSs 23 and 24 were individually inverted while leaving their relative position unchanged. In contrast to the larger inversions, inversion of these DHSs showed no effect. Inversion of CTCF25 also showed no effect, consistent with a prior report showing that surgical inversion of a CTCF recognition sequence within CTCF25 had no effect on Sox2 expression22. Unexpectedly, inversion of DHS26 did show a slight increase in Sox2 expression. To explore whether the effect of inversions spanning multiple DHSs might be mediated by positional differences rather than the orientation of specific TF or CTCF recognition sites, we profiled multiple permutations of the core DHS order (Figure 4F). Each permutation of DHSs 23, 24, and 25 order reduced overall activity, whereas relocating CTCF25 downstream of DHS26 resulted in increased activity. These results suggest that the relative position rather than the orientation of DHSs within the LCR plays a role in their function.
Finally, to investigate whether individual DHSs may be functionally specialized, we examined published ChIP-seq and ChIP-nexus data4,27-30 at the LCR (Figure S3). We noted highly similar occupancy patterns across all DHSs: most DHSs were occupied by Nanog, Oct4, Klf4, Esrrb, and Zic3, with Sox2 and Pbx binding to a more restricted set of sites. Only DHS24 and DHS26 exhibited robust activity in reporter assays. DHS24 uniquely showed a high level of transcription initiation using PRO-seq31, oriented towards the Sox2 gene. These data show that individual DHSs each manifest characteristic regulatory signatures, but no single feature explains the distinction between the context-dependent and autonomous DHSs.
Modeling the regulatory architecture of the Sox2 locus
To coherently summarize the architecture of the Sox2 locus, we investigated linear regression models that predict Sox2 expression from surrounding DHS composition and configuration. We established a baseline model considering only key proximal regions along with the four DHSs in the core LCR. Performance improved substantially upon incorporation of interactions among core LCR DHSs, in particular the contribution of DHS23 in conjunction with DHS24 or DHS26, and CTCF25 in the presence of DHS26 (Figure 5A). Performance was higher when restricting to these three top interaction indicators than when considering all pairwise interactions. Given the potential influence of DHS configuration on function, we extended our top interaction regression model to include the orientation of each core LCR DHS and relative order encoded by indicators for DHS24 preceding DHS23 or CTCF25 preceding DHS24. Inclusion of either orientation or order separately improved model performance, but a model including order alone provided a better fit than orientation alone or a combination of both (Figure 5A). The coefficients of the best fit model showed strong weights for the presence of proximal regulatory regions and core LCR DHSs, as well as key interactions and DHS order (Figure 5B). While most payloads were predicted with little error, we noted that the contribution of DHS26 as well as inversions and permuted constructs demonstrated consistently higher error than other constructs, suggesting additional context effects not fully captured by our simple model (Figure S4). This model demonstrates the power of a synthetic regulatory genomics approach to dissect locus architecture in the presence of substantial context dependence.
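To make the role of interaction indicators concrete, the sketch below fits two ordinary least squares models to a toy data set: one with only DHS presence indicators, and one that adds indicators for DHS23 together with DHS24 and for DHS23 together with DHS26. This is a simplified illustration under stated assumptions. The expression values are hypothetical, the feature set omits the proximal regions, CTCF25, orientation, and order terms described above, and the fitting procedure and performance metric used in the study are not specified in this section.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit(X, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

def rss(X, y, beta):
    """Residual sum of squares of the fitted model."""
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def additive(payload):
    """Intercept plus presence indicators for DHS23, DHS24, and DHS26."""
    return [1.0] + [1.0 if d in payload else 0.0 for d in ("23", "24", "26")]

def with_interactions(payload):
    """Additive features plus 23x24 and 23x26 interaction indicators."""
    base = additive(payload)
    return base + [base[1] * base[2], base[1] * base[3]]

# Hypothetical payloads and mean expression values (fraction of WT).
payloads = [set(), {"23"}, {"24"}, {"26"},
            {"23", "24"}, {"23", "26"}, {"23", "24", "26"}]
y = [0.00, 0.02, 0.30, 0.14, 0.55, 0.27, 0.80]

X_add = [additive(p) for p in payloads]
X_int = [with_interactions(p) for p in payloads]
beta_add, beta_int = fit(X_add, y), fit(X_int, y)
rss_add, rss_int = rss(X_add, y, beta_add), rss(X_int, y, beta_int)
print(f"RSS additive: {rss_add:.4f}; RSS with interactions: {rss_int:.4f}")
```

In this toy setting the additive model cannot account for a DHS that contributes little alone but substantially in combination, so the interaction model fits with a smaller residual. With real data, a comparison of this kind would also need to account for the extra parameters, for example through cross-validation.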
TF-scale dissection of the core Sox2 LCR
Given the complex interdependencies among core LCR DHSs when considered as units, we used synthetic DNA to investigate function at the level of individual TF recognition sequences. We designed base payloads covering the core LCR DHSs that avoided or shortened a handful of repetitive regions to facilitate chemical synthesis (Figure 6A). We then designed a series of derivative payloads including surgical deletions, mutations or inversions to assess the individual and collective functions of putative TF or CTCF binding sites identified through analysis of TF motif matches and in vivo DNA accessibility and TF occupancy data (Figure 6B).
We started with DHS24, which is the strongest single DHS in the LCR and is itself capable of reproducing 30% of WT LCR activity (Figure 4A). We identified four distinct TF recognition sequence sites, numbered 24.1-24.4 (Figure 6B and Figure S5). These sites were predicted to be occupied by key mESC regulators, including the ESRR, nuclear receptor, SOX, and forkhead TF families. We analyzed the activity of synthetic payloads incorporating combinatorial deletions of these sites. Deletion or mutation of all four sites was sufficient to fully ablate activation of Sox2 by syn24 (Figure 6C). All single deletions showed at least some effect, with a reduction of base activity ranging from 83% (24.2) to 28% (24.4) (Figure 6C). Double or triple deletions showed a further negative effect, essentially reaching null when 24.2 and 24.3 were both deleted. A minimal payload containing the most essential site based on the deletion analysis (24.2) with its flanking regions was unable to confer any Sox2 expression. To summarize the contribution of each TF site relative to its context, we compared pairs of payloads differing by the presence of a focus TF site (Figure 6D). This analysis identified a clear context dependence for all four TF sites, and the contribution of each TF site was highest in the fullest context.
As DHS23 alone is incapable of activating Sox2 expression (Figure 4A), it was analyzed linked to DHS24 (syn23-24, Figure 6B). The activity of this payload was slightly higher than the comparable non-synthetic payload including DHSs 23-24 (Figure 6E), possibly due to the decreased distance between the core DHS peaks of DHS23 and DHS24 in the synthetic configuration. Deletion of 8 recognition sequence sites within DHS23 reduced activity by 47%, approaching that of syn24 alone and suggesting that they are the key sequences responsible for DHS23 function.
This synthetic approach further permitted assessment of two conserved DHSs from the human SOX2 locus, orthologous to mouse DHS23 and DHS24 (Figure S6A). While mouse DHS23 and DHS24 combined are sufficient to confer over half of WT Sox2 expression, the two orthologous human DHSs surprisingly showed no activity whatsoever (Figure S6B). While the core of the human DHSs showed sequence conservation with mouse sequence, analysis of TF recognition sequences showed a marked divergence relative to the orthologous mouse sites. This suggests that, despite the sequence orthology, the human LCR has diverged mechanistically.
Finally, we investigated CTCF25 and DHS26, which delineate a region covered by continuous DNA accessibility that comprises two broad DHS peaks positioned closely together (Figure 6A-B). As redundant CTCF binding is a feature of other insulator elements19,32-34, we performed a search for motif matches in addition to the previously reported CTCF site 25.222. This identified 9 CTCF recognition sequences in CTCF25 and DHS26, 3 of which are in convergent orientation relative to Sox2 (Figure 6B and Figure S5). While many of these sites lie outside the primary ChIP-seq peak, they reside within a DHS and demonstrate a trace level of CTCF occupancy in WT cells (Figure 6B), so we reasoned that they might be redundant with CTCF site 25.2. To test this hypothesis, we investigated whether inversion or deletion of multiple sites might result in a stronger phenotype than the reported inversion of 25.2 alone. Inversion of the 41-bp CTCF footprint at the 3 sites in convergent orientation to Sox2 (23-27 (Divergent CTCF), Figure 6B) led to a 20% reduction in expression (Figure 6F). Deletion of 8 CTCF recognition sequences (23-27 (ΔCTCF), Figure 6B) led to an even greater reduction of 36% (Figure 6F), bringing activity down nearly to the baseline of DHS23 and DHS24 alone, and showing that these sites are collectively required for full function of the CTCF25-DHS26 segment.
As DHS26 itself can function as an enhancer (Figure 4A), we next investigated the function of TF sites not overlapping with CTCF recognition sequences. A synthetic DHS26 payload (syn26) demonstrated slightly higher activity than the non-synthetic DHS26 (Figure 6F), likely because it extended into CTCF25 to include a predicted HNF4A/RARB site (25.4) and a CTCF site (25.3) (Figure 6B and Figure S5). Deletion of 5 TF recognition sequence sites in this context (syn26 (Δ5TFs)) completely abrogated activity (Figure 6F). Finally, Δ26.2-27 & 28, while lacking all 5 of these TF sites and CTCF sites 26.2-26.6, but containing 26.1 (a CTCF site also occupied by Pbx and Zic3), showed nearly full expression (Figure 6B and Figure S6C). Our results are consistent with recent analyses of the Hoxd35 and Pax334 loci, and suggest that, like DHS24, CTCF25-DHS26 is a complex regulatory element harboring multiple essential TF sites whose binding is context dependent and synergistic.
DISCUSSION
Dissection of the Sox2 LCR shows it to be a highly complex and species-specific element whose function depends on the specific configuration of its constituent DHSs. Sufficiency analysis shows that the contribution of an individual DHS depends on its surrounding context, and DHS function ranges from substantial autonomy to complete context dependence. Our identification of three fully context-dependent enhancers at the same locus, DHS19, DHS20, and DHS23, enables comparison among them: unlike DHS23, which was essential for full activity of the LCR, DHS19 and DHS20 were dispensable at their native locations, where it is possible that distance moderates their influence on DHSs in the core LCR. DHS19 differed from DHS20 and DHS23 in that it required DHS20 to robustly augment DHS26, suggesting an additional level of context dependence. The combination of the two context-dependent enhancers DHS19 and DHS20 produced activity only just above zero, suggesting that robust function requires the presence of at least one autonomous enhancer. DHS23 augmented the activity of both DHS24 and DHS26, suggesting that its function is not strictly tied to a single partner and is at least somewhat flexible. The behavior of these DHSs resembles those recently identified at the Fgf536 and Hba37 loci, suggesting that context-dependent DHSs are widespread throughout the genome. Further investigation will be needed to establish whether context-dependent DHSs are distinguished globally by certain sequence or chromatin features and/or absence of activity in reporter assays38. The challenge to recognizing these context dependencies in reporter assays or more limited engineering of the endogenous locus underscores the importance of a synthetic regulatory genomics approach for comprehensive analysis of regulatory architecture in context.
The Sox2 LCR demonstrates a high degree of cooperativity which might arise through multiple mechanisms, including tightly coupled TF binding sites39, indirect interaction mediated through the chromatin template40, or local concentration of TFs. Our results suggest that core LCR function relies on relative DHS position but not their orientation. It is unclear whether this position effect derives from local interaction among the DHSs themselves, or from their location relative to the target TSS or other genomic location. We note that DHSs 23-26 overlap with directional transcription initiation measured by PRO-seq (Figure 3A). But inversions of DHSs 23 and 24 did not reduce Sox2 expression, arguing against an intrinsic directionality attributable to individual DHSs or altered TF binding across DHS junctions. We also note that deletions of TF recognition sites generally showed larger effects in more complete contexts, which contrasts with prior results showing that the effect of point variants is buffered by stronger DHSs41. We speculate that this discrepancy is related to the size of the perturbation, as TF binding can more readily tolerate point changes without falling below the threshold needed for activity42. Finally, as SOX2 binds indirectly at many of the LCR DHSs (Figure S3), it is also possible that this context effect is partially mediated through altered SOX2 levels in trans. We expect that our mapping of functional sites within core LCR DHSs (Figure 6) will provide a roadmap for future investigation of the sequence determinants of these context effects.
Our analysis uncovered unexpected effects when deleting multiple DHSs proximal to Sox2. In contrast to our results showing that deletion of DHSs 10 and 15 reduces Sox2 expression, single deletions of DHS15 and the combined deletion of DHSs 10-11 all have been reported to show no effect on expression17. We conclude that some of the Sox2-proximal DHSs are redundant and show no effect unless deleted together. Although these proximal regions contribute to expression at the endogenous locus, the LCR alone is capable of activating the Sox2 promoter at an ectopic site20. We observed that deletion of the 8 CTCF recognition sequences in CTCF25 and DHS26 did not further reduce activity when done in conjunction with a deletion of DHSs 1-8, suggesting that their function might be somehow connected (Figure 2B, Figure S6D). We also note a recent report that a similar deletion of DHSs 1-8 shows no effect on expression in mESC or in blastocysts21. Thus more work is needed to further dissect the role of proximal regulatory regions, their interplay with the LCR, and their function in different cell states.
We have described a general approach to dissect locus architecture in place from 100-kb scale down to individual bp-level TF binding sites, and have shown that analyses of sufficiency and combinatorial rearrangements at the endogenous locus reveal unexpected behavior relative to reporter assays or single deletion analyses. We have already employed Big-IN in multiple cell types and loci25,43, suggesting that our approach is readily generalizable. While large-scale multiplexed CRISPR screens typically do not verify the effect of each edit, our genotyping and genomics validation has shown that a recombinase-based approach avoiding double-strand breaks yields high-efficiency delivery with little or no off-target integration. We expect that engineering of large or complex payloads in yeast will become more widely available. In parallel, Golden Gate-based cloning of shorter payloads from commercial synthetic DNA provides a rapid, scalable, and accessible strategy for high-throughput analyses. These developments suggest that synthetic regulatory genomics approaches will rapidly be deployed to investigate function across additional contexts and loci.
Limitations of the study
Our approach investigates function at the endogenous locus, preserving its long-distance architecture and surrounding genomic context. Although WT constructs replacing the full locus or just the LCR were sufficient to rescue nearly full expression in mESCs, it is possible their chromatin state might still differ from the endogenous locus. The payloads themselves are epigenetically naïve in that they do not pass through development and their chromatin state is established after delivery based on their primary sequence or the chromosome surrounding their delivery site. Characteristic chromatin and expression patterns are readily reestablished on DNA that passes through the germline44,45, but it is possible that DNA transfected to cycling cells exhibits only partial functionality46,47. For example, the failure of DHSs 19, 20, and 23 to activate Sox2 expression alone might arise from an inability to independently establish a functional accessible state. We expect that future studies will examine the establishment of chromatin state on rewritten genomic loci in more detail.
We have dissected the architecture of the Sox2 locus in undifferentiated mESCs, but its regulation might differ in other contexts. For example, while DHS19 and DHS20 lie outside the core LCR and are dispensable in mESCs, their potent effect when linked more closely to the core LCR suggests a possible latent function relevant to other cell states. Similarly, while Sox2 expression for a given payload was highly consistent across replicate clones, a single clone lacking DHSs 21-22 showed a nearly complete absence of activity in contrast to the other 8 clones which showed near-WT activity (Figure 3B). This outlier clone passed extensive genotyping and Capture-seq verification to demonstrate that the locus is intact, suggesting that seemingly redundant sites may influence robustness in certain circumstances.
Finally, while we performed deletion analysis of TF recognition sites, we have not identified specific TFs binding to these elements given the inherent ambiguity of sequences recognized by multiple transcription factors.
STAR METHODS
RESOURCE AVAILABILITY
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, Matthew T. Maurano (maurano@nyu.edu).
Materials availability
All unique/stable reagents generated in this study are available upon request from the Lead Contact with a completed Materials Transfer Agreement.
Data and code availability
Sequencing data have been deposited in the Gene Expression Omnibus (GEO) database, https://www.ncbi.nlm.nih.gov/geo (accession no. GSE206863). DNase-seq data were obtained from https://www.encodeproject.org for ES_CJ7 (ENCLB163SYJ, DS13320) and H7 human ESC (ENCLB449ZZZ, DS11909)7. CTCF ChIP-seq29 (GSM2259905), ChIP-seq data4,27,28 (GSM560343, GSM560345, GSM560350, GSM687280, GSM687282, GSM687285, GSM845236, GSM845238, GSM1082340, GSM1082341, GSM1082342), and ChIP-nexus30 for Zic3 (GSM4087824), Pbx (GSM4087823), Esrrb (GSM4087822), Sox2 (GSM4072777), Oct4 (GSM4072776), Nanog (GSM4072778), and Klf4 (GSM4072779), PRO-seq31 (GSE130691), and STARR-seq26 (GSM4261634) were obtained from the GEO repository. Gel electrophoresis images are available from Mendeley (doi: 10.17632/z7x3k943vz.1).
The processing pipelines for Capture-seq, ChIP-seq, and DNase-seq data are available on GitHub at https://github.com/mauranolab/dnase and have been deposited at Zenodo. DOIs are listed in the key resources table. All code for analyses herein is available upon request.
Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
REAGENT or RESOURCE | SOURCE | IDENTIFIER |
---|---|---|
Bacterial and virus strains | ||
Parental S. cerevisiae strain | Brachmann et al., 199848 | BY4741 |
Payload S. cerevisiae strain | This paper | Table S1 |
E. coli TransforMax EPI300 | Lucigen | EC300150 |
Chemicals, peptides, and recombinant proteins | ||
0.1% gelatin | EMD Millipore | ES006-B |
2-mercaptoethanol | Sigma | M3148 |
BsaI-HF V2 | NEB | R3733S |
Buffer P1 | QIAGEN | 19051 |
Buffer P2 | QIAGEN | 19052 |
Buffer P3 | QIAGEN | 19053 |
CHIR99021 | R&D Systems | 4423 |
CopyControl Induction Solution | Lucigen | CCIS125 |
dATP | NEB | N0440S |
dNTPs mix | NEB | N0447L |
Esp3I | NEB | R0734S |
FBS | BenchMark | 100-106 |
ganciclovir | Sigma | PHR1593 |
GlutaMAX | ThermoFisher | 35050061 |
GoTaq Green | Promega | M7123 |
KAPA 2× Hi-Fi Hotstart Readymix | Roche | 07958935001 |
Klenow DNA polymerase | NEB | M0210L |
Klenow Fragment (3'-5' exo-) | NEB | M0212L |
KnockOut DMEM | ThermoFisher | 10829018 |
LIF | Sigma | ESG1107 |
MEM nonessential amino acids | ThermoFisher | 11140050 |
N2 Supplement | ThermoFisher | 17502048 |
nucleosides | EMD Millipore | ES008-D |
PD0325901 | Sigma | PZ0162 |
Pen-Strep | ThermoFisher | 15140122 |
proaerolysin | Aerohead Scientific | n/a |
puromycin | ThermoFisher | A1113803 |
Sera-Mag Magnetic Beads | Cytiva | 65152105050250 |
T4 DNA polymerase | NEB | M0203L |
T4 polynucleotide kinase | NEB | M0201L |
T4 Quick Ligase | NEB | M2200S |
Taq Polymerase | NEB | M0273L |
Critical commercial assays | ||
DNeasy Blood & Tissue kit | QIAGEN | 69506 |
dsDNA High Sensitivity Assay Kit | Invitrogen | Q32851 |
KAPA SYBR FAST | Kapa Biosystems | KK4610 |
Multiscribe High-Capacity cDNA Reverse Transcription Kit | ThermoFisher | 4368814 |
Nucleobond XtraBAC kit | Takara | 740436 |
Turbo DNA-free kit | ThermoFisher | AM1907 |
Zymo yeast miniprep I protocol | Zymo Research | D2001 |
ZymoPURE maxiprep kit | Zymo Research | D4203 |
Zyppy Plasmid Miniprep Kit | Zymo Research | D4020 |
Deposited data | ||
Sequencing data | This paper | GSE206863 |
Figure S1 gel images | This paper | doi:10.17632/z7x3k943vz.1 |
Experimental models: Cell lines | ||
Parental C57BL6/6J × CAST/EiJ (BL6xCAST) mESCs | Eckersley-Maslin et al., 201449 | Clone 4 |
C57BL6/6J × CAST/EiJ (BL6xCAST) Landing Pad and Payload mESC clones | This paper | Table S3 |
C57BL/6J (MK6) | NYU Langone Health Rodent Genetic Engineering Core | n/a |
Oligonucleotides | ||
Cloning primers | This paper | Tables S5, S7 |
Genotyping primers | This paper | Tables S10, S11 |
qRT-PCR primers | This paper | Table S18 |
Synthetic fragments | This paper | Table S9 |
Recombinant DNA | ||
Payloads | This paper | Table S1, Data S1 |
pCAG-iCre | Addgene | 89573 |
pCTC019 | This paper | n/a |
pLM1110 | Addgene | 168460 |
pLP-PIGA (pLP140) | Addgene | 168461 |
pLP-PIGA2 (pLP300) | Addgene | 168462 |
pLP-PIGA3 (pLP305) | Addgene | 196992 |
pMH005 | Ribeiro-Dos-Santos et al. 202250 | n/a |
pNA0304 | Addgene | 165612 |
pNA0308 | This paper | n/a |
pNA0519 | Zhao et al., 202253 | n/a |
pRB051 | Addgene | 196993 |
pSpCas9(BB)-2A-GFP | Addgene | 48138 |
pSpCas9(BB)-2A-Puro | Addgene | 62988 |
Software and algorithms | ||
Sequencing processing pipeline | This paper | doi: 10.5281/zenodo.7662273 |
BWA | Li and Durbin 200958 | https://github.com/lh3/bwa |
bcftools | Bonfield et al., 202161 | https://github.com/samtools/bcftools |
BEDOPS | Neph et al., 201260 | https://bedops.readthedocs.io/en/latest/ |
R | R Core Team 201864 | https://www.r-project.org |
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Yeast
S. cerevisiae strain BY474148 was grown using Yeast Extract–Peptone–Dextrose (YPD) as rich medium or defined Synthetic Complete (SC) medium with appropriate amino acids dropped out as selective media.
Mouse cells
C57BL6/6J × CAST/EiJ (BL6xCAST) clone 4 male mESCs49 were kindly provided by David Spector, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY. Cells were cultured as described25. Specifically, mESCs were cultured on plates coated with 0.1% gelatin (EMD Millipore ES006-B) in 80/20 medium comprising 80% 2i medium and 20% mESC medium. 2i medium contained a 1:1 mixture of Advanced DMEM/F12 (ThermoFisher 12634010) and Neurobasal-A (ThermoFisher 10888022) supplemented with 1% N2 Supplement (ThermoFisher 17502048), 2% B27 Supplement (ThermoFisher 17504044), 1% GlutaMAX (ThermoFisher 35050061), 1% Pen-Strep (ThermoFisher 15140122), 0.1 mM 2-mercaptoethanol (Sigma M3148), 1,250 U/mL LIF (Sigma ESG1107), 3 μM CHIR99021 (R&D Systems 4423), and 1 μM PD0325901 (Sigma PZ0162). mESC medium contained KnockOut DMEM (ThermoFisher 10829018) supplemented with 15% FBS (BenchMark 100-106), 0.1 mM 2-mercaptoethanol (Sigma M3148), 1% GlutaMAX (ThermoFisher 35050061), 1% MEM nonessential amino acids (ThermoFisher 11140050), 1% nucleosides (EMD Millipore ES008-D), 1% Pen-Strep (ThermoFisher 15140122), and 1,250 U/mL LIF (Sigma ESG1107). Cells were grown at 37 °C in a humidified atmosphere of 5% CO2 and passaged on average twice per week. Cells were routinely tested for mycoplasma contamination. MK6 C57BL/6J male mESCs were provided by the NYU Rodent Genetic Engineering Laboratory.
METHOD DETAILS
Cloning landing pad and CRISPR/Cas9 plasmids for genome integration
pLP-PIGA (pLP140) (Addgene #168461) was described previously25 and harbors a pEF1α-mScarlet-P2A-CreERT2-P2A-PuroR-P2A-hmPIGA-EIF1pA cassette flanked by loxM and loxP sites.
pLP-PIGA2 (pLP300) (Addgene #168462) (Figure 1A) was described previously25 and harbors a pEF1α-PuroR-P2A-hmPIGA-P2A-mScarlet-EIF1pA cassette flanked by loxM and loxP sites, as well as a pPGK1-ΔTK-SV40pA backbone counter-selectable marker cassette.
To derive LP-PIGA3 (pLP305) (Figure 1B), an intermediate pLP303a was cloned by removing mScarlet-P2A-CreERT2-P2A from pLP-PIGA using NcoI and SalI digest followed by a fill-in reaction with Klenow DNA Polymerase and self-ligation to yield a minimal LP plasmid consisting of pEF1α-PuroR-P2A-hmPIGA-EIF1pA cassette flanked by loxM and loxP sites (Table S5). Sleeping Beauty inverted terminal repeats (ITRs) were amplified from pMH00550 using primers oRB_277 + oRB_278 for ITR(L) and oRB_279 + oRB_287 for ITR(R). LP-PIGA3 was cloned using a BsaI Golden Gate reaction in which ITRs were cloned outside the landing pad region of pLP303a.
To target landing pads to specific genomic loci, guide RNAs (gRNAs) (Table S8) were cloned into pSpCas9(BB)-2A-Puro (pCas9-Puro, Addgene #62988) or pSpCas9(BB)-2A-GFP (pCas9-GFP, Addgene #48138) plasmids using BbsI Golden Gate reactions as described51. Landing pad homology arms (HAs) corresponding to the genomic sequence flanking the Cas9 cut sites were amplified from a BAC (Table S6). HAs were cloned flanking the LoxM and LoxP sites in pLP-PIGA2 using a BsaI Golden Gate reaction or flanking the ITRs in pLP-PIGA3 using a BsmBI Golden Gate reaction (Table S7). DNA suitable for transfection was prepped using the ZymoPURE maxiprep kit (Zymo Research D4203) according to the manufacturer’s protocol.
Landing pad sequences (excluding HAs and backbone) are provided in Data S1.
Payload assembly
All payloads were assembled into pLM111025 (Addgene #168460) or pRB051 (a derivative of pLM1110). Both are multifunctional vectors supporting low-copy DNA propagation, selection (LEU2), and efficient homology-dependent recombination in yeast; low-copy propagation, selection (KanR), and copy number induction in TransforMax EPI300 E. coli52; and transient visualization and selection (eGFP-BSD, enhanced green fluorescent protein - Blasticidin-S deaminase) in mammalian cells25.
Four different strategies were used to assemble payloads into these vectors (Table S1, Data S1), using different primers (Table S5), gRNAs (Table S8), and synthetic DNA fragments (Table S9):
1). BAC Fragment Assembly in Yeast
Desired segments of Sox2 BAC RP23-144O8 were released by an in vitro CRISPR/Cas9 digestion using a pair of synthetic gRNAs and recombinant Cas9 and assembled into BsaI-digested pLM1110 as previously described25.
2). Fragment Assembly in Yeast
PCR amplicons or synthetic DNA (IDT) tiling the desired payload with >75 bp overlap were assembled into pLM1110. Yeast cells were transformed with 20-50 ng I-SceI-digested pLM1110, 100 ng of each DNA fragment, and 50 ng terminal linker fragments (~400 bp gBlocks, IDT) to enable homologous recombination-dependent assembly followed by selection for Leu+ phenotype.
3). CRISPR Engineering of EPisomes in Yeast (CREEPY)
Existing payloads were modified in yeast, as in ref. 53, using CRISPR/Cas9 and synthetic linker fragment(s) (IDT) to mediate up to two deletions, insertions, or inversions simultaneously:
A yeast strain carrying a parental payload was pre-transformed with 100 ng yeast SpCas9 expression vector pNA051953 (carrying a ScHIS3 marker) or pCTC019 (carrying a SpHIS5 marker), and selected for a Leu+/His+ phenotype. Yeast were then transformed again with 100 ng single/dual gRNA expression plasmid pNA0304/pNA030854 carrying a ScURA3 marker and with 100 ng synthetic linker fragment(s), and selected for a Leu+/His+/Ura+ phenotype. gRNAs (Table S8) were cloned into the single/dual gRNA yeast expression plasmids pNA0304/pNA030855, respectively, using NotI and/or HindIII Gibson Assembly reactions as described54.
Alternatively, yeast cells were transformed with 100 ng pYTK-Cas9 plasmids (carrying ScHIS3 marker) that co-express SpCas9 and a gRNA and with 100 ng synthetic linker fragment(s), and selected for Leu+/His+ phenotype. gRNAs (Table S8) were cloned into pYTK-Cas953 using a BsmBI Golden Gate reaction.
4). Golden Gate Assembly (GGA)
DNA fragments were assembled into pLM1110 using BsaI or into pRB051 using Esp3I. pRB051 was cloned by PCR-amplifying the RFP transcriptional unit from pLM1110 using primers oRB_564 + oRB_565, digesting the product with Esp3I, and performing a BsaI Golden Gate Assembly into pLM1110. For payload assembly, 100 ng of pLM1110 or pRB051 was mixed with 20 ng of each DNA fragment containing terminal BsaI or Esp3I sites, respectively, designed to mediate assembly with neighboring fragments (DNA fragments were sourced either from synthetic fragments (IDT) or from existing payloads by PCR-amplification), and with 0.4 μL BsaI-HF V2 (NEB R3733S) or Esp3I (NEB R0734S), 1.5 μL 1 mg/mL BSA, 1 μL T4 Quick Ligase (NEB M2200S) and 1.5 μL T4 Ligase Buffer (NEB) in a total volume of 15 μL. Reactions were cycled 25 times between 37 °C (3 min) and 16 °C (4 min), heat inactivated at 50 °C (5 min) and 80 °C (5 min) before transforming into TransforMax EPI300 cells.
Yeast transformation, payload validation, and transfer to E. coli
Yeast transformations were performed using the lithium acetate method56. For screening of correct clones, DNA was isolated from individual yeast colonies by resuspension in 10-40 μL of 20 mM NaOH and boiling for 3 cycles of 95 °C for 3 min and 4 °C for 1 min. 2 μL of yeast lysate was used as a template in a 10 μL GoTaq Green reaction (Promega M7123) with 0.25 μM of primers. Clones were screened for the intended payload structure using primers that target newly formed junctions (positive screen), and in some cases using primers that target regions that are present in the parental plasmid but absent in the intended plasmid (negative screen). Payloads were isolated from candidate yeast clones using the Zymo yeast miniprep I protocol (Zymo Research D2001) and transformed into TransforMax EPI300 E. coli cells by electroporation using the manufacturer’s protocol (Lucigen EC300150).
Payload DNA screening and preparation
For initial verification, TransforMax EPI300 E. coli colonies were picked into 3-5 mL LB-Kan supplemented with CopyControl Induction Solution (Lucigen CCIS125) and cultured overnight at 30 °C with shaking at 220 RPM. Payload DNAs smaller than 20 kb were isolated using the Zyppy Plasmid Miniprep Kit (Zymo Research D4020).
For larger payload DNAs, crude DNA extraction was performed as follows: 2-3 mL of induced culture was spun down, resuspended with 300 μL RNase A-supplemented Buffer P1 (QIAGEN 19051), and topped with 300 μL of Buffer P2 (QIAGEN 19052). The suspensions were inverted 10 times, incubated at room temperature for 2 min, topped with 300 μL Buffer P3 (QIAGEN 19053), inverted 10 times, and spun down at 12,000 RPM for 5 min. Supernatants were transferred to new 2 mL tubes, topped with 900 μL isopropanol, mixed by inverting, and spun down at 12,000 RPM for 5 min. Supernatants were discarded, and pellets were washed with 500 μL of 70% EtOH and spun down at 12,000 RPM for 1 min. Supernatants were discarded and the DNA pellets were air-dried and resuspended in 30 μL TE buffer. Tubes were spun down at 12,000 RPM for 1 min and clear supernatants were transferred to new 1.5 mL tubes.
Payload assembly verification was performed by diagnostic digestion with restriction enzymes, PCR verification of new junctions, and Sanger sequencing, depending on the payload. All payloads were subsequently sequenced to high coverage depth.
Sequence-verified payload clones were grown overnight in 2.5 mL cultures of LB-Kan at 30 °C with shaking, diluted 1:100 in LB-Kan supplemented with CopyControl Induction Solution, and incubated for an additional 8-16 hours at 30 °C with shaking. For transfection, DNA was purified from induced E. coli using the ZymoPURE maxiprep kit (Zymo Research D4203) for DNAs <20 kb or using the Nucleobond XtraBAC kit (Takara 740436) for DNAs >20 kb. DNA preps were stored at 4 °C.
mESC genome engineering
Landing pad integrations were performed (Table S3) using the Neon Transfection System as previously described into BL6xCAST ΔPiga mESCs, in which the endogenous Piga gene was deleted25. LP-PIGA A1 mESCs, in which LP-PIGA replaces the BL6 Sox2 allele, were previously described25.
LP-PIGA2 integration at Sox2 (replacing a 143-kb genomic region) was performed using 5 μg of pLP-PIGA2 and 2.5 μg of each pCas9-GFP plasmid expressing BL6-specific gRNAs mSox2-5p-1 and mSox2-3p-5, followed by 1 μg/mL puromycin (ThermoFisher A1113803) selection for LP-harboring mESCs and 1 μM ganciclovir (GCV, Sigma PHR1593) selection against the HSV1-ΔTK gene in the LP-PIGA2 vector backbone.
LP-PIGA3 integration at the Sox2 LCR (replacing a 41-kb genomic region) was performed similarly with pCas9-Puro plasmids expressing the non-allele specific gRNA mSox2-DHS18-19 and the BL6-specific gRNA mSox2-3p-5, followed by 1 μg/mL puromycin selection. PCR genotyping and Capture-seq validation were used to identify clone LP-LCR H1 (Figure S1B and Figure S1D), to which a few LCR payloads were delivered (Table S3). DELLY analysis later identified a 7.7 kb deletion (Figure S1D) in the CAST allele surrounding the mSox2-DHS18-19 gRNA site in a small number of payload clones derived from LP-LCR H1 cells (not included in this study), indicating a mixed population of cells in the original LP-LCR culture. We therefore sub-cloned LP-LCR H1 to isolate clone LP-LCR C1 and confirmed it did not harbor the CAST allele deletion (Figure S1C-D). LP-LCR C1 was used for all subsequent LCR deliveries.
Payload deliveries were performed as previously described25 with 1-5 × 10^6 mESCs, 1-10 μg payload DNA and 2-5 μg pCAG-iCre plasmid (Addgene #89573) per transfection, depending on the payload size (larger payloads required more cells, payload DNA and pCAG-iCre plasmid to obtain sufficient correct mESC clones). Transfected mESCs were selected with 10 μg/mL blasticidin for 2 days starting day 1 post-transfection and with 2 nM proaerolysin for 2 days starting day 6 or 7 post-transfection.
For both landing pad integration and payload delivery, approximately 9 days post-transfection, individual mESC clones were manually picked into gelatinized 96-well plates prefilled with 100 μL 80/20 media. Two days post-picking, clones were replicated into two gelatinized 96-well plates at 90% and 10% relative densities. Three days later, crude gDNA was extracted from the 90% plate as described25 and used in PCR genotyping to identify candidate clones, which were then expanded from the 10% density plate for further verification and phenotypic characterization. Genomic DNA was extracted from expanded clones using the DNeasy Blood & Tissue kit (QIAGEN 69506).
mESC genotyping
Genotyping of mESC clones was performed either using PCR followed by gel electrophoresis as described25 or using real-time quantitative PCR (qPCR), which was performed with KAPA SYBR FAST (Kapa Biosystems KK4610) on a LightCycler 480 Real-Time PCR System (Roche) using either 96-well or 384-well qPCR plates. In most cases loading was performed using an Echo 550 liquid handler (Labcyte): a 384-well qPCR plate was prefilled with 5 μL KAPA SYBR FAST and 4 μL water per well, and 100 nL of each 100 μM primer and 0.5 μL of each crude gDNA sample were transferred by the Echo. Genotyping of payload clones typically included, in addition to assays designed to detect the newly formed left and right junctions, assays to detect the loss of the landing pad, the absence of the payload vector backbone, and, for large (>100 kb) payloads, allele-specific assays to detect delivered regions of the Sox2 locus. Genotyping primers are listed in Table S10 and Table S11.
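The loading volumes above fix the final reaction volume and primer concentration. The following is a minimal arithmetic sketch, assuming two primers (forward and reverse) per well; the function names are illustrative and are not part of any instrument software.

```python
def reaction_volume_uL(master_mix_uL=5.0, water_uL=4.0,
                       primer_nL=100.0, n_primers=2, gdna_uL=0.5):
    """Total well volume in microliters for the Echo loading scheme.

    n_primers=2 is an assumption (one forward and one reverse primer).
    """
    return master_mix_uL + water_uL + n_primers * primer_nL / 1000.0 + gdna_uL


def final_concentration_uM(stock_uM, transfer_uL, total_uL):
    """Concentration after diluting a transferred volume into the full well."""
    return stock_uM * transfer_uL / total_uL


total = reaction_volume_uL()                          # 5 + 4 + 0.2 + 0.5
primer = final_concentration_uM(100.0, 0.1, total)    # each 100 uM primer stock
print(f"{total:.1f} uL per well, {primer:.2f} uM per primer")
```

Under these assumptions each primer ends up at roughly 1 μM in a well of just under 10 μL.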
High-throughput sequencing verification
Illumina sequencing libraries were prepared from purified DNA using three methods, as listed in Table S12:
The Illumina dsDNA protocol was previously described in ref. 25. 1 μg of DNA was sheared in a 96-well microplate using the Covaris LE220 (450 W, 10% Duty Factor, 200 cycles per burst, and 90-s treatment time) or Covaris R230 (450 W, 10% Duty Factor, 600 cycles per burst, and 10-s treatment time for 4 repeats) to yield fragments between 500 and 900 bp. DNA fragments were end-repaired with T4 DNA polymerase (NEB M0203L), Klenow DNA polymerase (NEB M0210L), and T4 polynucleotide kinase (NEB M0201L), and A-tailed using Klenow (3′-5′ exo-; NEB M0212L). Illumina sequencing adapters were then ligated to DNA ends using Quick Ligase (NEB M2200L). The post-ligation product was purified using 18% Sera-Mag Magnetic Beads (Cytiva 65152105050250) in polyethylene glycol. DNA libraries were amplified with KAPA 2× Hi-Fi Hotstart Readymix (Roche 07958935001) and purified with 18% Sera-Mag Magnetic Beads in polyethylene glycol.
The One-Pot dsDNA protocol was performed as in the Illumina dsDNA protocol, except that 250 ng of sheared DNA was end-repaired and A-tailed in a single reaction using dATP (NEB N0440S), dNTPs mix (NEB N0447L), T4 DNA Polymerase, T4 Polynucleotide Kinase, and Taq Polymerase (NEB M0273L) and incubated in a thermocycler at 12 °C for 10 min, 37 °C for 10 min, and 72 °C for 20 min.
The NEBNext Ultra FS II kit (NEB E7805L) was used according to the provided protocol.
Final library concentrations were measured on a Qubit using the dsDNA High Sensitivity Assay Kit (Invitrogen Q32851).
Hybridization capture (Capture-seq) for targeted resequencing of engineered regions was performed as previously described25. Biotinylated bait was generated using nick translation from BACs covering the Sox2 locus (RP23-144O8 or RP23-274P9; see Table S6), landing pad plasmid (LP-PIGA2 or LP-PIGA3), payload vector backbone, pCAG-iCre and pSpCas9 plasmids. Bait sets and sequencing statistics are listed in Table S12.
Sequencing libraries were sequenced in paired-end mode on an Illumina NextSeq 500 operated at the Institute for Systems Genetics. Reads were demultiplexed with Illumina bcl2fastq v2.20 requiring perfect match to the indexing BC sequence. All whole-genome sequencing and Capture-seq data were processed using a uniform mapping pipeline. Illumina sequencing adapters were trimmed with Trimmomatic57 v0.39. Reads were aligned using BWA58 v0.7.17 to the appropriate reference genome (GRCm38/mm10 or GRCh38/hg38), including unscaffolded contigs and alternate references, as well as to independent custom references for relevant vectors. PCR duplicates were marked by samblaster59 v0.1.24. Per-base coverage depth tracks were generated using BEDOPS60 v2.4.4.
Variant calling was performed using a standard pipeline based on bcftools61 v1.14:
bcftools mpileup --excl-flags UNMAP,SECONDARY,DUP --redo-BAQ --adjust-MQ 50 --gap-frac 0.05 --max-depth 10000 --max-idepth 200000 -a DP,AD --output-type u |
bcftools call --keep-alts --ploidy [1|2] --multiallelic-caller -f GQ --output-type u |
bcftools norm --check-ref w --output-type u |
bcftools filter -i "INFO/DP>=10 & QUAL>=10 & GQ>=99 & FORMAT/DP>=10" --set-GTs . --output-type u |
bcftools view -i 'GT="alt"' --trim-alt-alleles --output-type z
The bcftools call --ploidy option was set to 1 for custom references and 2 for autosomes in mm10.
Large deletions and structural variants were called using DELLY62 v0.8.7 excluding telomeric and centromeric regions. Variants were required to PASS filters, to have at least 10 paired-end reads and 20% of paired-end reads supporting the variant allele. DELLY results for payload and capture sequencing data were inspected manually and used to choose clones for successive analysis.
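The filtering criteria above can be sketched as a simple predicate. This is an illustration only, not the authors' code: it reads "at least 10 paired-end reads" as the total of reference-supporting and variant-supporting pairs at the call, which is an assumption, and takes those two counts as plain integers (in a DELLY VCF they would typically come from the per-sample DR and DV fields).

```python
def passes_sv_filter(filter_field, ref_pairs, var_pairs,
                     min_pairs=10, min_var_frac=0.20):
    """Return True if a structural variant call meets the stated criteria.

    filter_field : VCF FILTER value; must be "PASS"
    ref_pairs    : paired-end reads supporting the reference allele
    var_pairs    : paired-end reads supporting the variant allele
    Assumption: the 10-read minimum applies to ref_pairs + var_pairs.
    """
    total = ref_pairs + var_pairs
    if filter_field != "PASS" or total < min_pairs:
        return False
    return var_pairs / total >= min_var_frac


calls = [("PASS", 30, 12), ("PASS", 90, 10), ("LowQual", 0, 50), ("PASS", 5, 3)]
kept = [c for c in calls if passes_sv_filter(*c)]
print(kept)
```

Only the first call survives here: the second falls below the 20% supporting fraction, the third does not PASS, and the fourth has too few pairs.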
Data were visualized and explored using the University of California, Santa Cruz Genome Browser63. The full processing pipeline is available at https://github.com/mauranolab/mapping.
Payload sequence verification
To identify potential sample swaps or contamination, mean sequencing coverage was calculated for the genomic regions corresponding to the payload sequence and to the non-payload sequence, defined as the engineered Sox2 regions that do not overlap the payload. To minimize coverage abnormalities associated with sequence termini, 400 bp were clipped from each end of continuous genomic regions. Mean coverage was normalized by the payload vector backbone coverage. We expect a normalized coverage of 1±0.4 for the payload region and <0.1 for the non-payload region. Human orthologous sequences and regions smaller than 200 bp after clipping were ignored. Coverage analysis results and quality control (QC) calls for each sample are reported in Table S13 and summarized in Table S2.
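A minimal sketch of this coverage check follows, assuming half-open genomic coordinates and mean coverage values computed upstream; the function names and the example numbers are hypothetical and are not taken from the published pipeline.

```python
def clip_region(start, end, clip=400, min_len=200):
    """Trim `clip` bp from both ends; return None if under min_len remains."""
    s, e = start + clip, end - clip
    return (s, e) if e - s >= min_len else None


def payload_qc(payload_cov, nonpayload_cov, backbone_cov):
    """Normalize mean coverage by backbone coverage and apply expectations.

    Expected: payload within 1 +/- 0.4, non-payload below 0.1.
    Returns (payload_norm, nonpayload_norm, passed).
    """
    payload_norm = payload_cov / backbone_cov
    nonpayload_norm = nonpayload_cov / backbone_cov
    passed = abs(payload_norm - 1.0) <= 0.4 and nonpayload_norm < 0.1
    return payload_norm, nonpayload_norm, passed


print(clip_region(10_000, 15_000))   # long region: kept after clipping
print(clip_region(10_000, 10_900))   # too short after clipping: ignored
print(payload_qc(payload_cov=95.0, nonpayload_cov=2.0, backbone_cov=100.0))
```

A sample swap would typically show up as low payload coverage together with high non-payload coverage, which fails both conditions at once.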
To detect payload DNA variants that might have been introduced during assembly or propagation in yeast or bacteria, we analyzed high-confidence variant calls relative to the payload custom reference having sequencing depth of at least 50 and quality score of at least 100 (DP ≥ 50 & QUAL ≥ 100). Payload variants are reported in Table S14 and QC calls are summarized in Table S2.
mESC clone sequence verification
To verify correct genomic engineering of mESC clones, mean sequencing coverage was calculated for the genomic regions corresponding to the payload and non-payload (as defined above); and for the payload vector backbone, landing pad (LP), landing pad backbone, and pCAG-iCre custom references. Terminal clipping of 400 bp was performed as described above for all genomic regions. Coverage was normalized by the mean genomic coverage of the engineered region flanks (regions captured by the Sox2 bait, but unmodified) such that a single copy corresponds to a value of ~0.5. For landing pad clones, we expect normalized coverage of 0.5±0.25 for the non-payload region and <0.1 for LP backbone. For payload clones, we expect normalized coverage of 1±0.25 for the payload region, 0.5±0.25 for the non-payload region and <0.1 for the landing pad, payload vector backbone and pCAG-iCre. Human orthologous sequences, and regions smaller than 200 bp after clipping were ignored. Samples where the payload region was ignored were classified as “No call”. Coverage analysis results and QC calls for each clone are reported in Table S15 and summarized in Table 1.
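The expectations for payload clones can be expressed as a small classifier. This is a sketch under stated assumptions: the "Pass" and "Fail" labels are illustrative (the text names only "No call"), inputs are flank-normalized coverage values, and a non-payload region that was ignored (passed as None) is simply not evaluated.

```python
def within(value, center, tol):
    """True if value lies in [center - tol, center + tol]."""
    return abs(value - center) <= tol


def payload_clone_call(payload, nonpayload, landing_pad, payload_backbone, cre):
    """Classify a payload clone from flank-normalized coverage.

    A single copy corresponds to ~0.5, so a payload present on both alleles
    is expected near 1 and the remaining engineered region near 0.5.
    payload=None means the payload region was ignored: "No call".
    """
    if payload is None:
        return "No call"
    ok = (within(payload, 1.0, 0.25)
          and (nonpayload is None or within(nonpayload, 0.5, 0.25))
          and max(landing_pad, payload_backbone, cre) < 0.1)
    return "Pass" if ok else "Fail"


print(payload_clone_call(0.98, 0.52, 0.00, 0.01, 0.00))
print(payload_clone_call(0.98, 0.52, 0.40, 0.00, 0.00))  # residual landing pad
print(payload_clone_call(None, 0.50, 0.00, 0.00, 0.00))
```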
We further verified the presence of BL6 allele variants, which are lost in regions replaced by landing pads and restored by delivered payloads, by calculating allelic ratios as the mean proportion of reads supporting the reference (BL6) allele (propREF) for the genomic regions described above. Values of 0.5±0.2 and 0±0.2 were expected for payload and non-payload regions, respectively. Regions with 10 or fewer variants were ignored. Samples where the payload region was ignored were classified as “No call”. Allelic ratio results and QC calls for each clone are reported in Table S15 and summarized in Table 1.
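The allelic ratio check can be sketched in the same way. The code assumes per-variant read counts are available as (reference, alternate) pairs and skips sites with zero depth; both the input layout and the "Pass"/"Fail" labels are assumptions for illustration.

```python
def prop_ref(variant_counts, max_ignored=10):
    """Mean proportion of reads supporting the reference (BL6) allele.

    variant_counts : list of (ref_reads, alt_reads) per variant site
    Returns None when 10 or fewer usable variants are available.
    """
    usable = [(r, a) for r, a in variant_counts if r + a > 0]
    if len(usable) <= max_ignored:
        return None
    return sum(r / (r + a) for r, a in usable) / len(usable)


def allelic_call(payload_variants, nonpayload_variants):
    """Expect propREF of 0.5 +/- 0.2 (payload) and 0 +/- 0.2 (non-payload)."""
    p = prop_ref(payload_variants)
    if p is None:
        return "No call"
    n = prop_ref(nonpayload_variants)
    ok = abs(p - 0.5) <= 0.2 and (n is None or abs(n) <= 0.2)
    return "Pass" if ok else "Fail"


restored = [(10, 10)] * 12   # BL6 allele restored by the payload
replaced = [(0, 20)] * 12    # BL6 allele absent: CAST reads only
print(allelic_call(restored, replaced))
print(allelic_call(replaced, replaced))
```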
To verify genomic integration sites, we detected sequencing read pairs mapping to two different reference genomes using bamintersect as previously described25 with slight modifications: same-strand reads mapping within 500 bp were clustered, a minimum width threshold of 125 bp was required for reporting, and junctions with few reads (<1 and <5 reads per 10 million reads sequenced for landing pad and payload samples, respectively) were excluded. Bamintersect junctions were classified hierarchically based on position (Table S16). Results and QC calls for each clone are reported in Table S17 and summarized in Table 1.
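The clustering and filtering thresholds can be illustrated with a short sketch. It is not the bamintersect implementation: reads are represented as hypothetical (strand, start, end) tuples on a single reference, and "within 500 bp" is taken to mean that a read starts no more than 500 bp past the current cluster end, which is an assumption.

```python
def cluster_reads(reads, max_gap=500):
    """Greedily cluster same-strand reads lying within max_gap bp."""
    clusters = []
    for strand in ("+", "-"):
        current = None
        for _, start, end in sorted(r for r in reads if r[0] == strand):
            if current is not None and start - current["end"] <= max_gap:
                current["end"] = max(current["end"], end)
                current["n_reads"] += 1
            else:
                current = {"strand": strand, "start": start,
                           "end": end, "n_reads": 1}
                clusters.append(current)
    return clusters


def report_junctions(reads, total_reads, min_width=125,
                     min_reads_per_10M=5.0, max_gap=500):
    """Keep clusters at least min_width bp wide with enough reads per 10M.

    Use min_reads_per_10M=1.0 for landing pad samples, 5.0 for payload samples.
    """
    kept = []
    for c in cluster_reads(reads, max_gap):
        width = c["end"] - c["start"]
        per_10M = c["n_reads"] * 1e7 / total_reads
        if width >= min_width and per_10M >= min_reads_per_10M:
            kept.append(c)
    return kept


reads = [("+", 100, 250), ("+", 300, 450), ("+", 5000, 5100), ("-", 120, 270)]
print(report_junctions(reads, total_reads=1_000_000))
```

Normalizing the read count to sequencing depth keeps the threshold comparable between shallow and deep libraries.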
RNA Isolation, cDNA synthesis and mRNA expression analysis by real-time qRT-PCR
RNA was isolated from fresh or frozen cells using the Qiagen RNeasy-mini protocol. Since Sox2 has no introns, additional steps were taken to ensure that RNA was not contaminated with trace residual genomic DNA. DNase treatment was performed on extracted RNA with the Turbo DNA-free kit (ThermoFisher AM1907) using the “rigorous DNase treatment” protocol prescribed by the manufacturer. Following DNase treatment, cDNA was synthesized from 1-2 μg total RNA with the Multiscribe High-Capacity cDNA Reverse Transcription Kit (ThermoFisher 4368814), including a “-RT” no-reverse-transcriptase control for a subset of samples.
Real-time quantitative reverse transcription PCR (qRT-PCR) was performed using KAPA SYBR FAST (Kapa Biosystems KK4610) on a 384-well LightCycler 480 Real-Time PCR System (Roche) and threshold cycle (Ct, also called Cp) values were calculated using Abs Quant/2nd Derivative Max analysis. Primers were designed to detect Sox2 in an allele-specific manner25 (Table S18). An Echo 550 liquid handler was used for loading as described above. Thermal cycling parameters were as follows: 3 min pre-incubation at 95 °C, followed by 40 amplification cycles of 3 sec at 95 °C, 20 sec at 57 °C and 20 sec at 72 °C. For a subset of samples, the -RT controls were verified for lack of amplification and all had Ct>31.
Assays were performed in duplicate and replicate wells on the same plate were averaged after masking wells with no or very low amplification. Raw Ct values are listed in Data S2. Ct values for the BL6 Sox2 allele ranged between 20 (WT) and 30 (ΔSox2), suggesting an accurate quantification range of 2^10. ΔCt values for the BL6 and CAST Sox2 alleles were computed relative to Gapdh. Replicate ΔCt measurements of the same clone across different plates were averaged. Sox2 fold change was defined as the difference between the BL6 and CAST Sox2 alleles, calculated as 2^(ΔCt[CAST-BL6]). Fold change was scaled to yield expression values ranging from 0 (ΔSox2) to 1 (WT) by subtracting the mean fold change calculated for the ΔSox2 samples from all data points, then dividing by the mean fold change of the appropriate WT payload samples (full locus or LCR) (Table S4).
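The calculation can be followed step by step in a short sketch. It assumes ΔCt is the allele Ct minus the Gapdh Ct, and it reads the scaling step as dividing by the WT mean after the ΔSox2 baseline has been subtracted from it as well, so that WT maps to exactly 1; both points are interpretations, and the Ct values are chosen only to match the stated range of 20 to 30.

```python
def delta_ct(ct_target, ct_reference):
    """ΔCt of a target relative to a reference gene (here Gapdh)."""
    return ct_target - ct_reference


def fold_change(ct_bl6, ct_cast, ct_gapdh):
    """BL6 Sox2 relative to CAST Sox2: 2^(ΔCt[CAST] - ΔCt[BL6]).

    With a shared Gapdh Ct the reference term cancels, leaving the
    difference between the two allele-specific Ct values.
    """
    d_bl6 = delta_ct(ct_bl6, ct_gapdh)
    d_cast = delta_ct(ct_cast, ct_gapdh)
    return 2.0 ** (d_cast - d_bl6)


def scale_expression(fc, mean_fc_null, mean_fc_wt):
    """Rescale fold change so that ΔSox2 maps to 0 and WT maps to 1."""
    return (fc - mean_fc_null) / (mean_fc_wt - mean_fc_null)


wt = fold_change(ct_bl6=20.0, ct_cast=20.0, ct_gapdh=18.0)     # 2^0
null = fold_change(ct_bl6=30.0, ct_cast=20.0, ct_gapdh=18.0)   # 2^-10
half = fold_change(ct_bl6=21.0, ct_cast=20.0, ct_gapdh=18.0)   # 2^-1
print(scale_expression(half, mean_fc_null=null, mean_fc_wt=wt))
```

A one-cycle delay of the BL6 allele relative to CAST therefore corresponds to a scaled expression just under 0.5.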
QUANTIFICATION AND STATISTICAL ANALYSIS
Modeling
We fitted linear regression models to predict Sox2 expression based on DHS composition and configuration using R64 v3.5.2. Models were compared based on Bayesian Information Criterion (BIC) to evaluate the performance of predictor combinations. The composition of proximal (DHSs 1-8 and DHSs 10-16) and core LCR (DHS23, DHS24, CTCF25, and DHS26) regulatory elements was represented by their copy number in the payload. Regulatory element configuration was represented by indicator variables of their relative order (DHS24-DHS23 and CTCF25-DHS24; read as DHS24 before DHS23 and CTCF25 before DHS24) and sequence inversion (inv_DHS23, inv_DHS24, inv_CTCF25, and inv_DHS26). Payloads which did not map clearly onto these features (Δ26-27 & 28, Δ20-23.3, and the synthetic payloads in Figure 6 and Figure S6) were excluded from the analysis. Interaction between two elements was represented by indicator variables for the presence of both elements (DHS23and24, DHS23and26, and CTCF25andDHS26).
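To illustrate how BIC arbitrates between predictor sets, the sketch below compares an additive model with one that adds a DHS23and24 interaction term. It is written in plain Python rather than R and is not the authors' analysis code; the eight expression values are made-up toy numbers, not measurements from this study. BIC is computed as n·ln(RSS/n) + k·ln(n), which ranks models fitted to the same data in the same order as the Gaussian log-likelihood form.

```python
import math


def ols_fit(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def bic(X, y):
    """BIC of a Gaussian linear model, up to an additive constant."""
    beta = ols_fit(X, y)
    rss = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    n, k = len(y), len(beta)
    return n * math.log(rss / n) + k * math.log(n)


# Toy data (hypothetical): copy number of DHS23 and DHS24, scaled expression.
toy = [(0, 0, 0.00), (0, 0, 0.02), (1, 0, 0.30), (1, 0, 0.28),
       (0, 1, 0.25), (0, 1, 0.27), (1, 1, 0.95), (1, 1, 1.05)]
y = [e for _, _, e in toy]
additive = [[1, a, b] for a, b, _ in toy]
interaction = [[1, a, b, a * b] for a, b, _ in toy]   # adds DHS23and24
print(bic(additive, y), bic(interaction, y))
```

With these toy values the interaction model has the lower BIC, meaning the extra parameter is justified by the improved fit; with purely additive data the penalty term would favor the simpler model.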
Transcription factor motif analysis
We used motif matches previously derived from scanning the reference genomes using FIMO v4.10.244 with TF motifs as previously described65.
Supplementary Material
Highlights.
Synthetic regulatory genomics enables dissection of enhancer necessity and sufficiency
Replacing the Sox2 locus control region with a single DHS retains 30% of its activity
Neighboring DHSs synergistically modulate each other’s activity
Context-dependent DHSs cannot work alone but double the activity of neighboring DHSs
ACKNOWLEDGEMENTS
We thank Brendan Camellato, Leslie Mitchell, Sudarshan Pinglay, Weimin Zhang, Yu Zhao, and Yinan Zhu for help with yeast assembly. This work was partially funded by National Institutes of Health (NIH) grants RM1HG009491 (to J.D.B.) and R35GM119703 (to M.T.M.).
DECLARATION OF INTERESTS
R.B., J.D.B., and M.T.M. are listed as inventors on a patent application describing Big-IN. J.D.B. is a founder and Director of CDI Labs, Inc., a founder of and consultant to Neochromosome, Inc, a founder of, SAB member of, and consultant to ReOpen Diagnostics, LLC, and serves or served on the Scientific Advisory Board of the following: Logomix, Inc., Sangamo, Inc., Modern Meadow, Inc., Rome Therapeutics, Inc., Sample6, Inc., Tessera Therapeutics, Inc. and the Wyss Institute.
REFERENCES
- 1. Grosveld F, van Assendelft GB, Greaves DR, and Kollias G (1987). Position-independent, high-level expression of the human beta-globin gene in transgenic mice. Cell 51, 975–985. 10.1016/0092-8674(87)90584-8.
- 2. Evans T, Felsenfeld G, and Reitman M (1990). Control of globin gene transcription. Annual Review of Cell Biology 6, 95–124. 10.1146/annurev.cb.06.110190.000523.
- 3. Li Q, Peterson KR, Fang X, and Stamatoyannopoulos G (2002). Locus control regions. Blood 100, 3077–3086. 10.1182/blood-2002-04-1104.
- 4. Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, Rahl PB, Lee TI, and Young RA (2013). Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell 153, 307–319. 10.1016/j.cell.2013.03.035.
- 5. Pott S, and Lieb JD (2014). What are super-enhancers? Nature Genetics 47, 8–12. 10.1038/ng.3167.
- 6. Blobel GA, Higgs DR, Mitchell JA, Notani D, and Young RA (2021). Testing the super-enhancer concept. Nat Rev Genet 22, 749–755. 10.1038/s41576-021-00398-w.
- 7. Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Sheffield NC, Stergachis AB, Wang H, Vernot B, et al. (2012). The accessible chromatin landscape of the human genome. Nature 489, 75–82. 10.1038/nature11232.
- 8. Shin HY, Willi M, HyunYoo K, Zeng X, Wang C, Metser G, and Hennighausen L (2016). Hierarchy within the mammary STAT5-driven Wap super-enhancer. Nat Genet 48, 904–911. 10.1038/ng.3606.
- 9. Hay D, Hughes JR, Babbs C, Davies JOJ, Graham BJ, Hanssen LLP, Kassouf MT, Oudelaar AM, Sharpe JA, Suciu MC, et al. (2016). Genetic dissection of the α-globin super-enhancer in vivo. Nature Genetics 48, 895–903. 10.1038/ng.3605.
- 10. Dukler N, Gulko B, Huang Y-F, and Siepel A (2016). Is a super-enhancer greater than the sum of its parts? Nat Genet 49, 2–3. 10.1038/ng.3759.
- 11. Moorthy SD, Davidson S, Shchuka VM, Singh G, Malek-Gilani N, Langroudi L, Martchenko A, So V, Macpherson NN, and Mitchell JA (2017). Enhancers and super-enhancers have an equivalent regulatory role in embryonic stem cells through regulation of single or multiple genes. Genome Res 27, 246–258. 10.1101/gr.210930.116.
- 12. Hong J-W, Hendrix DA, and Levine MS (2008). Shadow enhancers as a source of evolutionary novelty. Science 321, 1314. 10.1126/science.1160631.
- 13. Bothma JP, Garcia HG, Ng S, Perry MW, Gregor T, and Levine M (2015). Enhancer additivity and non-additivity are determined by enhancer strength in the Drosophila embryo. eLife 4. 10.7554/eLife.07956.
- 14. Osterwalder M, Barozzi I, Tissières V, Fukuda-Yuzawa Y, Mannion BJ, Afzal SY, Lee EA, Zhu Y, Plajzer-Frick I, Pickle CS, et al. (2018). Enhancer redundancy provides phenotypic robustness in mammalian development. Nature 554, 239–243. 10.1038/nature25461.
- 15. Scholes C, Biette KM, Harden TT, and DePace AH (2019). Signal Integration by Shadow Enhancers and Enhancer Duplications Varies across the Drosophila Embryo. Cell Rep 26, 2407–2418.e5. 10.1016/j.celrep.2019.01.115.
- 16. Avilion AA, Nicolis SK, Pevny LH, Perez L, Vivian N, and Lovell-Badge R (2003). Multipotent cell lineages in early mouse development depend on SOX2 function. Genes & Development 17, 126–140. 10.1101/gad.224503.
- 17. Zhou HY, Katsman Y, Dhaliwal NK, Davidson S, Macpherson NN, Sakthidevi M, Collura F, and Mitchell JA (2014). A Sox2 distal enhancer cluster regulates embryonic stem cell differentiation potential. Genes & Development 28, 2699–2711. 10.1101/gad.248526.114.
- 18. Li Y, Rivera CM, Ishii H, Jin F, Selvaraj S, Lee AY, Dixon JR, and Ren B (2014). CRISPR reveals a distal super-enhancer required for Sox2 expression in mouse embryonic stem cells. PLoS One 9, e114485. 10.1371/journal.pone.0114485.
- 19. Huang H, Zhu Q, Jussila A, Han Y, Bintu B, Kern C, Conte M, Zhang Y, Bianco S, Chiariello AM, et al. (2021). CTCF mediates dosage- and sequence-context-dependent transcriptional insulation by forming local chromatin domains. Nat Genet 53, 1064–1074. 10.1038/S41588-021-00863-6.
- 20. Zuin J, Roth G, Zhan Y, Cramard J, Redolfi J, Piskadlo E, Mach P, Kryzhanovska M, Tihanyi G, Kohler H, et al. (2022). Nonlinear control of transcription through enhancer-promoter interactions. Nature. 10.1038/s41586-022-04570-y.
- 21. Chakraborty S, Kopitchinski N, Zuo Z, Eraso A, Awasthi P, Chari R, Mitra A, Tobias IC, Moorthy SD, Dale RK, et al. (2023). Enhancer-promoter interactions can bypass CTCF-mediated boundaries and contribute to phenotypic robustness. Nat Genet. 10.1038/S41588-022-01295-6.
- 22. de Wit E, Vos ESM, Holwerda SJB, Valdes-Quezada C, Verstegen MJAM, Teunissen H, Splinter E, Wijchers PJ, Krijger PHL, and de Laat W (2015). CTCF Binding Polarity Determines Chromatin Looping. Molecular Cell 60, 676–684. 10.1016/j.molcel.2015.09.023.
- 23. Alexander JM, Guan J, Li B, Maliskova L, Song M, Shen Y, Huang B, Lomvardas S, and Weiner OD (2019). Live-cell imaging reveals enhancer-dependent Sox2 transcription in the absence of enhancer proximity. eLife 8. 10.7554/eLife.41769.
- 24. Taylor T, Sikorska N, Shchuka VM, Chahar S, Ji C, Macpherson NN, Moorthy SD, de Kort MAC, Mullany S, Khader N, et al. (2022). Transcriptional regulation and chromatin architecture maintenance are decoupled functions at the Sox2 locus. Genes Dev 36, 699–717. 10.1101/gad.349489.122.
- 25. Brosh R, Laurent JM, Ordoñez R, Huang E, Hogan MS, Hitchcock AM, Mitchell LA, Pinglay S, Cadley JA, Luther RD, et al. (2021). A versatile platform for locus-scale genome rewriting and verification. Proc Natl Acad Sci U S A 118, e2023952118. 10.1073/pnas.2023952118.
- 26. Peng T, Zhai Y, Atlasi Y, Ter Huurne M, Marks H, Stunnenberg HG, and Megchelenbrink W (2020). STARR-seq identifies active, chromatin-masked, and dormant enhancers in pluripotent mouse embryonic stem cells. Genome Biol 21, 243. 10.1186/s13059-020-02156-3.
- 27. Kagey MH, Newman JJ, Bilodeau S, Zhan Y, Orlando DA, van Berkum NL, Ebmeier CC, Goossens J, Rahl PB, Levine SS, et al. (2010). Mediator and cohesin connect gene expression and chromatin architecture. Nature 467, 430–435. 10.1038/nature09380.
- 28. Whyte WA, Bilodeau S, Orlando DA, Hoke HA, Frampton GM, Foster CT, Cowley SM, and Young RA (2012). Enhancer decommissioning by LSD1 during embryonic stem cell differentiation. Nature 482, 221–225. 10.1038/nature10805.
- 29. Beagan JA, Duong MT, Titus KR, Zhou L, Cao Z, Ma J, Lachanski CV, Gillis DR, and Phillips-Cremins JE (2017). YY1 and CTCF orchestrate a 3D chromatin looping switch during early neural lineage commitment. Genome Res 27, 1139–1152. 10.1101/gr.215160.116.
- 30. Avsec Ž, Weilert M, Shrikumar A, Krueger S, Alexandari A, Dalal K, Fropf R, McAnany C, Gagneur J, Kundaje A, et al. (2021). Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat Genet 53, 354–366. 10.1038/s41588-021-00782-6.
- 31. Etchegaray J-P, Zhong L, Li C, Henriques T, Ablondi E, Nakadai T, Van Rechem C, Ferrer C, Ross KN, Choi J-E, et al. (2019). The Histone Deacetylase SIRT6 Restrains Transcription Elongation via Promoter-Proximal Pausing. Mol Cell 75, 683–699.e7. 10.1016/j.molcel.2019.06.034.
- 32. Bell AC, and Felsenfeld G (2000). Methylation of a CTCF-dependent boundary controls imprinted expression of the Igf2 gene. Nature 405, 482–485. 10.1038/35013100.
- 33. Dickson J, Gowher H, Strogantsev R, Gaszner M, Hair A, Felsenfeld G, and West AG (2010). VEZF1 elements mediate protection from DNA methylation. PLoS Genetics 6, e1000804. 10.1371/journal.pgen.1000804.
- 34. Anania C, Acemel RD, Jedamzick J, Bolondi A, Cova G, Brieske N, Kühn R, Wittler L, Real FM, and Lupiáñez DG (2022). In vivo dissection of a clustered-CTCF domain boundary reveals developmental principles of regulatory insulation. Nat Genet 54, 1026–1036. 10.1038/S41588-022-01117-9.
- 35. Amândio AR, Beccari L, Lopez-Delisle L, Mascrez B, Zakany J, Gitto S, and Duboule D (2021). Sequential in cis mutagenesis in vivo reveals various functions for CTCF sites at the mouse HoxD cluster. Genes Dev 35, 1490–1509. 10.1101/gad.348934.121.
- 36. Thomas HF, Kotova E, Jayaram S, Pilz A, Romeike M, Lackner A, Penz T, Bock C, Leeb M, Halbritter F, et al. (2021). Temporal dissection of an enhancer cluster reveals distinct temporal and functional contributions of individual elements. Mol Cell 81, 969–982.e13. 10.1016/j.molcel.2020.12.047.
- 37. Blayney JW, Francis H, Camellato BR, Mitchell L, Stolper R, Boeke J, Higgs DR, and Kassouf M (2022). Super-enhancers require a combination of classical enhancers and novel facilitator elements to drive high levels of gene expression. bioRxiv, 2022.06.20.496856. 10.1101/2022.06.20.496856.
- 38. Sahu B, Hartonen T, Pihlajamaa P, Wei B, Dave K, Zhu F, Kaasinen E, Lidschreiber K, Lidschreiber M, Daub CO, et al. (2022). Sequence determinants of human gene regulatory elements. Nat Genet 54, 283–294. 10.1038/s41588-021-01009-4.
- 39. Arnosti DN, and Kulkarni MM (2005). Transcriptional enhancers: Intelligent enhanceosomes or flexible billboards? Journal of Cellular Biochemistry 94, 890–898. 10.1002/jcb.20352.
- 40. Mirny LA (2010). Nucleosome-mediated cooperativity between transcription factors. Proc Natl Acad Sci U S A 107, 22534–22539. 10.1073/pnas.0913805107.
- 41. Maurano MT, Wang H, Kutyavin T, and Stamatoyannopoulos JA (2012). Widespread site-dependent buffering of human regulatory polymorphism. PLoS Genetics 8, e1002599. 10.1371/journal.pgen.1002599.
- 42. Singh G, Mullany S, Moorthy SD, Zhang R, Mehdi T, Tian R, Duncan AG, Moses AM, and Mitchell JA (2021). A flexible repertoire of transcription factor binding sites and a diversity threshold determines enhancer activity in embryonic stem cells. Genome Res 31, 564–575. 10.1101/gr.272468.120.
- 43. Camellato B, Brosh R, Maurano MT, and Boeke JD (2022). Genomic analysis of a synthetic reversed sequence reveals default chromatin states in yeast and mammalian cells. bioRxiv, 2022.06.22.496726. 10.1101/2022.06.22.496726.
- 44. Peterson KR, and Stamatoyannopoulos G (1993). Role of gene order in developmental control of human gamma- and beta-globin gene expression. Molecular and Cellular Biology 13, 4836–4843.
- 45. Wilson MD, Barbosa-Morais NL, Schmidt D, Conboy CM, Vanes L, Tybulewicz VLJ, Fisher EMC, Tavaré S, and Odom DT (2008). Species-specific transcription in mice carrying human chromosome 21. Science 322, 434–438. 10.1126/science.1160930.
- 46. Li Q, Emery DW, Han H, Sun J, Yu M, and Stamatoyannopoulos G (2005). Differences of globin transgene expression in stably transfected cell lines and transgenic mice. Blood 105, 3346–3352. 10.1182/blood-2004-03-0987.
- 47. Lienert F, Wirbelauer C, Som I, Dean A, Mohn F, and Schübeler D (2011). Identification of genetic elements that autonomously determine DNA methylation states. Nature Genetics 43, 1091–1097. 10.1038/ng.946.
- 48. Brachmann CB, Davies A, Cost GJ, Caputo E, Li J, Hieter P, and Boeke JD (1998). Designer deletion strains derived from Saccharomyces cerevisiae S288C: a useful set of strains and plasmids for PCR-mediated gene disruption and other applications. Yeast 14, 115–132.
- 49. Eckersley-Maslin MA, Thybert D, Bergmann JH, Marioni JC, Flicek P, and Spector DL (2014). Random monoallelic gene expression increases upon embryonic stem cell differentiation. Dev Cell 28, 351–365. 10.1016/j.devcel.2014.01.017.
- 50. Ribeiro-Dos-Santos AM, Hogan MS, Luther RD, Brosh R, and Maurano MT (2022). Genomic context sensitivity of insulator function. Genome Res, gr.276449.121. 10.1101/gr.276449.121.
- 51. Ran FA, Hsu PD, Wright J, Agarwala V, Scott DA, and Zhang F (2013). Genome engineering using the CRISPR-Cas9 system. Nat Protoc 8, 2281–2308. 10.1038/nprot.2013.143.
- 52. Wild J, Hradecna Z, and Szybalski W (2002). Conditionally amplifiable BACs: switching from single-copy to high-copy vectors and genomic clones. Genome Res 12, 1434–1444. 10.1101/gr.130502.
- 53. Zhao Y, Coelho C, Lauer S, Laurent JM, Brosh R, and Boeke JD (2022). Episomal editing of synthetic constructs in yeast using CRISPR. bioRxiv, 2022.06.21.496881. 10.1101/2022.06.21.496881.
- 54. Xie Z-X, Mitchell LA, Liu H-M, Li B-Z, Liu D, Agmon N, Wu Y, Li X, Zhou X, Li B, et al. (2018). Rapid and Efficient CRISPR/Cas9-Based Mating-Type Switching of Saccharomyces cerevisiae. G3 (Bethesda) 8, 173–183. 10.1534/g3.117.300347.
- 55. Billerbeck S, Brisbois J, Agmon N, Jimenez M, Temple J, Shen M, Boeke JD, and Cornish VW (2018). A scalable peptide-GPCR language for engineering multicellular communication. Nat Commun 9, 5057. 10.1038/s41467-018-07610-2.
- 56. Gietz RD, and Schiestl RH (2007). High-efficiency yeast transformation using the LiAc/SS carrier DNA/PEG method. Nat Protoc 2, 31–34. 10.1038/nprot.2007.13.
- 57. Bolger AM, Lohse M, and Usadel B (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120. 10.1093/bioinformatics/btu170.
- 58. Li H, and Durbin R (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760. 10.1093/bioinformatics/btp324.
- 59. Faust GG, and Hall IM (2014). SAMBLASTER: fast duplicate marking and structural variant read extraction. Bioinformatics 30, 2503–2505. 10.1093/bioinformatics/btu314.
- 60. Neph S, Kuehn MS, Reynolds AP, Haugen E, Thurman RE, Johnson AK, Rynes E, Maurano MT, Vierstra J, Thomas S, et al. (2012). BEDOPS: high-performance genomic feature operations. Bioinformatics 28, 1919–1920. 10.1093/bioinformatics/bts277.
- 61. Bonfield JK, Marshall J, Danecek P, Li H, Ohan V, Whitwham A, Keane T, and Davies RM (2021). HTSlib: C library for reading/writing high-throughput sequencing data. Gigascience 10, giab007. 10.1093/gigascience/giab007.
- 62. Rausch T, Zichner T, Schlattl A, Stütz AM, Benes V, and Korbel JO (2012). DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333–i339. 10.1093/bioinformatics/bts378.
- 63. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, and Haussler D (2002). The human genome browser at UCSC. Genome Res 12, 996–1006. 10.1101/gr.229102.
- 64. R Core Team (2018). R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing).
- 65. Halow JM, Byron R, Hogan MS, Ordoñez R, Groudine M, Bender MA, Stamatoyannopoulos JA, and Maurano MT (2021). Tissue context determines the penetrance of regulatory DNA variation. Nat Commun 12, 2850. 10.1038/s41467-021-23139-3.
Data Availability Statement
Sequencing data have been deposited in the Gene Expression Omnibus (GEO) database, https://www.ncbi.nlm.nih.gov/geo (accession no. GSE206863). DNase-seq data were obtained from https://www.encodeproject.org for ES_CJ7 (ENCLB163SYJ, DS13320) and H7 human ESC (ENCLB449ZZZ, DS11909)7. CTCF ChIP-seq29 (GSM2259905), ChIP-seq data4,27,28 (GSM560343, GSM560345, GSM560350, GSM687280, GSM687282, GSM687285, GSM845236, GSM845238, GSM1082340, GSM1082341, GSM1082342), and ChIP-nexus30 for Zic3 (GSM4087824), Pbx (GSM4087823), Esrrb (GSM4087822), Sox2 (GSM4072777), Oct4 (GSM4072776), Nanog (GSM4072778), and Klf4 (GSM4072779), PRO-seq31 (GSE130691), and STARR-seq26 (GSM4261634) were obtained from the GEO repository. Gel electrophoresis images are available from Mendeley (doi: 10.17632/z7x3k943vz.1).
The processing pipelines for Capture-seq, ChIP-seq, and DNase-seq data are available on GitHub at https://github.com/mauranolab/dnase and have been deposited at Zenodo. DOIs are listed in the key resources table. All code for analyses herein is available upon request.
Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
REAGENT or RESOURCE | SOURCE | IDENTIFIER |
---|---|---|
Bacterial and virus strains | ||
Parental S. cerevisiae strain | Brachmann et al., 199848 | BY4741 |
Payload S. cerevisiae strain | This paper | Table S1 |
E. coli TransforMax EPI300 | Lucigen | EC300150 |
Chemicals, peptides, and recombinant proteins | ||
0.1% gelatin | EMD Millipore | ES006-B |
2-mercaptoethanol | Sigma | M3148 |
BsaI-HF V2 | NEB | R3733S |
Buffer P1 | QIAGEN | 19051 |
Buffer P2 | QIAGEN | 19052 |
Buffer P3 | QIAGEN | 19053 |
CHIR99021 | R&D Systems | 4423 |
CopyControl Induction Solution | Lucigen | CCIS125 |
dATP | NEB | N0440S |
dNTPs mix | NEB | N0447L |
Esp3I | NEB | R0734S |
FBS | BenchMark | 100-106 |
ganciclovir | Sigma | PHR1593 |
GlutaMAX | ThermoFisher | 35050061 |
GoTaq Green | Promega | M7123 |
KAPA 2× Hi-Fi Hotstart Readymix | Roche | 07958935001 |
Klenow DNA polymerase | NEB | M0210L |
Klenow Fragment (3'-5' exo-) | NEB | M0212L |
KnockOut DMEM | ThermoFisher | 10829018 |
LIF | Sigma | ESG1107 |
MEM nonessential amino acids | ThermoFisher | 11140050 |
N2 Supplement | ThermoFisher | 17502048 |
nucleosides | EMD Millipore | ES008-D |
PD0325901 | Sigma | PZ0162 |
Pen-Strep | ThermoFisher | 15140122 |
proaerolysin | Aerohead Scientific | n/a |
puromycin | ThermoFisher | A1113803 |
Sera-Mag Magnetic Beads | Cytiva | 65152105050250 |
T4 DNA polymerase | NEB | M0203L |
T4 polynucleotide kinase | NEB | M0201L |
T4 Quick Ligase | NEB | M2200S |
Taq Polymerase | NEB | M0273L |
Critical commercial assays | ||
DNeasy Blood & Tissue kit | QIAGEN | 69506 |
dsDNA High Sensitivity Assay Kit | Invitrogen | Q32851 |
KAPA SYBR FAST | Kapa Biosystems | KK4610 |
Multiscribe High-Capacity cDNA Reverse Transcription Kit | ThermoFisher | 4368814 |
Nucleobond XtraBAC kit | Takara | 740436 |
Turbo DNA-free kit | ThermoFisher | AM1907 |
Zymo yeast miniprep I protocol | Zymo Research | D2001 |
ZymoPURE maxiprep kit | Zymo Research | D4203 |
Zyppy Plasmid Miniprep Kit | Zymo Research | D4020 |
Deposited data | ||
Sequencing data | This paper | GSE206863 |
Figure S1 gel images | This paper | doi:10.17632/z7x3k943vz.1 |
Experimental models: Cell lines | ||
Parental C57BL/6J × CAST/EiJ (BL6xCAST) mESCs | Eckersley-Maslin et al., 201449 | Clone 4 |
C57BL/6J × CAST/EiJ (BL6xCAST) Landing Pad and Payload mESC clones | This paper | Table S3 |
C57BL/6J (MK6) | NYU Langone Health Rodent Genetic Engineering Core | n/a |
Oligonucleotides | ||
Cloning primers | This paper | Tables S5, S7 |
Genotyping primers | This paper | Tables S10, S11 |
qRT-PCR primers | This paper | Table S18 |
Synthetic fragments | This paper | Table S9 |
Recombinant DNA | ||
Payloads | This paper | Table S1, Data S1 |
pCAG-iCre | Addgene | 89573 |
pCTC019 | This paper | n/a |
pLM1110 | Addgene | 168460 |
pLP-PIGA (pLP140) | Addgene | 168461 |
pLP-PIGA2 (pLP300) | Addgene | 168462 |
pLP-PIGA3 (pLP305) | Addgene | 196992 |
pMH005 | Ribeiro-Dos-Santos et al. 202250 | n/a |
pNA0304 | Addgene | 165612 |
pNA0308 | This paper | n/a |
pNA0519 | Zhao et al., 202253 | n/a |
pRB051 | Addgene | 196993 |
pSpCas9(BB)-2A-GFP | Addgene | 48138 |
pSpCas9(BB)-2A-Puro | Addgene | 62988 |
Software and algorithms | ||
Sequencing processing pipeline | This paper | doi: 10.5281/zenodo.7662273 |
BWA | Li and Durbin 200958 | https://github.com/lh3/bwa |
bcftools | Bonfield et al., 202161 | https://github.com/samtools/bcftools |
BEDOPS | Neph et al., 201260 | https://bedops.readthedocs.io/en/latest/ |
R | R Core Team 201864 | https://www.r-project.org |