Abstract
Regulatory elements are essential components of plant genomes that have shaped the domestication and improvement of modern crops. However, their identity, function, and diversity remain poorly characterized, thus limiting our ability to harness their full power for agricultural advances using induced or natural variation. Here, we mapped transcription factor (TF) binding for 200 TFs from 30 families in two distinct maize inbred lines historically used in maize breeding. TF binding comparison revealed widespread differences between inbreds, driven largely by structural variation, that correlated with gene expression changes and explained complex QTLs such as Vgt1, an important determinant of flowering time, and DICE, an herbivore resistance enhancer. CRISPR-Cas9 editing of TF binding regions validated the function and structure of regulatory regions at various loci controlling plant architecture and biotic resistance. Our maize TF binding catalog identifies functional regulatory regions and enables collective and comparative analysis, highlighting its value for agricultural improvement.
Introduction
Transcription factors (TFs) bind short, sequence-specific motifs present in cis-regulatory regions, thereby controlling when and where genes are expressed1. Natural or induced modifications to TF binding sites within these regions can produce beneficial plant phenotypes by altering target gene expression (e.g. pattern, level, condition)2–7. A better understanding of the role of TFs, their binding site diversity, and the features necessary to modulate gene expression has the potential to unlock new avenues for crop breeding and bioengineering, allowing for the generation of new varieties that address pressing food security needs8.
Cultivation and modern breeding of maize have resulted in the generation of genetically and phenotypically diverse inbred lines that reflect the high degree of nucleotide diversity present in wild cultivars, allowing maize to dominate a wider global area than any other crop9,10 and providing an excellent system for the study of genotype to phenotype relationships. Despite the general belief that differences in TF binding could be significant drivers of phenotypic variation, little is known about their composition, architecture, and divergence across inbreds7,11. The identification and characterization of these regions would allow the construction of a pan-genomic TF binding regulatory space that would assist breeding efforts in maize and other crops.
Here we empirically mapped TF binding events for over 200 TFs in two distinct maize inbred lines. Using these large-scale data, we identified sub-clade specific binding properties and heterodimeric combinations within various TF families and connected agriculturally relevant GWAS variants with potential TF regulators. Cross-genome analysis revealed many conserved TF binding events present in commonly used maize breeding stocks, pinpointing both proximal and distal functional regulatory regions with single basepair accuracy. We also found evidence for extensive TF binding variation across genomes, identifying high-resolution genotype-specific TF binding events and positional variants in which TF binding events are relocated by 10 kb or more. These differences were associated with changes in target gene expression across the genotypes, providing supporting evidence for how TF binding diversity shapes phenotypes. Furthermore, using CRISPR-based editing of TF binding sites, we show that our regulatory maps identify functional regions and can therefore be used to rationally engineer gene expression and phenotype.
Results
Large scale mapping of TF binding sites in maize
To comprehensively map DNA-binding events across multiple maize inbreds we selected TFs from thirty-six DNA-binding TF families for DNA affinity purification-sequencing (DAP-seq)12 experiments using genomic DNA from two inbred lines that have been widely used in maize breeding: the maize reference genome B73 and the commonly studied inbred Mo17 (Fig.1a). We obtained high quality DAP-seq data for 200 B73 TF clones in each genome, providing a total of 400 TF binding datasets comprising thirty different TF families (Extended Data Fig.1a). Reads were mapped to their respective genomes and a broad range of peaks and putative target genes were identified for each TF (Extended Data Fig.1b,c; Supplementary Table 1). Due to the in vitro nature of DAP-seq, these peaks represent all potential binding events regardless of tissue type or condition12,13. B73 and Mo17 genomic datasets reported nearly identical motif enrichments (Supplementary Table 1), supporting the reproducible nature of our results. Furthermore, we noted that most B73 TFs (86%) shared greater than 95% amino acid similarity with their Mo17 syntelogs (Supplementary Table 1), indicating that the B73 TFs used in our experiments are likely to be accurate predictors of binding for Mo17 syntelogs, as shown for Arabidopsis and rice orthologs which showed high signal correlation despite containing only 52% amino acid identity14. Together these datasets serve as a valuable resource for empirical DNA binding across the B73 and Mo17 genomes, identifying putative target genes, motifs and exact chromosome coordinates for thousands of binding events (http://hlab.bio.nyu.edu/projects/zm_crm/).
Fig. 1. Large scale DAP-seq profiling of maize TFs provides high quality genome-wide binding site data for B73v5 and Mo17.
a, Overview of TF binding site mapping and comparison approach. b, Phylogenetic tree (based on bZIP amino acid sequences) showing subclade specificity of maize bZIP protein target sites. Loci shown include known targets of maize bZIPs or their Arabidopsis homologs. c, Heatmap of -log10 enrichment p-values computed by GARFIELD for overlaps between DAP-seq TF binding sites and trait-associated SNPs from the maize NAM GWAS panel (GWAS significance threshold 1e-6); only TFs that rank within the three most significant (enrichment p-value<=0.001) for at least one trait are shown.
Biological relevance of large-scale DAP-seq data
Prior to performing cross-genome comparative analysis, we first sought to mine the biological relevance of our large-scale mapping data within a single genome using our B73 datasets as a benchmark. Correlation analysis of binding events among B73 TFs showed that TFs from the same structural family often bound sites that were largely distinct from those of other TF families, likely reflecting the stringent motif specificity conferred by the various structurally unique DNA-binding domains and suggesting a high degree of intra-family similarity (Extended Data Fig.1d; e.g. compare TCP and WRKY groups). Similar results were observed for target genes (Supplementary Fig. 1). In certain cases, however, we also observed partial subclade-specific binding among members from the same structural TF family, indicating flexibility within certain DNA-binding domains and/or the influence of ancillary domains on DNA binding. For example, among the 41 bZIPs sampled, we noted many clustered according to phylogenetic subclades such as subclade-D and -I (Extended Data Fig.1d). This subclade specificity was also reflected at individual loci where certain bZIP subclade members showed preferential binding. For example, a total of six subclade-C members, including BZIP1/O2, preferentially bound BZIP17, a known target of BZIP1/O215, but showed little binding at other loci that were instead bound by bZIPs from other subclades (Fig.1b). In many cases, preferential subclade binding events corresponded to differences in the core motif (e.g. clades I and D), although not always (e.g. clades G and C), suggesting that sequences flanking the motif could influence binding strength (Fig.1b). Additional examples of preferential subclade binding are shown in Supplementary Figs. 2 and 3. Together, these data highlight the importance of empirical binding studies to tease apart differences even among members from the same structural family.
GO analysis of putative target genes of the various maize TFs allowed additional functional dissection, identifying unique GO processes for different TFs, e.g. HSF24 targets for heat stress and NAC74 and others for immune response (Extended Data Fig. 2a).
To further apply our B73 TF binding site data, we also assessed the overlap of peaks from our B73 datasets with GWAS SNPs from 41 measured traits segregating within the NAM population that captures the diversity of cultivated maize,16 seeking to connect GWAS variants with TF regulators. Significant enrichment was found for twenty-two traits between trait-associated SNPs and DAP-seq peaks of at least one TF (p<=0.001; Fig.1c). These included several GRFs and bZIPs associated with leaf angle (an important trait for light capture) (red rectangles, Fig.1c) as well as enrichment of SBP30/UB3 for architecture-related traits and MADS73 for flowering time-related traits (Extended Data Fig. 2b,c), among others. Enrichments were also observed with a natural association population (the Wisconsin Diversity Panel/ 282 Maize Association Panel; Extended Data Fig. 2b,c, Supplementary Fig. 4)17. Collectively, these analyses, benchmarked using our B73 data, confirmed the functional relevance of our DAP-seq datasets as a valuable resource to understand TF function and binding site specificity for agricultural gains.
Cooperative regulatory potential among TF binding events
The large number of TFs profiled in our DAP-seq experiments facilitated analysis of collective binding events for TFs from different families at individual loci. We observed that diverse maize TFs often bound in clusters, forming distinct cis-regulatory islands, as noted in other studies18–20. To better characterize these clusters, we first selected a subset of TFs from our B73 DAP-seq collection that captured the majority of binding site and motif diversity (Fig.2a, Extended Data Fig.1d, Extended Data Fig. 3a–c). This minimized intra-family binding site overlap and highlighted clusters of non-redundant binding peaks. We hereafter refer to the resulting panel of 66 representative TFs as the ‘B73-TF diversity panel’ (Supplementary Table 2).
Fig. 2. Maize TF diversity panel reveals functional cis-regulatory modules.
a, Maize TF diversity panel is represented by mostly distinct motifs. b, Bar graph showing the number of DAP-CRMs and their prevalence in gene features. c, Overlap of DAP-CRMs with orthogonal functional datasets. ACR, accessible chromatin regions from leaf and ear ATAC-seq; MOA, MOA-seq; UMR, unmethylated regions; CNS, conserved non-coding sequences from sorghum. d, Genome browser screenshot of TF binding peaks within the GRASSY TILLERS1 locus which contains eight distinct DAP-CRMs, each with three or more TFs. The estimated region of the prolificacy QTL is shown. e, Heatmap of -log10 enrichment p-values computed by GARFIELD for overlaps between DAP-CRMs consisting of specific TF combinations and maize NAM GWAS trait-associated SNPs for selected traits (GWAS significance threshold of 1e-7).
We next determined the overlap of peaks within the panel. We selected regions containing three or more peaks whose summits were within 300 bp, resulting in the identification of 225,235 DAP-cis-regulatory modules (DAP-CRMs), each of which was assigned a unique tracking identifier (Fig.2b). On average, DAP-CRMs were ~350 bp long and contained five TF binding events (Extended Data Fig.3d,e). In total, they covered 3.6% of the maize genome, were located both proximal and distal to genes (Fig.2b, Extended Data Fig.3f), and overlapped with other independently identified functional regulatory regions such as accessible chromatin regions (Fig.2c)21–24. Among the distal DAP-CRMs identified in our analysis, several overlapped genetically defined QTL such as GRASSY TILLERS1 and Vgt1, providing crucial information on which TFs may be controlling these genes (Fig.2d, Extended Data Fig. 4a)25,26. This approach also allowed us to define novel distal regulatory regions (Extended Data Fig.4b,c). These findings support the functionality of our DAP-CRMs and provide much needed information about which TFs empirically bind in chromatin-delineated and other functionally established regions. Interestingly, 61% of ear and leaf ATAC-seq peaks overlapped with a DAP-CRM, suggesting that our TF binding site analysis may not yet be saturated.
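The module-calling rule described above (three or more peaks whose summits lie within 300 bp) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the greedy grouping rule (every summit in a run must lie within `max_span` bp of the first summit in that run), the `Summit` record, the function name `call_crms` and the toy coordinates are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Summit(NamedTuple):
    chrom: str  # chromosome name
    pos: int    # summit coordinate in bp
    tf: str     # name of the TF whose peak this summit belongs to


def call_crms(summits, max_span=300, min_peaks=3):
    """Group peak summits into candidate cis-regulatory modules.

    Assumption: a module is a run of position-sorted summits that all fall
    within `max_span` bp of the first summit in the run, and it is kept only
    if the run contains at least `min_peaks` summits.
    Returns a list of (chrom, first_summit, last_summit, sorted TF names).
    """
    by_chrom = defaultdict(list)
    for s in summits:
        by_chrom[s.chrom].append(s)

    runs = []
    for chrom in sorted(by_chrom):
        run = []
        for s in sorted(by_chrom[chrom], key=lambda x: x.pos):
            # Close the current run once the next summit is too far away.
            if run and s.pos - run[0].pos > max_span:
                if len(run) >= min_peaks:
                    runs.append(run)
                run = []
            run.append(s)
        if len(run) >= min_peaks:
            runs.append(run)

    return [(r[0].chrom, r[0].pos, r[-1].pos, sorted({s.tf for s in r}))
            for r in runs]


# Toy input (hypothetical coordinates and TF labels).
toy = [
    Summit("chr1", 100, "A"), Summit("chr1", 250, "B"), Summit("chr1", 390, "C"),
    Summit("chr1", 5000, "D"), Summit("chr1", 5100, "E"),
    Summit("chr2", 10, "A"), Summit("chr2", 20, "B"), Summit("chr2", 30, "C"),
    Summit("chr2", 400, "D"),
]
modules = call_crms(toy)
```

In this toy input only the first three summits on each chromosome form a module; the pair at 5,000 to 5,100 bp on chr1 is dropped because it contains fewer than three peaks.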
We next examined whether certain combinations of TFs were consistently localized together within DAP-CRMs, identifying 15,214 combinations of three TFs that bound to at least one DAP-CRM region (Supplementary Table 3). We selected DAP-CRMs that contained at least one peak for ARF, SBP, or NAC - TFs known for their roles in plant architecture - and assessed whether these three-TF DAP-CRMs were significantly enriched at NAM GWAS loci associated with vegetative and inflorescence architecture, compared to DAP-CRMs that do not overlap with these loci27. This analysis showed that in certain cases co-binding by specific TF combinations was more influential than individual TF binding. For example, while ARF16 (activator), ARF25 (repressor), and SBP30/UB3 binding sites individually had minor enrichment for GWAS hits, the DAP-CRMs containing combinations of ARFs, SBPs, and NACs were much more enriched than the individual TFs (Fig.2e). Furthermore, different combinations of TFs showed enrichment for different traits. For example, ARF16-containing DAP-CRMs that also had SBP30 and NAC42 sites were enriched for SNPs associated with leaf angle (Cluster 1 and 3), while ARF25-containing DAP-CRMs that had SBP30 and NAC42 sites were instead enriched for cob length (Cluster 4 and 5) (Fig.2e). These data support a model where clusters of specific sets of TFs influence specific traits.
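Counting the three-TF combinations found together in modules amounts to enumerating, for each DAP-CRM, every unordered triple of distinct TFs bound there. The sketch below shows one way to do this in Python; the function name, the CRM identifiers and the TF memberships in the toy input are hypothetical and are not taken from Supplementary Table 3.

```python
from collections import Counter
from itertools import combinations


def count_tf_triples(crm_to_tfs):
    """Count in how many modules each unordered TF triple co-occurs.

    `crm_to_tfs` maps a module identifier to the TFs with a peak in it.
    Duplicate TF entries within one module are counted once.
    """
    counts = Counter()
    for tfs in crm_to_tfs.values():
        for triple in combinations(sorted(set(tfs)), 3):
            counts[triple] += 1
    return counts


# Hypothetical modules for illustration only.
toy_crms = {
    "CRM_a": ["ARF16", "SBP30", "NAC42"],
    "CRM_b": ["ARF16", "SBP30", "NAC42", "ARF25"],
    "CRM_c": ["ARF25", "SBP30"],
}
triple_counts = count_tf_triples(toy_crms)
```

Modules with fewer than three distinct TFs contribute no triples, and a module with n distinct TFs contributes n-choose-3 triples, so large modules dominate the total number of combinations observed.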
Comparative TF binding across diverse inbred lines
The maize inbred lines B73 and Mo17 harbor well-established phenotypic differences including cold tolerance, disease susceptibility, flowering time, and plant architecture28–32. Prior genome assembly comparison reported large structural variations (SVs; i.e. indels >50 bp), SNPs, and indels (<50bp)33. To better understand empirically how such sequence features influence TF binding and how variation among TF binding sites impacts phenotype, we compared TF binding peaks in B73 and Mo17 using DAP-seq data from our 200 TFs (Supplementary Table 1). We mapped reads to their tested genome, equalized read and peak numbers across genomes, and then performed coordinate ‘liftover’ using a high coverage whole genome syntenic alignment generated by Anchorwave34,35. This approach converted genomic coordinates of Mo17 to those of B73 (and vice versa), allowing direct comparison of peak coordinates. Comparison of mapped reads for direct and lifted datasets showed the highest degree of similarity among identical TFs, confirming the accuracy of our approach (Extended Data Fig.5a). Furthermore, very few peaks overlapped with non-aligned regions, suggesting we were capturing a large percentage of possible binding variation (Extended Data Fig.5b). Overall, comparison of peak coordinates revealed that 7–88% of peaks within each TF dataset were specific to either B73 or Mo17 (average 37%), while 12–93% of peaks within each TF dataset were shared between both genomes (average 63% shared; Fig.3a). No difference in peak quality was observed between shared and B73-specific peaks (Extended Data Fig.5c). TFs that showed the highest degree of shared peaks were those that had a high percentage of exonic binding (e.g. EREB183), while those with low percentages bound more frequently to distal intergenic regions (e.g. MYBR52; Extended Data Fig.6a,b).
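As an illustration of the peak comparison described above, the following minimal Python sketch labels B73 peaks as shared or B73-specific once Mo17 peaks have been lifted over to B73 coordinates. The overlap rule (any positional overlap of half-open intervals), the function name and the toy coordinates are assumptions made for illustration; the exact criterion used in the analysis is not specified in this section.

```python
def classify_peaks(b73_peaks, mo17_lifted_peaks):
    """Split B73 peaks into (shared, specific) lists.

    Peaks are (chrom, start, end) tuples with half-open coordinates.
    Assumption: a B73 peak is 'shared' if it overlaps any lifted Mo17 peak
    by at least 1 bp on the same chromosome.
    """
    shared, specific = [], []
    for chrom, start, end in b73_peaks:
        hit = any(c == chrom and s < end and start < e
                  for c, s, e in mo17_lifted_peaks)
        (shared if hit else specific).append((chrom, start, end))
    return shared, specific


# Hypothetical peaks for one TF.
b73 = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 250)]
mo17_on_b73 = [("chr1", 250, 450), ("chr2", 250, 400)]

shared, specific = classify_peaks(b73, mo17_on_b73)
pct_specific = 100 * len(specific) / len(b73)
```

The nested scan is quadratic in the number of peaks and is intended only to make the logic explicit; genome-scale comparisons would normally rely on sorted intervals or an interval tree.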
Fig. 3. Genotype-specific peaks are prevalent in B73 and Mo17 and explain genetically defined QTL.
a, Boxplots showing percentages of genotype-specific and shared peaks in B73 and Mo17. b, Percentages of B73-specific (dark blue) and shared (light blue) peaks that overlap with B73-Mo17 variants assigned as duplicated regions (DUPs), indels (INDELs), SNPs, and structural variants (SVs). For a and b, data points correspond to individual TFs, n=200 TFs; whiskers indicate the minimum and maximum values; central lines correspond to medians and box boundaries denote the upper (75th percentile) and lower (25th percentile) quartiles. c, Comparative genomic view of Vgt1-RAP2.7 locus showing three MADS69 peaks (light blue rectangles) upstream of RAP2.7 in B73v5, one of which is in the genetically defined Vgt1 enhancer. Lower left panel shows a close-up of the region in Mo17 containing the MITE and the corresponding region in B73v5 showing the MADS69 peak. d, Alignment of the MADS69 CArG-box motif with B73 and Mo17 sequences. The MITE insertion disrupts high information content nucleotides within the motif. e, Heatmap of -log10 enrichment p-values computed by GARFIELD for association between B73-specific peaks (stratified by genomic variant categories) and trait-associated SNPs from the maize NAM GWAS panel (GWAS significance threshold 1e-6). f, Boxplots showing motif score differences for B73-specific peaks for diversity panel TFs (n = 68). Whiskers indicate the minimum and maximum values; central lines correspond to medians and box boundaries denote the upper (75th percentile) and lower (25th percentile) quartiles. g, Comparative genomic view showing a 12bp indel at the ARF15 locus that causes ARF binding in B73 but not Mo17. h, Heatmap of -log10 FDR-adjusted p-values from one-sided Fisher’s exact tests, evaluating whether B73 vs. Mo17 differentially expressed genes across tissues are enriched among putative target genes associated with B73-specific DAP-seq peaks relative to putative targets near peaks shared between the two genotypes. Each column of the heatmap corresponds to target genes of one TF.
We next measured the degree to which Mo17 SNPs, indels, and SVs impacted TF binding for each TF dataset. In general, we noted that shared and B73-specific peaks overlapped with Mo17 SNPs and indels at roughly similar frequencies (Fig. 3b). By contrast, B73-specific peaks showed a much higher percentage of their peaks that overlapped with Mo17 SVs relative to shared peaks (Fig. 3b). These findings suggest that SVs are the biggest contributor to TF binding site variation across genotypes and together with SNPs and indels, could potentially serve as drivers of phenotypic variation. A large portion of peaks within SVs overlapped transposons (Extended Data Fig. 5d), illuminating numerous avenues by which these elements could alter gene expression. For example, we noted three large MADS69 peaks located upstream of the flowering time regulator ZmRAP2.7 in B73 that were absent in Mo17 (light blue highlighted regions in Fig.3c). Two of the MADS69 peaks overlapped Copia transposons present within a large B73-specific insertion proximal to ZmRAP2.7 while a third was located ~70-kb upstream in the B73 Vgt1 enhancer region, a genetically defined flowering time QTL26,29. Strikingly, the absence of the Vgt1 MADS69 peak in Mo17 appeared to be caused by the insertion of a 140-bp MITE transposon that has previously been strongly linked with earlier flowering time in Mo17 and other inbreds, but for which the underlying molecular cause has remained elusive. Our data indicate that a CArG-box binding site of MADS69 in B73 is bisected in Mo17 by the MITE, but that additional TF binding events within the Vgt1 enhancer remain intact (Fig.3c,d). Furthermore, we noted that MADS69 bound equally well at other loci in both B73 and Mo17, indicating the absence of binding in Vgt1 in Mo17 was not due to technical issues (Extended Data Fig. 6c,d). As both MADS69 and ZmRAP2.7 are known regulators of flowering time that are believed to be acting in the same pathway36, our data could pinpoint the underlying molecular details of an important flowering time haplotype in maize and guide the precise engineering of beneficial alleles.
To further support the relationship between SV-associated TF binding variation and plant phenotypes, we examined the overlap of GWAS trait-associated SNPs and B73-specific peaks. We found that B73-specific peaks associated with SVs were enriched in various GWAS traits (Fig. 3e, Supplementary Fig. 5–6). For example, a MATE efflux transporter associated with a GWAS hit for increased tocopherol (vitamin E) content37, showed B73-specific binding sites in the proximal promoter that could be responsible for the ~200-fold increase in ear RNA expression seen in B73 relative to Mo17 (Extended Data Fig.6e). Overall, these findings illustrate how SV-associated TF binding variation can shape plant phenotypes.
While SVs were the largest driver of inbred-specific peaks, we also explored the extent to which SNPs affected TF binding. We computed the differences in motif scores between the B73-specific peak sequences and their syntenic Mo17 sequences, revealing that B73-specific peak sequences were more likely to have higher motif scores than syntenic Mo17 sequences that did not have a peak (Fig.3f). Additionally, among the peaks that overlapped with SNPs, we found that a slightly lower average percentage of B73-specific peaks overlapped SNPs relative to shared peaks (Fig.3b, quantified as overlap with one or more SNP). However, this trend inverted when we counted only peaks with greater than four SNPs and was also influenced by SNP location, showing that shared peaks were slightly more sensitive to SNPs present near the peak summit compared to SNPs present at peak edges (Extended Data Fig.6f). Taken together, these results suggest that TF binding is sensitive to SNPs residing in the sequence motif as well as the overall number of SNPs in the peak. Small indels also affected TF binding, often in unexpected ways. For example, several activator ARF peaks were observed downstream of ARF15 in B73 but were absent in Mo17 due to a 12bp insertion in B73 that resulted in four direct repeats of the core ARF binding site TGTC/GACA, producing an optimal motif spacing pattern preferred by certain ARFs (Fig.3g)38. As observed for SV-associated peaks, we also noted that peaks overlapping SNPs, indels, and duplicated regions were enriched for various GWAS traits (Fig. 3e, Supplementary Fig. 5–6). Overall, these data demonstrate the power of our comparative TF mapping to identify specific TF binding sites that likely contribute to phenotypic variation.
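The motif score comparison between a B73-specific peak sequence and its syntenic Mo17 sequence can be sketched as a log-odds scan with a position probability matrix. The matrix below is a made-up, TGTC-like toy and not a motif derived from our data; the uniform background of 0.25, the function names and the example sequences are likewise assumptions, and the sketch handles only A, C, G and T.

```python
import math

# Toy position probability matrix for a TGTC-like core (made-up values).
PPM = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
]
BACKGROUND = 0.25  # assumed uniform base frequency


def revcomp(seq):
    """Reverse complement of an ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def best_motif_score(seq, ppm=PPM):
    """Highest log2-odds score of any window on either strand."""
    seq = seq.upper()
    width = len(ppm)
    best = -math.inf
    for strand in (seq, revcomp(seq)):
        for i in range(len(strand) - width + 1):
            window = strand[i:i + width]
            score = sum(math.log2(col[base] / BACKGROUND)
                        for col, base in zip(ppm, window))
            best = max(best, score)
    return best


def motif_score_difference(b73_seq, mo17_seq):
    """Positive values mean the B73 sequence matches the motif better."""
    return best_motif_score(b73_seq) - best_motif_score(mo17_seq)


# Hypothetical syntenic sequences differing by one SNP in the core.
diff = motif_score_difference("AATGTCAA", "AATGTAAA")
```

Here the single substitution in the core lowers the best Mo17 score, so the difference is positive, which is the direction reported for B73-specific peaks.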
To investigate if additional genotype-specific peaks were associated with previously identified functional regions, we calculated their overlap with published orthogonal datasets often used in identifying enhancer regions: UMRs, ACRs, MOA-seq and CNS from sorghum21–24. We found that like shared peaks, B73-specific peaks showed between three-fold and eight-fold higher overlap with orthogonal datasets compared to randomly shuffled peaks (p<0.0001, paired t-test; Extended Data Fig. 6g). The relatively high percentages of B73-specific peaks that overlapped orthogonal functional data suggest many serve an important role. We also tested how many of the peaks in each category overlapped with our DAP-CRMs. An average of ~64% of top B73-specific peaks overlapped a DAP-CRM, suggesting that genotype-specific peaks reside near other binding sites, similar to shared peaks (average 70%; Extended Data Fig. 6g). These data show that putative enhancers and their TF composition are often dynamic.
While genotype-specific binding sites are likely drivers of phenotypic differences, shared TF binding sites are instead likely to underlie core processes and developmental programs. We noted many instances of conserved TF binding sites that had similar composition and spacing in B73 and Mo17 and appeared positionally constrained (Extended Data Fig. 7a). In contrast, we also observed other shared TF binding sites that we classified as positional variants (PosVs), in which TF binding events were present in both B73 and Mo17 but located at variable distances from the TSS of their common target gene. For example, the promoter of ZmATL34 (Zm00001eb193860), which encodes an E3 ubiquitin ligase whose Arabidopsis homolog is involved in defense and carbon/nitrogen response39, showed proximal binding by BZIP111 and NAC42 in both B73 and Mo17; however, additional conserved binding events (BZIP111, HSF24, NAC3, NAC42, SBP30, and THX26) were relocated 11-kb upstream in Mo17 due to a sequence insertion (Extended Data Fig. 7b). We noted that ~5–25% of shared peaks (average 17%) that resided in upstream regulatory regions (i.e. 5’UTR and −10-kb promoter) were relocated 500-bp or more from their position relative to the TSS in the other genotype (Extended Data Fig. 7c). This percentage was substantially lower than that of shared peaks that were relocated less than 500-bp from the TSS in either genotype and that of peaks unique to a specific genotype (Extended Data Fig.7c), but could still be an important source of expression variation given that in many cases peaks were relocated more than 10-kb from their presumptive target gene.
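The positional variant definition above reduces to comparing the distance between a shared peak and the TSS of its target gene in each genotype. A minimal Python sketch follows, assuming signed distances in bp (negative values upstream of the TSS) have already been computed; the function name, labels and example distances are hypothetical.

```python
def classify_shared_peak(dist_to_tss_b73, dist_to_tss_mo17, threshold=500):
    """Label a shared peak by how far it moved relative to the TSS.

    Distances are signed bp offsets from the TSS (negative = upstream).
    A shift of `threshold` bp or more is labelled a positional variant.
    """
    shift = abs(dist_to_tss_b73 - dist_to_tss_mo17)
    return "PosV" if shift >= threshold else "positionally conserved"


# Hypothetical peaks: one displaced by an 11-kb insertion, one nearly fixed.
label_moved = classify_shared_peak(-200, -11200)
label_fixed = classify_shared_peak(-200, -350)
```

The 500-bp cut-off is the one stated in the text; any other value passed as `threshold` would be an analysis choice rather than something reported here.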
Genotype-specific peaks drive differential gene expression
Having noted a substantial number of TF binding differences (both presence/absence and PosVs), we next asked how many of these were associated with expression differences in target genes. We found that significantly greater proportions of genes associated with B73-specific peaks were differentially expressed between B73 and Mo17 (FDR adjusted p< 0.05, fold change > 2)40 compared to the genes associated with shared peaks, indicating that genotype-specific TF binding events can impact gene expression (Fig.3h, Supplementary Fig. 7). A similar effect on gene expression was observed for PosVs as well; for many TFs, a significantly higher percentage of target genes were differentially expressed between B73 and Mo17 when their associated peaks were relocated more than 500-bp, compared to those genes for which the associated peaks were relocated less than 500-bp (Extended Data Fig.7d). This suggests that variation in binding site presence/absence and location can impact gene expression.
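The enrichment of differentially expressed genes among targets of B73-specific peaks relative to targets of shared peaks is a 2x2 contingency problem, which a one-sided Fisher's exact test addresses through the upper tail of the hypergeometric distribution. The sketch below computes that tail with the Python standard library only. The table layout, function name and counts are illustrative assumptions, and the multiple-testing (FDR) adjustment applied across TFs is not shown.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """Upper-tail (enrichment) p-value for the 2x2 table [[a, b], [c, d]].

    a = differentially expressed targets of B73-specific peaks
    b = non-differentially expressed targets of B73-specific peaks
    c = differentially expressed targets of shared peaks
    d = non-differentially expressed targets of shared peaks
    Returns P(X >= a) with all margins held fixed.
    """
    row1 = a + b          # all targets of B73-specific peaks
    col1 = a + c          # all differentially expressed targets
    n = a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Hypothetical counts for one TF.
p_value = fisher_one_sided(3, 1, 1, 3)
```

For real datasets `scipy.stats.fisher_exact` with `alternative="greater"` computes the same one-sided p-value.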
We were also interested in whether TF binding events were drivers of cell-type specificity. To investigate this, we used cell type-specific root expression data41 to examine the overlap with our B73-specific and B73-Mo17 shared target genes. Interestingly, we found no significant difference in the proportions of cell type-specific genes between shared and B73-specific targets (Extended Data Fig.7e). These data agree with reports showing that TF-target gene pairs are also often conserved across species41. Together, these data demonstrate that TF binding variability is associated with gene expression changes between genotypes while cell-type specificity is less likely to be influenced by genotype-specific TF binding events.
Impact of TF heterodimerization on DNA-binding
All TF binding data presented here so far have involved binding by single TFs or clusters of TFs; however, protein-protein interactions among TFs have the potential to alter binding site specificity and thus expand the repertoire of TF binding sites. To explore how combinatorial heterodimeric TF interactions impact DNA binding specificity we used doubleDAP-seq, a modified version of DAP-seq42 (Extended Data Fig.8a), to compare binding among certain interacting members of the bHLH family. Nearly all bHLH members contain highly conserved H-E-R amino acid residues in their ‘basic’ region which have been shown to directly contact DNA, with most members typically binding to E-box (CANNTG) or G-box (CACGTG) sequences43,44. In plants, one exception is the group VIII bHLHs which include the Arabidopsis HECATE (HEC) genes involved in meristem and female reproductive organ development45,46 and the maize BARREN STALK1 (BA1) gene involved in inflorescence architecture47. This subclade comprises nineteen members in B73 (Extended Data Fig.8b), with all containing conserved Q-A-R residues, a feature that is broadly shared among plant homologs (Fig.4a). The absence of critical DNA-contacting residues in subclade VIII bHLHs suggests they may not directly bind DNA but could instead modify the DNA binding properties of other interacting bHLHs. AlphaFold-based prediction48 revealed that the Q-A-R ‘basic’ region forms a helix resembling that of the canonical ‘basic’ bHLH domain, albeit one that is shorter by two helical turns (Fig.4b, Extended Data Fig.8c). Surprisingly, of the nine Q-A-R members tested in our single-TF DAP-seq assay, eight yielded a moderate number of peaks in both B73 and Mo17 that were enriched for a similar, but non-canonical motif highly divergent from the E-box (Fig.4c, Supplementary Table 1). These findings suggest that Q-A-R type bHLHs bind DNA despite lacking canonical DNA-contacting residues.
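The canonical bHLH sites mentioned above can be located in a sequence with simple pattern matching: the E-box consensus CANNTG allows any base at the two central positions, and the G-box CACGTG is one specific E-box. The Python sketch below is illustrative only; the function names and the example sequence are hypothetical, and it scans a single strand, which suffices for CACGTG because that sequence is its own reverse complement.

```python
import re

# Lookahead so that overlapping matches are all reported.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")
G_BOX = "CACGTG"


def find_e_boxes(seq):
    """Return (0-based start, matched site) for every CANNTG occurrence."""
    return [(m.start(), m.group(1)) for m in E_BOX.finditer(seq.upper())]


def find_g_boxes(seq):
    """Return 0-based starts of the E-boxes that are also G-boxes."""
    return [pos for pos, site in find_e_boxes(seq) if site == G_BOX]


# Hypothetical sequence containing one G-box and one other E-box.
example = "TTCACGTGAACAGCTGTT"
e_hits = find_e_boxes(example)
g_hits = find_g_boxes(example)
```

A non-canonical motif such as the one enriched for the Q-A-R bHLHs would not be found by either pattern, which is why empirically derived motifs are needed for those TFs.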
Fig. 4. DoubleDAP-seq analysis of Q-A-R type bHLHs.
a, Clustal alignment of amino acids in bHLH DNA binding domain of Q-A-R bHLHs (light green; group VIII) and group VIIb bHLHs (light blue) with QAR and HER residues shown. b, Ribbon diagrams of AlphaFold predicted structures of BHLH85 (Q-A-R), BHLH125 (H-E-R), and the heterodimer. Empirically determined DAP-seq motifs are shown below each structure. c, Number of peaks and top motifs called for DAP-seq and doubleDAP-seq datasets where TF:TF indicates HALO-bHLHVIII:SBP-Tag-bHLHVIIb. d, Pearson correlation of genome-wide binding events of the homo- and heterodimers. e, Genome browser screenshots of binding by homo- and heterodimers. f, Summary of protein-protein interactions based on DAP-seq experiments.
Previous studies have indicated that Arabidopsis HECs (subclade VIII with Q-A-R residues) physically interact with SPATULA (SPT), a subclade VIIb bHLH that contains canonical H-E-R ‘basic’ residues49. Maize contains two SPT orthologs (BHLH125/ZmSPT1 and BHLH168/ZmSPT2) that are expressed in multiple tissues (Extended Data Fig.8d), and that when tested in the single-TF DAP-seq assay, bound a moderate number of peaks that were enriched for the canonical G-box motif CACGTG (Fig.4b). Surprisingly, testing these bHLHs in doubleDAP-seq revealed that heterodimers formed by ZmSPTs and the subclade VIII Q-A-R members yielded peak numbers up to 44 times higher than the homodimers (Fig.4c). Furthermore, a highly enriched, unique motif (CCCATnCC) was identified in sites bound by the cross-clade heterodimers (Fig.4b) leading to largely distinct binding locations for heterodimers and homodimers (Fig.4d,e), despite expectations that both sets of binding sites might be pulled down in doubleDAP. The CCAT half-site bound by the heterodimer resembled the CCAG half-site bound by ZmSPTs, while the other half site resembled that of the subclade VIII bHLHs (Fig.4b,c). Identical results were observed in both B73 and Mo17 (Supplementary Table 1 and 4). These results indicate that the plant-specific Q-A-R type DNA binding domain binds DNA both as a homodimer and as a heterodimer with SPT-related bHLHs (Fig.4f), thereby expanding the known bHLH binding repertoire and potentially explaining the dual roles that have been proposed for HEC genes45.
Induced TF binding variation impacts agronomic traits
While natural variation is an important source of phenotypic variability for plant breeding, genome editing technologies have the potential to vastly expand the pool of available alleles beyond those captured by nature3. Further studies are needed, however, to understand the effects of cis-regulatory region editing50. Therefore, we performed CRISPR-based genome editing of TF binding sites within DAP-CRMs at several maize loci, including sites located in upstream regulatory regions, a 3’ UTR, and distal enhancers. In most cases, deletion of TF binding sites led to altered expression and/or phenotypes and allowed us to pinpoint specific groups of TFs affecting phenotypic output. For example, deletion of a DAP-CRM region containing at least five TF binding events, including an SBP30/UB3 and SBP2/TSH4 site, in the upstream non-coding region of the TASSELSHEATH1 (TSH1) gene caused three to four-fold lower TSH1 RNA expression and a mutant phenotype that resembled loss-of-function tsh1 mutations (including reduced tassel branching and extended outgrowth of bracts, which are typically suppressed in wildtype B73 plants; Extended Data Fig.9a–e)51. Similarly, editing of three TGTC motifs within an ARF peak located in the highly conserved 3’UTR of BIF238, the maize homolog of Arabidopsis PINOID, involved in the regulation of auxin transport, resulted in ears with defective axillary meristem initiation reminiscent of previously described bif2 coding region mutants (Extended Data Fig.9f)52.
To investigate next how induced cis-regulatory variation at distal regulatory regions influences phenotype, we performed CRISPR-based genome editing on the long-range DICE enhancer, known to influence herbivore resistance53. The DICE enhancer lies 143-kb upstream of the BX1 gene, which encodes the first enzyme in the production of the herbivore resistance compound DIMBOA, and controls its expression54. Previous data have shown that Mo17, which has higher levels of BX1 mRNA and higher levels of DIMBOA relative to B73, contains partially duplicated sequence within the genetically mapped DICE region (Fig.5b)53. DAP-seq data near this region revealed the presence of two large, conserved DAP-CRMs in B73 and Mo17 (pink boxes in Fig.5a), and an additional DAP-CRM in Mo17 (purple region in Fig.5a). The additional Mo17 DAP-CRM (CRM119799) contained binding sites for twelve TFs and overlapped with a 3.4-kb Mo17-specific insertion. Interestingly, our data revealed that the Mo17-specific DAP-CRM contained a similar combination of TFs to that seen in syntenic B73-Mo17-CRM119798, suggesting that the increased BX1 expression seen in Mo17 was caused by a tandem enhancer duplication consisting of at least nine TF binding sites (Fig.5a, Extended Data Fig.10).
Fig. 5. CRISPR induced cis-regulatory variation drives expression and phenotypic differences.
a, JBrowse2 genome browser screenshot of ~10kb region surrounding the DICE enhancer in Mo17 and B73. Two conserved DAP-CRMs (pink highlighted area) and one Mo17-specific CRM (purple highlighted area) that appears to be a partial segmental duplication of the upstream CRM and binding sites (CRM119798) are evident. b, RNA-seq data from 11-day old seedlings from Zhou et al., 2019 showing expression levels of various BX genes located near the DICE enhancer. Gene order is same as on chromosome. Mo17 shows 51-fold greater levels of BX1 expression (yellow box) relative to B73. c, CRISPR editing of Mo17 sequences using multiplexed guides near the DICE enhancer revealed specific TF binding sites important for BX1 expression. Relative expression of BX1 determined by qRT-PCR for six independent alleles is shown on right. Data are presented as mean values of three biological replicates relative to wildtype siblings +/− SD; data points correspond to individual biological replicates. ** indicates significant difference relative to WT sibling in two-sided t-test. d, Schematic depicting individual enhancer components that contribute to enhanced expression of BX1 in Mo17.
We then used three independent CRISPR constructs to generate an allelic deletion series targeting various DAP-seq peaks within these CRMs in Mo17 (Fig.5c). BX1 expression in the various alleles was measured relative to Mo17-introgressed sibling controls using qRT-PCR in young seedlings. A large 4.1-kb deletion that removed 16 TF binding sites within CRM119798 and CRM119799 caused a 9-fold reduction in BX1 expression (allele 651-1), while alleles with smaller deletions had only minor or non-significant reductions in expression levels (Fig.5c). The presence of ARF16, GLK17, and BZIP91 sites may be particularly important as their deletion from both CRMs in allele 651-1 caused strong reduction in BX1 expression relative to other alleles. Small deletions in CRM119801 also resulted in altered BX1 expression, with one allele (345-1) showing a substantial reduction in BX1 levels relative to wildtype Mo17-introgressed siblings, suggesting that this DAP-CRM may also impact the expression of BX1. Deletion of ARF, SBP, and EREB126/RAV1 binding sites was common to both the 345-1 and 651-1 strong alleles, highlighting a potentially important role for these TFs. Taken together, these results indicate that the effect of the DICE enhancer in Mo17 is likely due to a distal tandem enhancer containing at least nine unique TF binding sites that boost BX1 gene expression (Fig.5d). An additional CRM downstream of DICE may also substantially contribute to BX1 expression.
Discussion
Numerous strategies have recently been used for mapping functional non-coding regulatory regions in plants, each with different advantages18,55–61. Here, we used a TF-centric approach to annotate the maize non-coding space, taking advantage of several aspects of DAP-seq including the power to profile large numbers of TFs in multiple genomes and the ability to assess TF-DNA binding in the absence (or presence) of partner TFs without interference from other in vivo factors12. Integrating this information with existing non-coding annotations allowed us to identify individual and composite TF binding sites within proximal and distal regions that overlapped with GWAS SNPs and could be used for further functional analysis and targeted breeding. In addition, using DAP-seq we observed direct TF binding by a plant-specific subclade of the bHLH family previously not known to bind DNA, while doubleDAP-seq allowed us to tease apart differences in DNA binding specificities for different bHLH dimers (Fig.4). Given the large number of TFs known to heterodimerize in protein-protein interaction assays, our results imply that many more novel DNA binding preferences are yet to be revealed.
Our comparative analysis of TF binding in B73 and Mo17 revealed a strong degree (~63% average) of TF binding event conservation between two diverse inbred lines, implying evolutionary constraint likely related to key regulatory functions. Indeed, our data also showed conservation of individual TF binding-target gene pairs between maize and Arabidopsis (Fig.1b), as has been observed in other studies62. We also observed widespread differences in TF binding across B73 and Mo17. While SVs were the largest contributor to these genotype-specific TF binding differences, impact from SNPs, indels and PosVs was also prevalent and associated with changes in gene expression. These findings demonstrate how certain TF binding events (such as the MADS69 binding site within the Vgt1 QTL, Fig.3c,d) can contribute to phenotypic diversity, although in some cases the presence of the SV itself may also play a role. Building upon the knowledge gained from existing natural variation, we also showed that similar changes could be induced via CRISPR-based editing of TF binding sites at additional loci of interest, thereby expanding possibilities for engineering desirable agronomic traits such as flowering time and herbivore resistance.
Methods
Genomic DNA library construction, DAP-seq, and DoubleDAP-seq
Genomic DNA libraries were constructed using DNA purified from the aerial portion of 14-day maize seedlings as follows63: 5ug of purified DNA was sheared to 200-bp using a Covaris S2 and cleaned with AmpureXP beads (Beckman-Coulter) at a 2:1 bead to DNA ratio. Samples were end-repaired using End-It (Lucigen), A-tailed with Klenow 3’−5’ exo- (NEB), and truncated Illumina adapters were added with T4 DNA ligase (NEB) overnight at 16 degrees C. Adapter-ligated libraries were cleaned with AmpureXP beads (1:1 ratio) and quantified with Qubit HS (Thermo-Fisher).
TF pENTR clones were mostly obtained from the maize TF clone collection64 and names are based on the Grassius nomenclature64,65. See Supplementary Table 1 for gene IDs. For DAP-seq experiments38,63, pENTR clones were LR recombined into the pIX-HALO::ccdB vector63 and 1ug of plasmid DNA was used for in vitro protein expression using the TNT rabbit reticulocyte system (Promega L4600) per the manufacturer’s instructions. In vitro protein reactions were incubated for 2 hours at 30 degrees C. HALO-TF protein was subsequently incubated with 10ul of MagneHALO beads (Promega) for 1 hour rotating at room temperature (RT). Beads were washed with 100ul of wash buffer (PBS and 0.005% NP40) three times prior to addition of 1ug of maize adapter-ligated genomic library diluted in wash buffer. Samples were rotated for 1 hour at RT. Unbound DNA was removed with six to eight washes of 100ul of wash buffer, and bound DNA was eluted in 30ul of Elution Buffer (EB, 10mM Tris) at 98 degrees C for 10min. Samples were placed on ice and DNA was recovered from the beads. Eluted DNA was PCR amplified for 19 cycles with dual indexed Illumina TruSeq primers. Samples were sequenced on a NextSeq550 (SE, 75bp reads), NovaSeq6000 (PE, 150bp reads), or HiSeqX (PE, 150bp reads). ARF samples tested with Mo17 DNA used recombinant GST-ARF proteins purified from E.coli38.
DoubleDAP-seq was carried out by simultaneously expressing pIX-HALO-TFs (subclade VIIIb, Q-A-R) and pIX-SBPtag clones42 (subclade VIIa, ZmSPTs) in a 50ul TNT rabbit reticulocyte reaction containing 1ug of each plasmid in a 2 hour incubation at 30 degrees C. Subsequently 1ug of adapter-ligated library was diluted in wash buffer and added to the reticulocyte reaction together with 10ul of MagneHALO beads for a final volume of 100ul. Samples were rotated for 2 hours at RT and washed eight to ten times in wash buffer using a magnet. Bound DNA was eluted with 30ul of EB at 98 degrees C for 10min. Sample enrichment and sequencing were performed as for single DAP experiments. Protein heterodimers of BHLH125 and BHLH85 were modeled using AlphaFold on Google Colab Pro48,66. Protein input sequences were the same as those used for DAP-seq clones.
The 200 TFs were selected based on the following criteria: 1) they belonged to a TF family for which Arabidopsis DAP-seq data indicated they were capable of binding DNA12, 2) they represented different sub-clades within the TF family, and 3) they were present in the Grassius TF clone collection. Additional factors included prior genetic characterization in maize or DAP-seq success with an Arabidopsis ortholog. We note that a small number of B73 DAP-seq datasets used in our comparative analysis were previously published as standalone datasets but are reanalyzed here for comparative purposes with Mo17.
Read mapping and blacklist construction
Reads were trimmed using Trimmomatic67 and mapped to either the B73v5 or Mo17 CAU1.0 genomes with bowtie2 using default parameters68. Mapped reads were filtered to retain reads with MAPQ of at least 30 using ‘samtools view -q 30’. Stringent criteria were established to exclude artifactual binding regions by generating a blacklist of non-specific peaks: sites bound in nearly all TF datasets and the negative control HALO-GST sample38 were manually curated for both B73v5 and Mo17 CAU1.0 (Supplementary Tables 5 and 6).
Peak calling analysis and target gene assignment
Peaks were called with GEM3 69 using an adjusted p-value of 0.00001 (--q 5). For datasets that produced few peaks at this threshold (i.e. <10,000), the default threshold of 0.01 (--q 2) was used. For datasets that exceeded 100,000 peaks at the q5 threshold, a q10 threshold (0.0000000001) was applied. The threshold used for each dataset is listed in Supplementary Table 1, along with average number of reads per peak and the percentage of peaks that contained the listed motif. Target genes were assigned to peaks using ChIPseeker and default assignment priorities70. High confidence target genes were only assigned to peaks that resided in promoters (defined as −10,000bp to +1bp relative to the TSS), UTRs, exons, and introns, and 300bp downstream of the transcription end site (TES). Gene features were annotated according to the Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.gff3.gz annotation file downloaded from maizeGDB.
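The threshold rule described above can be written out as a small decision function. The following is a minimal sketch only, not the code used in this study; the function names and the exact handling of the boundary values (exactly 10,000 or exactly 100,000 peaks) are assumptions made for illustration.

```python
def choose_gem_q(peaks_at_q5, low=10_000, high=100_000):
    """Pick the GEM --q exponent for one dataset (hypothetical helper).

    peaks_at_q5: number of peaks called at --q 5.
    Fewer than `low` peaks  -> relax to --q 2 (0.01).
    More than `high` peaks  -> tighten to --q 10 (1e-10).
    Otherwise keep --q 5 (1e-5).
    """
    if peaks_at_q5 < low:
        return 2
    if peaks_at_q5 > high:
        return 10
    return 5


def q_to_threshold(q):
    """Convert a --q exponent to the adjusted p-value cutoff it represents."""
    return 10.0 ** (-q)
```

For example, a dataset with 50,000 peaks at q5 keeps the q5 threshold, whereas one with 250,000 peaks is re-thresholded at q10.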
Motif analysis
The GEM events reported were ranked by q-value then by fold enrichment. Sequences for the 201-bp region centered at the top 1000 GEM events were extracted from the corresponding genome (B73v5 or Mo17 CAU1.0) and used in de novo motif discovery by MEME-ChIP (version 5.3.0) with the parameter “-meme-searchsize 0”71. For comparing motif scores between B73-specific peaks and syntenic Mo17 regions, aligned sequences between the two genomes were extracted from the Anchorwave.maf output file using the UCSC utility mafInRegions and scored against the PWM motifs reported by MEME-ChIP using MAGGIE72.
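The ranking and window extraction step can be illustrated as follows. This is a sketch under stated assumptions: the event fields (`chrom`, `pos`, `neg_log10_q`, `fold`) are hypothetical names, the q-value is assumed to be stored on a -log10 scale (larger is more significant), and `pos` is assumed to be a 1-based coordinate. Windows are returned as 0-based, half-open intervals suitable for sequence extraction.

```python
def top_event_windows(events, n_top=1000, flank=100):
    """Rank binding events and return 201-bp windows around the top ones.

    events: list of dicts with keys chrom, pos (1-based), neg_log10_q, fold.
    Ranking: most significant q-value first, ties broken by fold enrichment.
    Returns (chrom, start, end) tuples, 0-based half-open; a window is
    shorter than 2 * flank + 1 only if clipped at the chromosome start.
    """
    ranked = sorted(events, key=lambda e: (-e["neg_log10_q"], -e["fold"]))
    windows = []
    for e in ranked[:n_top]:
        start = max(0, e["pos"] - 1 - flank)  # 0-based start
        end = e["pos"] + flank                # exclusive end
        windows.append((e["chrom"], start, end))
    return windows
```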
GWAS analysis
For GWAS analysis of the NAM lines, GWAS summary statistics computed by the NAM genomes project were downloaded from CyVerse Data Commons: /iplant/home/shared/NAM/NAM_genome_and_annotation_Jan2021_release/SUPPLEMENTAL_DATA/NAM-GWAS-PVE-files/SNPs. SNP data was downloaded from /iplant/home/shared/NAM/NAM_genome_and_annotation_Jan2021_release/SUPPLEMENTAL_DATA/NAM-SV-projected-V8. For GWAS analysis of the Wisconsin diversity panel, trait values and B73v5 variants were downloaded from supplementary data repositories of the respective publications17,73. GWAS calculation was conducted as described in Mural 202217, using the Fixed and random model Circulating Probability Unification (FarmCPU) algorithm74 in the rMVP package version 1.3.575, including both the kinship matrix and the first five principal components in the model.
Linkage disequilibrium (LD) was computed for only biallelic SNPs using PLINK (version 1.90b6.21) with the parameters ‘--make-founders --r2 dprime --ld-window-kb 100 --ld-window 100000 --ld-window-r2 0.2’. The GWAS functional enrichment tool GARFIELD27 was used to annotate LD-pruned SNPs (LD r2 > 0.1) by their overlap with DAP-seq peaks of each TF. Subsequently, the odds ratio and two-sided p-values of the overlap were computed for various GWAS significance levels in a logistic regression model, accounting for minor allele frequency, distance to the nearest TSS, and the number of LD proxies (r2 > 0.8).
TF diversity panel
This panel consisted of 66 TFs and TF combinations (i.e. trimers of NFY-A1, LEC1 (NFY-B), and CA5P16 (NFY-C), and dimers of BHLH67/BA1 and BHLH125/ZmSPT) from this study and several published DAP-seq datasets21,22,38,76–79. The list of TFs, gene IDs, and family/sub-clade classification are listed in Supplementary Table 2. Only datasets that showed unique binding motifs, belonged to a distinct subfamily, and/or had a Pearson correlation less than 0.5 were included (indicative of relatively unique binding; Extended Data Fig. 3a). TF selection was done a priori to downstream analysis. When possible, TFs that gave high quality peaks in both B73 and Mo17 and/or were genetically characterized were selected. The panel contained at least one TF from each of the 30 families, plus TFs that showed unique sub-clade binding (Extended Data Fig. 1d). We note that TFs excluded from this list often exhibited several putative unique binding sites and/or showed unique expression patterns, suggesting they play a non-redundant role in maize development. For example, while many SBP TFs showed similar motifs and had highly correlated genome-wide binding profiles, their individual RNA expression patterns often showed tissue specificity which could be a large driver of non-redundant functionality (Extended Data Fig.3b,c). For this reason, we recommend that diversity panel TFs should be interpreted as representative members of the TF family or sub-family, and follow-up investigation should consider tissue specificity and related family members.
DAP-CRM construction
DAP-CRMs were constructed by combining all peak summits of TFs from the diversity panel and merging any summits within 300bp using the bedtools v2.25.0 ‘merge’ utility. We then selected regions containing three or more TF summits using the following code: cat *.TFdiversityPanelTFs.narrowPeak | sort -k1,1 -k2,2n | bedtools merge -i stdin -d 300 -c 2 -o count | awk -F "\t" '{ if ($4 > 2) print $0 }'. DAP-CRMs were compared to randomly shuffled datasets and statistical analysis was performed using the bedtools v2.30 utility bedtools fisher. Peak calling thresholds are shown in Supplementary Table 1.
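The merge-and-count logic of the command above can be reproduced without external tools. The sketch below is an illustrative stand-in for the bedtools and awk pipeline, not a replacement for it; the function name and the tuple layout are assumptions. As in the pipeline, the count is the number of merged records, so two summits from the same TF would each be counted.

```python
def merge_and_count(intervals, max_gap=300, min_count=3):
    """Merge nearby intervals per chromosome and keep well-supported regions.

    intervals: iterable of (chrom, start, end) in BED-style coordinates.
    Two intervals are merged when the gap between them is <= max_gap,
    mirroring `bedtools merge -d 300`; the number of merged records is
    tracked, mirroring `-c 2 -o count`. Regions with fewer than
    min_count records are dropped (the awk `$4 > 2` filter).
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            merged[-1][2] = max(merged[-1][2], end)  # extend current region
            merged[-1][3] += 1                       # one more record merged
        else:
            merged.append([chrom, start, end, 1])
    return [tuple(m) for m in merged if m[3] >= min_count]
```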
Whole genome alignment analysis and coordinate liftover
Whole genome alignment of the B73v5 and Mo17 genomes was performed using Anchorwave34. The .maf output file was converted to a .chain file using the maf-convert script of LAST version 129680. Coordinate liftover was performed using Crossmap35 and the above-described chain files. Visualization was performed using IGV (single genome view)81 and JBrowse282 (comparative view visualized using the chain files).
Normalization of TF diversity panel for determining genotype-specific peaks
BAM files of mapped reads were downsampled using samtools view -s to adjust the number of mapped reads between Mo17 and B73 to the lower value of the two datasets for the same TF. This was done to mitigate issues arising from under- or over-sequencing of matched datasets. Peak calling was performed on downsampled datasets using GEM369 and the threshold was adjusted to achieve a similar number of peaks between Mo17 and B73. Peak coordinates were ‘lifted’ using CrossMap35. Pearson correlation between coordinate-lifted and reference peaks was determined using the deepTools2 multiBigWigSummary utility83. Overlap between CrossMap lifted peak coordinates and the reference was determined using the bedtools v2.25.0 intersect utility84. Shared and genotype-specific peaks were determined for each TF by taking the top 20% of peaks from one genotype and comparing them to the total peaks of the opposing genotype to reduce false positives and negatives resulting from peak thresholding. Therefore, comparative analysis does not consider quantitative differences in peak binding. No significant difference in average reads per peak or percent of peaks with motif was observed between shared and specific peaks. Positional variation (PosV) relative to the TSS was calculated as follows: B73v5 gene models in gff3 format were converted to Mo17 genome coordinates using Liftoff85 with default parameters. Putative target genes were assigned using ChIPseeker70 using either the B73v5 gene models (B73 datasets) or B73v5 gene models lifted to Mo17 (Mo17 datasets). This ensured differences in distance to TSS were not due to differences in gene model annotation. Differences in distance to TSS were then calculated for those shared peaks with the same assigned putative target gene. Only positional variants for B73 peaks were determined but similar percentages would be expected for Mo17 peaks.
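Two steps of this normalization lend themselves to a short illustration: computing the fraction of reads to keep in each dataset, and classifying the top 20% of peaks from one genotype as shared or genotype-specific against all peaks of the other. The sketch below assumes that peaks from both genotypes have already been lifted to a common coordinate system and that each peak carries a score used for ranking; all names are hypothetical and the quadratic overlap search is chosen for clarity, not speed.

```python
def downsample_fractions(reads_b73, reads_mo17):
    """Fraction of reads to keep in each dataset so both match the smaller one."""
    target = min(reads_b73, reads_mo17)
    return target / reads_b73, target / reads_mo17


def classify_top_peaks(peaks, other_peaks, top_fraction=0.2):
    """Split the top-scoring peaks of one genotype into shared and specific.

    peaks: list of (chrom, start, end, score) for the genotype of interest.
    other_peaks: list of (chrom, start, end) for ALL peaks of the other
    genotype, in the same coordinate system (half-open intervals).
    A top peak is 'shared' if it overlaps any other-genotype peak by >= 1 bp.
    """
    n_top = max(1, int(len(peaks) * top_fraction))
    top = sorted(peaks, key=lambda p: p[3], reverse=True)[:n_top]
    shared, specific = [], []
    for chrom, start, end, _score in top:
        hit = any(c == chrom and s < end and start < e for c, s, e in other_peaks)
        (shared if hit else specific).append((chrom, start, end))
    return shared, specific
```

Comparing only the strongest peaks of one genotype against the full peak set of the other means that a peak is called genotype-specific only when no peak of any strength is found at the matching position.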
To assess peak overlap with SNPs, indels, and structural variants, a genome-wide VCF file was generated using SyRI86. First, the Anchorwave.maf output file (B73v5 as reference, Mo17 as query) was converted to .sam format using the maf-convert script of LAST80. This file was then used with SyRI to generate the B73-Mo17 .VCF file: syri -c anchorwave_Mo17toB73v5.sam -F S --prefix anchorwave_Mo17toB73v5_sam_ --cigar -f --log DEBUG -r Zm-B73-REFERENCE-NAM-5.0.id_chrs_mg.fa -q Mo17_CAU-1/Zm-Mo17-REFERENCE-CAU-1.0.id_chr_nuc_mg.fa. The resulting VCF file was parsed to separate SNPs, small indels less than 50bp, structural variants (indels greater than 50bp), ‘not aligned’ sequence, and duplicated regions. The bedtools intersect utility from bedtools v2.25.084 was used to quantify overlap with variant features. Peaks and features were considered overlapping if their coordinates overlapped by >=1bp.
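The size-based separation of variants and the 1-bp overlap criterion can be sketched as below. This is a simplified illustration that works on allele lengths and half-open intervals; it does not parse SyRI output, and the function names are hypothetical. Because the text leaves an insertion or deletion of exactly 50bp unassigned, this sketch assumes it is grouped with structural variants.

```python
def classify_by_size(ref_len, alt_len, sv_min=50):
    """Classify a variant from its reference and alternate allele lengths.

    Returns 'SNP' for single-base substitutions, 'indel' for length
    differences below sv_min, 'SV' otherwise, and 'other' for
    equal-length multi-base substitutions.
    """
    if ref_len == alt_len:
        return "SNP" if ref_len == 1 else "other"
    size = abs(ref_len - alt_len)
    return "indel" if size < sv_min else "SV"


def overlaps(a_start, a_end, b_start, b_end, min_bp=1):
    """True if two half-open intervals share at least min_bp bases."""
    return min(a_end, b_end) - max(a_start, b_start) >= min_bp
```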
Differential gene expression correlation and root cell type-specific analysis
Raw RNA-seq reads for selected tissues in B73 and Mo17 were downloaded from NCBI SRA (PRJNA482146; Supplementary Table 7). Transcript quantification was done using the RSEM software package (version 1.3.3)87 with the STAR aligner (version 2.7.6a)88. RSEM references were built using the Mo17 and B73v5 genome sequences with annotations of genes mappable between Mo17 and B73. To create the Mo17 gene annotation, the B73v5 gene model (Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.gff3) was mapped from the reference genome sequence Zm-B73-REFERENCE-NAM-5.0 to the target genome sequence Zm-Mo17-REFERENCE-CAU-1.0 using Liftoff (version v1.6.3)85. The resulting annotation file was used with the Mo17 genome sequence to build the Mo17 RSEM reference. To create the B73 gene annotation, genes in the B73v5 gene model were filtered to keep only genes that were lifted to Mo17. The resulting annotation file was used with the B73v5 genome sequence to build the B73 RSEM reference. Gene expression values were then calculated from the paired-end RNA-seq read files using the corresponding RSEM reference genomes. The RSEM results were imported into R by tximport (version 1.30.0)89 for differential expression analysis by DESeq2 (version 1.42.1). To remove genes with low counts in a majority of the samples, pre-filtering was performed by keeping genes that had read counts of at least 10 in a minimum number of samples, where the minimum was set to the lowest number of replicates for the two genotypes in each tissue (Supplementary Table 7). For each tissue, differential gene expression between B73 and Mo17 was computed using the standard DESeq analysis settings. Differentially expressed genes (DEGs) were identified as genes that had absolute log2 fold change greater than 1 and FDR adjusted P-value lower than 0.05.
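The pre-filtering rule and the DEG definition above reduce to two simple predicates. The following sketch is for illustration only and is independent of DESeq2; the function names and the dictionary layout are assumptions, and genes for which no adjusted P-value is available are treated here as not differentially expressed.

```python
def prefilter(counts, min_count=10, min_samples=3):
    """Keep genes with at least min_count reads in at least min_samples samples.

    counts: dict mapping gene ID -> list of read counts, one per sample.
    min_samples would be set to the smaller replicate number of the two
    genotypes for the tissue being analyzed.
    """
    return {
        gene: c for gene, c in counts.items()
        if sum(x >= min_count for x in c) >= min_samples
    }


def is_deg(log2_fold_change, padj, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Apply the DEG definition: |log2FC| > 1 and adjusted P-value < 0.05."""
    if padj is None:
        return False
    return abs(log2_fold_change) > lfc_cutoff and padj < padj_cutoff
```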
Putative shared or B73-specific target genes were assigned by ChIPseeker70 based on genes that had shared or B73-specific peaks within −10kb to +300 bp from the TSS in B73 gene model Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.gff3. To test whether B73 vs. Mo17 DEGs were enriched for genes associated with B73-specific DAP-seq peaks relative to genes associated with shared peaks, we constructed two-by-two contingency tables (DEG or non-DEG vs B73-specific or shared targets) for each TF in the diversity panel across all nine tissue types, which were used in one-sided Fisher’s exact tests by the fisher.test function in R. The reported P values were then adjusted for multiple hypothesis testing by the Benjamini–Hochberg procedure and transformed to -log10 scale for plotting by the R package ComplexHeatmap90,91. Row and column clustering were done using Euclidean distance and complete linkage methods.
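The enrichment test and multiple-testing correction can be sketched in plain Python as follows. This is an illustrative re-implementation, not the R code used here: the analysis itself used fisher.test in R, and the table layout assumed below (rows: B73-specific vs shared targets; columns: DEG vs non-DEG) is one reasonable reading of the description.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Assumed layout: a = DEG and B73-specific, b = non-DEG and B73-specific,
    c = DEG and shared, d = non-DEG and shared.
    Returns P(X >= a) under the hypergeometric null (enrichment of DEGs
    among B73-specific targets).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

The adjusted values would then be transformed to a -log10 scale before plotting.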
For overlap with root cell type markers, the list of cell type-specific marker genes for maize was obtained from Supplementary Table 4 of Guillotin et al.41. The root cell-type specific markers for 19 clusters from single cell RNA-seq were converted from B73v4 gene model to v5 using the liftoff files provided by MaizeGDB (https://ars-usda.app.box.com/v/maizegdb-public/folder/165362280830). One-sided Fisher’s exact tests were performed to test whether the cell-type-specific marker genes were more enriched among shared target genes than among B73-specific target genes. The reported P values were adjusted for multiple hypothesis testing by the Benjamini–Hochberg procedure and transformed to -log10 scale for plotting by the R package ComplexHeatmap90,91.
To determine whether positional variation (PosV) among shared promoter peaks of B73 and Mo17 was associated with differential gene expression, the PosV peaks for 65 diversity panel TFs were analyzed, wherein the DAP-seq TF targets were divided into two groups: PosVs greater than 500-bp and PosVs less than 500-bp. Note that this analysis used all shared peaks, not just the top 20% of peaks. Considering only the DAP-seq TF targets that were positionally variant between inbreds, two-by-two contingency tables (a gene is DEG or not DEG vs. a target gene is associated with PosV greater than 500-bp or less than 500-bp) were created for each of the 65 TFs and seven tissues, which were used in one-sided Fisher’s exact tests by the fisher.test function in R. The reported P values were then adjusted for multiple hypothesis testing by the Benjamini–Hochberg procedure. Of the 65 TFs used in our PosV analysis, 44 showed a PosV effect. Four of these TFs (9%) were themselves differentially expressed in the same tissue where a PosV effect was observed. These observations suggest that the enrichment for DEGs associated with PosV peaks is not widely influenced by differences in the expression of the TFs themselves.
Orthogonal dataset overlap analysis
The MOA-seq data from ear tissue59 remapped to B73v5 was obtained from maizeGDB. ATAC-seq data from leaf and ear were obtained from Ricci et al.22 and Hufford et al.21, via maizeGDB. NAM consortium UMR data was obtained from maizeGDB21. Bed annotation files of B73v5 transposable elements were obtained from maizeGDB. The conserved non-coding sequence (CNS) regions from sorghum were obtained from Song et al.24 with B73v4 coordinates converted to B73v5 coordinates with the EnsemblPlants Assembly Converter tool. Overlap analysis was performed using the bedtools intersect (v2.25.0) tool84. Peaks were considered to overlap with a particular feature if their coordinates shared greater than or equal to 1-bp. Mo17 UMR and ATAC-seq peak data shown in JBrowse2 genome browser screenshots was from Noshay et al.92.
GO enrichment analysis
GO enrichment was performed using ShinyGO93. Analysis was performed by selecting top peaks (according to signal value) that were located in UTRs, putative promoters (1–10-kb from TSS), and downstream (<300-bp from TES) regions (excluding distal, i.e. >10-kb from TSS, exonic and intronic regions). In most cases, putative target genes associated with the first 3000 peaks were used for GO analysis. When these did not produce significant results, the top 6000 peaks were used.
CRISPR constructs and genome editing
For CRISPR constructs targeting cis-regulatory regions near TSH1 and DICE, multiplexed guides were cloned into pBUE41194 using NEB HiFi. Constructs were transformed into maize HiII embryos via Agrobacterium-mediated transformation. T0 plants were crossed to B73 (TSH1) or Mo17 (DICE) to generate edits in the B73 and Mo17 backgrounds respectively. For DICE, alleles were fixed by removal of the Cas9-guide cassette via genetic segregation and backgrounds were further purified by backcrossing for two or three generations. For the CRISPR construct targeting the BIF2 cis-regulatory region, three gRNA cassettes were assembled and cloned into the pBUE411-GGB vector using Golden Gate Assembly95. The resulting construct was transformed into inbred line B104. Edited T0 plants were crossed to B104 once and then selfed to obtain homozygous edits while simultaneously removing the T-DNA. We confirmed that BIF2 and TSH1 CRISPR edited lines contained no mutations within the respective genes.
qRT-PCR analysis of DICE edited alleles
Plants were grown in a Conviron growth chamber with 16 hours light and 8 hours dark at 26 degrees C or in a greenhouse. Total RNA was extracted from plants using an RNeasy kit (Qiagen) with on-column DNase I treatment; tissue was from the middle of the third leaf on seedlings 9–11 days after sowing. Three biological replicates, each containing three homozygous mutant samples, were harvested for each allele along with three biological replicates of segregating homozygous wildtype siblings. cDNA was made using the qScript cDNA synthesis kit (QuantaBio). qPCR was performed using PerfeCTa SYBR Green FastMix (QuantaBio) with primers shown in Supplementary Table 8, using an Eco Real-Time PCR system (Illumina). Analysis was performed using the ddCT method. We note that a genome assembly error was observed in the B73v5 genome near the DICE locus. The JBrowse2 screenshot shown in Fig.5a therefore uses DAP-seq data mapped to B73v3.
qRT-PCR analysis of TSH1 edited alleles
Expression of the TSH1 gene was analyzed via qRT-PCR using an Illumina Eco Real-Time PCR System. Two tassel primordia were combined for each genotype, and each mutant included three biological replicates (wildtype samples consisted of two biological replicates containing two tassels each). Total mRNA was extracted with the RNeasy Mini Kit (Qiagen), treated with on-column DNase I (Qiagen), and used to synthesize cDNA with the qScript cDNA Synthesis Kit (Quanta Biosciences). qPCR was conducted using gene-specific primers (Supplementary Table 8) and PerfeCTa SYBR Green FastMix (Quanta Biosciences). The cycle threshold values of the target gene were normalized to ZmUBIQUITIN (Zm00001eb340580). Analysis was performed using the ddCT method.
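For reference, the ddCT calculation can be written as a single function. This is a minimal sketch of the standard 2^-ddCT formula, assuming approximately 100% amplification efficiency for both the target and the reference gene; the function and argument names are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative expression by the ddCT method (2 ** -ddCT).

    dCT  = CT(target) - CT(reference gene), computed per condition.
    ddCT = dCT(sample) - dCT(control, e.g. wildtype sibling).
    Assumes doubling of product per cycle for both amplicons.
    """
    d_sample = ct_target_sample - ct_ref_sample
    d_control = ct_target_control - ct_ref_control
    return 2.0 ** -(d_sample - d_control)
```

A mutant whose target gene crosses the threshold two cycles later than the wildtype sibling, with an unchanged reference gene, would therefore show one quarter of the wildtype expression level.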
Scanning Electron Microscopy
Scanning Electron Microscopy (SEM) images (Extended Data Fig. 9) were taken using freshly dissected immature tassels and ears using a JCM-6000PLUS benchtop SEM.
Extended Data
Extended Data Fig. 1. DAP-seq data samples broadly across maize TF families.
a, Stacked bar graph showing 36 TF families in which at least one family member was tested in DAP-seq. The number of members tested for each is shown with a colored bar; the number of members for which high quality datasets were obtained is shown in parentheses next to the family name. b, Scatterplot showing the number of B73 and Mo17 peaks for each TF. c, Boxplots showing the distribution of the number of target genes obtained for each TF, n = 200. d, Heatmap showing Pearson correlation of TF binding profiles genome wide (10-bp bins). Side annotation colors correspond to TF structural families. TFs in the diversity panel are indicated with an asterisk. e, Average number of reads in peaks and number of peaks with GEM motif are indicated. Each datapoint corresponds to a DAP-seq dataset from either B73 or Mo17. Whiskers in boxplots in panels c and e indicate the minimum and maximum values; central lines correspond to medians and box boundaries denote the upper (75th percentile) and lower (25th percentile) quartiles with n = 200 TFs.
Extended Data Fig. 2. Enrichment analysis of selected TFs and GWAS traits.
a, GO enrichment analysis of putative target genes assigned to peaks of selected TFs. b, Bubble plot showing enrichment for NAM panel GWAS SNPs associated with branch zone, cob length, and leaf angle that overlapped SBP30/UB3 DAP-seq peaks (right panel; red boxed areas) and bubble plot showing MADS73 DAP-seq peaks were significantly enriched for SNPs associated with several flowering time-related traits (left panel; black boxed areas). c, Bubble plot showing enrichment for Wisconsin Diversity panel GWAS SNPs associated with branching phenotypes that overlapped SBP30/UB3 DAP-seq peaks (right panel; red boxed areas) and bubble plot showing MADS73 DAP-seq peaks were significantly enriched for SNPs associated with flowering time-related traits (left panel; black boxed areas).
Extended Data Fig. 3. B73 TF diversity panel and DAP-CRM analysis.
a, Heatmap showing Pearson correlation of genome-wide binding profiles for 66 TFs in the B73 TF diversity panel. b, Phylogeny, tissue-specific RNA-expression (TPM; transcripts per million), and binding motifs of selected SBP TFs tested in maize DAP-seq. Expression patterns are often different for family members with similar motifs. c, Genome browser screenshot of SBP TFs shown in b at the TSH1 locus. d, Histogram showing the size distribution of DAP-CRMs. The dotted line indicates the average size of all DAP-CRMs. e, Histogram of the number of TFs per DAP-CRM. The black dotted line indicates the average number of TFs per DAP-CRM. The inset shows a close-up on the number of TFs from 20–63 TFs per DAP-CRM. f, Frequency of DAP-CRMs versus randomly shuffled CRMs of the same size within a 3kb window surrounding the TSS.
Extended Data Fig. 4. Genome browser views of composite DAP-seq peaks.
a, B73v5 IGV genome browser view of the Vgt1 and RAP2.7 locus showing binding by DAP-seq TFs. Lower panel shows close-up of DAP-CRMs with binding by TFs not displayed as main tracks due to space constraints. b, Genome browser view of the UNBRANCHED2 (UB2) locus showing binding by DAP-seq TFs. c, Genome browser view of the INDETERMINATE GAMETOPHYTE1 (IG1) locus showing binding site location of DAP-seq TFs. In each panel the number below the DAP-CRM track indicates the number of TF peaks present in the DAP-CRM. Please visit website genome browser for a display of all TF tracks http://hlab.bio.nyu.edu/projects/zm_crm/.
Extended Data Fig. 5. Quality assessment of B73 and Mo17 DAP-seq datasets.
a, Pearson correlation of TF binding profiles for B73 and Mo17 genomes. Most B73 peak datasets (rows annotated in yellow) clustered with Mo17 peak datasets for the same TFs lifted to B73 (rows annotated in teal), as indicated by the magenta boxes. b, Percentage of non-crossmappable peaks (coordinates could not be lifted) that overlapped with “Non-Alignable” region (NotAligned), duplicated regions (DUP), or structural variants (SV, insertions or deletions greater than 50bp). Each datapoint corresponds to the percentage of peaks from an individual TF overlapping the indicated category, n=200. Whiskers in boxplots in panels b-d indicate the minimum and maximum values; central lines correspond to medians and box boundaries denote the upper (75th percentile) and lower (25th percentile) quartiles with n = 200 TFs. c, Upper: scatterplots showing no difference in average reads per peak or percentage of peaks with motif for shared and specific datasets. Each dot corresponds to a single TF. Lower: distribution of average reads per peak and percentage of peaks with motif for shared and specific datasets, n = 66. d, Percentage of peaks within each category that overlapped SVs and transposable elements (TEs), n=200.
Extended Data Fig. 6. Shared and genotype-specific peaks.
a, Percentage of TF diversity panel peaks also called as peaks in Mo17 (shared, grey) and those called only in B73v5 but not Mo17 (B73-specific, blue). b, Percentage of TF diversity panel peaks also called as peaks in B73 (shared, grey) and those called only in Mo17 but not B73 (Mo17-specific, green). Mo17 datasets show a similar percentage of Mo17-specific and Mo17-B73 shared peaks relative to those seen for B73-specific and B73-Mo17 shared peaks (TF order is the same as in a). c, JBrowse2 genome browser view showing similar binding intensity of MADS69 in the 5’UTR of CYCLIN13 (Zm00001eb193830) in both genotypes. d, Scatterplot showing good correlation of MADS69 reads in B73v5 compared to Mo17 MADS69 reads converted to B73v5 coordinates. e, JBrowse2 view of the MATE18 locus showing B73-specific peaks residing near a GWAS hit. Note that the B73-specific sequence insertion is not annotated as a TE. f, B73-specific peaks more often contain more than four SNPs per peak (top plot) or more SNPs per peak center (20 bp region surrounding the peak summit) (bottom plot). Shared peaks more often contain fewer than four SNPs per peak (top plot) or per 20 bp region surrounding the peak summit (bottom plot). Data points represent individual TFs, n = 65. For f and g, whiskers indicate the minimum and maximum values; central lines correspond to medians and box boundaries denote the upper (75th percentile) and lower (25th percentile) quartiles. g, Overlap of B73-specific peaks and B73-Mo17 shared peaks with orthogonal datasets or DAP-CRMs. Both B73-specific peaks and shared peaks show statistically significant enrichment (p value < 0.0001, two-tailed t-test) relative to randomly shuffled peaks, providing support that both B73-specific peaks and shared peaks could be functional. Data points represent individual TFs, n = 200.
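The shared versus genotype-specific split in panels a and b can be pictured with a small Python sketch. It assumes that Mo17 peaks have already been converted to B73v5 coordinates and that any overlap of at least 1 bp counts as "shared"; that overlap criterion, the function names, and the toy coordinates are illustrative assumptions rather than the criteria used in this study.

```python
def classify_peaks(b73_peaks, mo17_lifted_peaks):
    """Split B73 peaks into shared and B73-specific sets.

    Peaks are (chrom, start, end) tuples with half-open coordinates.
    A B73 peak is called shared if it overlaps any lifted Mo17 peak
    by at least 1 bp (an assumed, illustrative criterion).
    """
    by_chrom = {}
    for chrom, start, end in mo17_lifted_peaks:
        by_chrom.setdefault(chrom, []).append((start, end))

    shared, specific = [], []
    for chrom, start, end in b73_peaks:
        # Two half-open intervals overlap when each starts before the other ends.
        overlaps = any(s < end and start < e for s, e in by_chrom.get(chrom, []))
        (shared if overlaps else specific).append((chrom, start, end))
    return shared, specific


def percent_shared(b73_peaks, mo17_lifted_peaks):
    """Percentage of B73 peaks that are shared with Mo17."""
    shared, _ = classify_peaks(b73_peaks, mo17_lifted_peaks)
    return 100.0 * len(shared) / len(b73_peaks)


# Hypothetical toy peaks.
b73 = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 250), ("chr3", 10, 20)]
mo17_lifted = [("chr1", 250, 450), ("chr2", 250, 400)]
shared_peaks, b73_specific_peaks = classify_peaks(b73, mo17_lifted)
```

With these toy inputs only the first chr1 peak overlaps a lifted Mo17 peak; the chr2 peak merely abuts one (half-open end 250 versus start 250) and is therefore counted as B73-specific. The linear scan is adequate for a sketch; genome-scale data would normally use an interval tree or a sorted sweep.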
Extended Data Fig. 7. Peak positional variation among genotypes.
a, Comparative JBrowse2 genome browser view showing shared peaks of B73 and Mo17 at the MADS67 locus. b, JBrowse2 genome browser view showing TF binding site positional variants at the ZmATL34 locus. Lower heatmap shows ZmATL34 RNA expression levels (transcripts per million) from several tissues40. A 13-fold increase in Mo17 RNA expression was observed in radicle tissue compared to B73. c, Top: percentage of shared TF diversity panel promoter peaks (top 20%) that correspond to positional variants (PosV) greater than 500 bp or less than 500 bp. Bottom: percentage of top (shared and specific) promoter peaks that corresponded to shared PosVs or B73-specific peaks. Each datapoint corresponds to an individual TF, n = 65. Whiskers indicate the minimum and maximum values; central lines correspond to medians and box boundaries denote the upper (75th percentile) and lower (25th percentile) quartiles. d, Plot showing percentage of B73-Mo17 shared promoter peaks (<−10kb) that were associated with differentially expressed genes in various tissues from40. Genes associated with PosV<500 bp peaks are shown in dark pink, while those associated with PosV>500 bp peaks are shown in light pink. Fisher’s exact tests were performed to assess the enrichment of DEGs in PosV>500 bp genes relative to PosV<500 bp genes for each TF and tissue combination, and the resulting p-values were adjusted for multiple testing. e, Heatmap of -log10 FDR-adjusted p-values from one-sided Fisher’s exact tests, evaluating whether root cell type marker genes determined by single cell RNA-seq data41 are enriched for putative target genes associated with DAP-seq peaks shared between B73 and Mo17, compared to putative targets of B73-specific peaks.
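The enrichment testing described for panels d and e can be sketched in a few lines of Python. The sketch computes a one-sided (greater) Fisher's exact test on a 2×2 table and then adjusts a list of p-values with the Benjamini-Hochberg procedure. The legend states only that p-values were adjusted for multiple testing (FDR in panel e), so the choice of Benjamini-Hochberg, the function names, and the toy counts are assumptions made for illustration.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided (greater) Fisher's exact test for the table [[a, b], [c, d]].

    Rows are the two gene groups being compared and columns are
    DEG / non-DEG counts. Returns P(X >= a) under the hypergeometric null.
    """
    row1 = a + b           # genes in the first group
    col1 = a + c           # total DEGs
    n = a + b + c + d      # all genes in the table
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy table: 3 of 4 genes in group 1 are DEGs, 1 of 4 in group 2.
p_toy = fisher_one_sided(3, 1, 1, 3)

# Hypothetical raw p-values from four TF-by-tissue tests.
adjusted_toy = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

In practice one table would be built per TF and tissue combination, and all resulting p-values would be adjusted together before plotting their -log10 values.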
Extended Data Fig. 8. DoubleDAP-seq analysis.
a, Schematic showing the doubleDAP-seq assay in which putative heterodimeric TF-DNA complexes can be pulled down and compared to results from a single protein DAP-seq assay to assess binding site specificity differences. b, Neighbor-joining phylogeny based on amino acid similarity of group VII (blue) and group VIII (green) bHLHs from maize and Arabidopsis. Members that were tested in DAP-seq are shown in bold. Three members that were tested in DAP-seq that did not yield any peaks are shown in italics. c, AlphaFold prediction of BHLH85 homodimer. d, Normalized RNA-seq expression (TPM; transcripts per million) of group VIII and VII bHLHs showing heterodimer pairs are co-expressed in many tissues. Normalized expression data from Walley et al.96. e, Venn diagram showing low degree of overlap between Q-A-R (subclade VIII) homodimers and Q-A-R:ZmSPT (subclade VII) heterodimers as exemplified by HALO-BHLH113 (Q-A-R bHLH) in single DAP and HALO-BHLH113:SBPTag-BHLH125/ZmSPT1 in doubleDAP. Only 2% of HALO-BHLH113 peaks were captured in the doubleDAP dataset indicating that the heterodimer configuration is preferred relative to the homodimer.
Extended Data Fig. 9. CRISPR editing of maize cis-regulatory regions influences phenotype.
a, Genome browser screenshot of the TSH1 locus showing binding in the upstream promoter region by many TFs. Grey shaded areas upstream of and within CRM169681 indicate regions that were deleted in alleles shown in b. b, Schematic showing two independent CRISPR alleles with deletions and inversions that eliminate at least five TF binding sites (colored bars; colors match TFs shown in genome browser screenshot in a). Lightly shaded grey areas indicate regions deleted in both alleles. c, Images of mature tassels for WT, two tsh1 promoter CRISPR alleles, and tsh1-ref mutant (coding region mutation51). Both CRISPR promoter alleles show outgrowth of the tassel sheath leaf (white arrows) that is not present in the WT (black arrow). Tassel branching is reduced in tsh1-ref and the CRISPR promoter alleles (white brackets) relative to WT (black bracket). d, SEMs of immature tassels of the CRISPR promoter alleles showing bract outgrowth (white arrowheads). e, qRT-PCR analysis of CRISPR promoter alleles showing reduced expression in immature tassels relative to WT immature tassels. Data are presented as mean values. Error bars represent standard deviation of three biological replicates for each edited allele (two biological replicates for wildtype). Data points correspond to individual biological replicates. Statistical significance was determined by two-sided t-test. f, CRISPR editing of BIF2 3’UTR ARF binding sites. A cis-regulatory module (CRM18765) is situated downstream of the BIF2 gene. Within this CRM region, there is a strong ARF peak containing five ARF binding motifs (three TGTCs and two GACAs)38. Three single guide RNAs (gRNAs) were designed for CRISPR-Cas9 editing that specifically targeted the ARF motifs. Three deletion alleles were obtained. Homozygous plants of these alleles exhibited a weak bif2 phenotype with various degrees of severity (BIF2-crm1cr1 and BIF2-crm1cr3 are more severe than BIF2-crm1cr2) during the early ear development stage, characterized by partial barren patches (white arrows) on the ear primordia as seen by SEM.
Extended Data Fig. 10. Sequence alignment of Mo17 tandem-duplicated CRMs.
Nucleotide alignment of B73v3_DICE (single copy), Mo17_CRM119798 (tandem copy 1), and Mo17_CRM119799 (tandem copy 2) showing conservation of individual TF binding motifs.
Supplementary Material
Supplementary Figure 1.
Jaccard similarity matrix for putative target genes of DAP-seq TFs.
Supplementary Figure 2.
Preferential sub-clade binding analysis of GLK and TCP family members
Supplementary Figure 3.
Preferential sub-clade binding analysis of LBD family members
Supplementary Figure 4.
GWAS enrichment of Wisconsin Diversity Panel / 282 Maize Association Panel traits and DAP-seq peaks
Supplementary Figure 5.
Bubble plot of B73-specific peaks (associated with SVs, SNPs, indels, or duplicated regions (DUPs)) and NAM GWAS traits
Supplementary Figure 6.
Bubble plot of B73-specific peaks (associated with SVs, SNPs, indels, or duplicated regions (DUPs)) and Wisconsin Diversity Panel GWAS traits
Supplementary Figure 7.
Heatmap showing significance of overlaps between putative target genes associated with shared or B73-specific peaks and differentially expressed genes between B73 vs. Mo17 in various tissues
Supplementary Table 1.
Information about DAP-seq samples and peaks.
Supplementary Table 2.
Information about DAP-seq B73 TF diversity panel.
Supplementary Table 4.
Information about doubleDAP-seq samples and peaks.
Supplementary Table 5.
List of regions in B73v5 blacklist that were excluded from analysis.
Supplementary Table 6.
List of regions in Mo17 CAU1.0 blacklist that were excluded from analysis.
Supplementary Table 7.
SRA accessions for RNA-seq differential expression analysis.
Supplementary Table 8.
List of primers used for qRT-PCR analysis.
Supplementary Table 3. (separate file)
Table listing combinations of three TFs that bound DAP-CRM regions.
Acknowledgements:
We thank Rhiannon Macrae for helpful comments and suggestions on the manuscript and MaizeGDB for data curation of informatics resources. We acknowledge the Office of Advanced Research Computing (OARC) at Rutgers, The State University of New Jersey for providing access to the Amarel cluster. This work was supported in part through the NYU IT High Performance Computing resources, services, and staff expertise.
Funding:
This work was supported by NSF Plant Genome Research Project grant IOS-1916804 to A.G. and S.C.H. and NIH award R35GM138143 to S.C.H. A.C. is a Fulbright Scholar and recipient of a Meta Prevorsek Turner Fellowship. This material is also based in part on work supported by the Center for Bioenergy Innovation (CBI), U.S. Department of Energy, Office of Science, Biological and Environmental Research Program under Award Number ERKP886 and the Oak Ridge National Laboratory Director’s R&D (DRD) Program (M.L.). Oak Ridge National Laboratory is managed by UT-Battelle, LLC for the Office of Science of the U.S. Department of Energy under Contract Number DE-AC05-00OR22725.
Footnotes
Code availability:
Analysis pipelines are available at https://github.com/hlab1/Maize_B73_Mo17_DAP.
Competing interests: The authors declare they have no competing interests.
Data availability:
Raw and processed data for DAP-seq datasets generated in this study are available from NCBI GEO under accession number GSE275897. Processed data are also available from Zenodo (DOI 10.5281/zenodo.14991915). Comparative peak data tracks from the different inbred lines can be viewed on JBrowse2 at http://hlab.bio.nyu.edu/projects/zm_crm/.
REFERENCES
- 1. Kim S. & Wysocka J. Deciphering the multi-scale, quantitative cis-regulatory code. Mol Cell 83, 373–392 (2023).
- 2. Rodriguez-Leal D, Lemmon ZH, Man J, Bartlett ME & Lippman ZB Engineering Quantitative Trait Variation for Crop Improvement by Genome Editing. Cell 171, 470–480 e8 (2017).
- 3. Liang Y, Liu HJ, Yan J. & Tian F. Natural Variation in Crops: Realized Understanding, Continuing Promise. Annu Rev Plant Biol 72, 357–385 (2021).
- 4. Wang X. et al. Dissecting cis-regulatory control of quantitative trait variation in a plant stem cell circuit. Nat Plants 7, 419–427 (2021).
- 5. Meyer RS & Purugganan MD Evolution of crop species: genetics of domestication and diversification. Nat Rev Genet 14, 840–52 (2013).
- 6. Liu L. et al. Enhancing grain-yield-related traits by CRISPR-Cas9 promoter editing of maize CLE genes. Nat Plants 7, 287–294 (2021).
- 7. Sun Y. et al. Divergence in the ABA gene regulatory network underlies differential growth control. Nat Plants 8, 549–560 (2022).
- 8. Wallace JG, Rodgers-Melnick E. & Buckler ES On the Road to Breeding 4.0: Unraveling the Good, the Bad, and the Boring of Crop Quantitative Genomics. Annu Rev Genet 52, 421–444 (2018).
- 9. Kremling KAG et al. Dysregulation of expression correlates with rare-allele burden and fitness loss in maize. Nature 555, 520–523 (2018).
- 10. Hake S. & Ross-Ibarra J. Genetic, evolutionary and plant breeding insights from the domestication of maize. Elife 4 (2015).
- 11. Rodgers-Melnick E, Vera DL, Bass HW & Buckler ES Open chromatin reveals the functional maize genome. Proc Natl Acad Sci U S A 113, E3177–84 (2016).
- 12. O’Malley RC et al. Cistrome and Epicistrome Features Shape the Regulatory DNA Landscape. Cell 166, 1598 (2016).
- 13. Zhang Y. et al. Evolutionary rewiring of the wheat transcriptional regulatory network by lineage-specific transposable elements. Genome Res 31, 2276–2289 (2021).
- 14. Baumgart LA et al. An atlas of conserved transcription factor binding sites reveals the cell type-resolved gene regulatory landscape of flowering plants. bioRxiv, 2024.10.08.617089 (2024).
- 15. Zhan J. et al. Opaque-2 Regulates a Complex Gene Network Associated with Cell Differentiation and Storage Functions of Maize Endosperm. Plant Cell 30, 2425–2446 (2018).
- 16. Wallace JG et al. Association mapping across numerous traits reveals patterns of functional variation in maize. PLoS Genet 10, e1004845 (2014).
- 17. Mural RV et al. Association mapping across a multitude of traits collected in diverse environments in maize. Gigascience 11 (2022).
- 18. Marand AP, Eveland AL, Kaufmann K. & Springer NM cis-Regulatory Elements in Plant Development, Adaptation, and Evolution. Annu Rev Plant Biol 74, 111–137 (2023).
- 19. Tu X. et al. Reconstructing the maize leaf regulatory network using ChIP-seq data of 104 transcription factors. Nat Commun 11, 5089 (2020).
- 20. Jores T. et al. Plant enhancers exhibit both cooperative and additive interactions among their functional elements. Plant Cell (2024).
- 21. Hufford MB et al. De novo assembly, annotation, and comparative analysis of 26 diverse maize genomes. Science 373, 655–662 (2021).
- 22. Ricci WA et al. Widespread long-range cis-regulatory elements in the maize genome. Nat Plants 5, 1237–1249 (2019).
- 23. Crisp PA et al. Stable unmethylated DNA demarcates expressed genes and their cis-regulatory space in plant genomes. Proc Natl Acad Sci U S A 117, 23991–24000 (2020).
- 24. Song B. et al. Conserved noncoding sequences provide insights into regulatory sequence and loss of gene expression in maize. Genome Res 31, 1245–1257 (2021).
- 25. Wills DM et al. From many, one: genetic control of prolificacy during maize domestication. PLoS Genet 9, e1003604 (2013).
- 26. Salvi S. et al. Conserved noncoding genomic sequences associated with a flowering-time quantitative trait locus in maize. Proc Natl Acad Sci U S A 104, 11376–81 (2007).
- 27. Iotchkova V. et al. GARFIELD classifies disease-relevant genomic features through integration of functional annotations with association signals. Nat Genet 51, 343–353 (2019).
- 28. Eichten SR et al. B73-Mo17 near-isogenic lines demonstrate dispersed structural variation in maize. Plant Physiol 156, 1679–90 (2011).
- 29. Buckler ES et al. The genetic architecture of maize flowering time. Science 325, 714–8 (2009).
- 30. Mickelson SM, Stuber CS, Senior L. & Kaeppler SM Quantitative Trait Loci Controlling Leaf and Tassel Traits in a B73 × Mo17 Population of Maize. Crop Science 42, 1902–1909 (2002).
- 31. Ordas B. et al. Mapping of QTL for resistance to the Mediterranean corn borer attack using the intermated B73 × Mo17 (IBM) population of maize. Theor Appl Genet 119, 1451–9 (2009).
- 32. Goering R, Larsen S, Tan J, Whelan J. & Makarevitch I. QTL mapping of seedling tolerance to exposure to low temperature in the maize IBM RIL population. PLoS One 16, e0254437 (2021).
- 33. Sun S. et al. Extensive intraspecific gene order and gene structural variations between Mo17 and other maize genomes. Nat Genet 50, 1289–1295 (2018).
- 34. Song B. et al. AnchorWave: Sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication. Proc Natl Acad Sci U S A 119 (2022).
- 35. Zhao H. et al. CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics 30, 1006–7 (2014).
- 36. Liang Y. et al. ZmMADS69 functions as a flowering activator through the ZmRap2.7-ZCN8 regulatory module and contributes to maize flowering time adaptation. New Phytol 221, 2335–2347 (2019).
- 37. Tian D. et al. GWAS Atlas: a curated resource of genome-wide variant-trait associations in plants and animals. Nucleic Acids Res 48, D927–D932 (2020).
- 38. Galli M. et al. The DNA binding landscape of the maize AUXIN RESPONSE FACTOR family. Nat Commun 9, 4526 (2018).
- 39. Maekawa S. et al. The Arabidopsis ubiquitin ligases ATL31 and ATL6 control the defense response as well as the carbon/nitrogen response. Plant Mol Biol 79, 217–27 (2012).
- 40. Zhou P, Hirsch CN, Briggs SP & Springer NM Dynamic Patterns of Gene Expression Additivity and Regulatory Variation throughout Maize Development. Mol Plant 12, 410–425 (2019).
- 41. Guillotin B. et al. A pan-grass transcriptome reveals patterns of cellular divergence in crops. Nature 617, 785–791 (2023).
- 42. Li M. et al. Double DAP-seq uncovered synergistic DNA binding of interacting bZIP transcription factors. Nat Commun 14, 2600 (2023).
- 43. Heim MA et al. The basic helix-loop-helix transcription factor family in plants: a genome-wide study of protein structure and functional diversity. Mol Biol Evol 20, 735–47 (2003).
- 44. de Martin X, Sodaei R. & Santpere G. Mechanisms of Binding Specificity among bHLH Transcription Factors. Int J Mol Sci 22 (2021).
- 45. Gaillochet C. et al. Control of plant cell fate transitions by transcriptional and hormonal signals. Elife 6 (2017).
- 46. Crawford BC & Yanofsky MF HALF FILLED promotes reproductive tract development and fertilization efficiency in Arabidopsis thaliana. Development 138, 2999–3009 (2011).
- 47. Gallavotti A. et al. The role of barren stalk1 in the architecture of maize. Nature 432, 630–5 (2004).
- 48. Jumper J. et al. Applying and improving AlphaFold at CASP14. Proteins 89, 1711–1721 (2021).
- 49. Gremski K, Ditta G. & Yanofsky MF The HECATE genes regulate female reproductive tract development in Arabidopsis thaliana. Development 134, 3593–601 (2007).
- 50. Aguirre L, Hendelman A, Hutton SF, McCandlish DM & Lippman ZB Idiosyncratic and dose-dependent epistasis drives variation in tomato fruit size. Science 382, 315–320 (2023).
- 51. Whipple CJ et al. A conserved mechanism of bract suppression in the grass family. Plant Cell 22, 565–78 (2010).
- 52. McSteen P. et al. barren inflorescence2 Encodes a co-ortholog of the PINOID serine/threonine kinase and is required for organogenesis during inflorescence and vegetative development in maize. Plant Physiol 144, 1000–11 (2007).
- 53. Zheng L. et al. Prolonged expression of the BX1 signature enzyme is associated with a recombination hotspot in the benzoxazinoid gene cluster in Zea mays. J Exp Bot 66, 3917–30 (2015).
- 54. Frey M. et al. Analysis of a chemical plant defense mechanism in grasses. Science 277, 696–9 (1997).
- 55. Engelhorn J. et al. Phenotypic variation in maize can be largely explained by genetic variation at transcription factor binding sites. bioRxiv, 2023.08.08.551183 (2023).
- 56. Parvathaneni RK et al. The regulatory landscape of early maize inflorescence development. Genome Biol 21, 165 (2020).
- 57. Cahn J. et al. MaizeCODE reveals bi-directionally expressed enhancers that harbor molecular signatures of maize domestication. bioRxiv, 2024.02.22.581585 (2024).
- 58. Sun Y. et al. 3D genome architecture coordinates trans and cis regulation of differentially expressed ear and tassel genes in maize. Genome Biol 21, 143 (2020).
- 59. Savadel SD et al. The native cistrome and sequence motif families of the maize ear. PLoS Genet 17, e1009689 (2021).
- 60. Oka R. et al. Genome-wide mapping of transcriptional enhancer candidates using DNA and chromatin features in maize. Genome Biol 18, 137 (2017).
- 61. Marand AP et al. The genetic architecture of cell-type-specific cis-regulation. bioRxiv, 2024.08.17.608383 (2024).
- 62. Hendelman A. et al. Conserved pleiotropy of an ancient plant homeobox gene uncovered by cis-regulatory dissection. Cell 184, 1724–1739 e16 (2021).
Methods only references:
- 63. Bartlett A. et al. Mapping genome-wide transcription-factor binding sites using DAP-seq. Nat Protoc 12, 1659–1672 (2017).
- 64. Burdo B. et al. The Maize TFome--development of a transcription factor open reading frame collection for functional genomics. Plant J 80, 356–66 (2014).
- 65. Yilmaz A. et al. GRASSIUS: a platform for comparative regulatory genomics across the grasses. Plant Physiol 149, 171–80 (2009).
- 66. Mirdita M. et al. ColabFold: making protein folding accessible to all. Nat Methods 19, 679–682 (2022).
- 67. Bolger AM, Lohse M. & Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–20 (2014).
- 68. Langmead B. & Salzberg SL Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357–9 (2012).
- 69. Guo Y, Mahony S. & Gifford DK High resolution genome wide binding event finding and motif discovery reveals transcription factor spatial binding constraints. PLoS Comput Biol 8, e1002638 (2012).
- 70. Yu G, Wang LG & He QY ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics 31, 2382–3 (2015).
- 71. Machanick P. & Bailey TL MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27, 1696–7 (2011).
- 72. Shen Z, Hoeksema MA, Ouyang Z, Benner C. & Glass CK MAGGIE: leveraging genetic variation to identify DNA sequence motifs mediating transcription factor binding and function. Bioinformatics 36, i84–i92 (2020).
- 73. Grzybowski MW et al. A common resequencing-based genetic marker data set for global maize diversity. Plant J 113, 1109–1121 (2023).
- 74. Liu X, Huang M, Fan B, Buckler ES & Zhang Z. Iterative Usage of Fixed and Random Effect Models for Powerful and Efficient Genome-Wide Association Studies. PLoS Genet 12, e1005767 (2016).
- 75. Yin L. et al. rMVP: A Memory-efficient, Visualization-enhanced, and Parallel-accelerated Tool for Genome-wide Association Study. Genomics Proteomics Bioinformatics 19, 619–628 (2021).
- 76. Dong Z. et al. Necrotic upper tips1 mimics heat and drought stress and encodes a protoxylem-specific transcription factor in maize. Proc Natl Acad Sci U S A 117, 20908–20919 (2020).
- 77. Dai D. et al. Paternal imprinting of dosage-effect defective1 contributes to seed weight xenia in maize. Nat Commun 13, 5366 (2022).
- 78. Wu H. et al. NAKED ENDOSPERM1, NAKED ENDOSPERM2, and OPAQUE2 interact to regulate gene networks in maize endosperm development. Plant Cell 36, 19–39 (2023).
- 79. Bang S. et al. WUSCHEL-dependent chromatin regulation in maize inflorescence development at single-cell resolution. bioRxiv, 2024.05.13.593957 (2024).
- 80. Kielbasa SM, Wan R, Sato K, Horton P. & Frith MC Adaptive seeds tame genomic sequence comparison. Genome Res 21, 487–93 (2011).
- 81. Robinson JT et al. Integrative genomics viewer. Nat Biotechnol 29, 24–6 (2011).
- 82. Diesh C. et al. JBrowse 2: a modular genome browser with views of synteny and structural variation. Genome Biol 24, 74 (2023).
- 83. Ramirez F, Dundar F, Diehl S, Gruning BA & Manke T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res 42, W187–91 (2014).
- 84. Quinlan AR & Hall IM BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–2 (2010).
- 85. Shumate A. & Salzberg SL Liftoff: accurate mapping of gene annotations. Bioinformatics 37, 1639–1643 (2021).
- 86. Goel M, Sun H, Jiao WB & Schneeberger K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol 20, 277 (2019).
- 87. Li B. & Dewey CN RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
- 88. Dobin A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
- 89. Soneson C, Love MI & Robinson MD Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Res 4, 1521 (2015).
- 90. Gu Z, Eils R. & Schlesner M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–9 (2016).
- 91. Gu Z. Complex heatmap visualization. iMeta 1, e43 (2022).
- 92. Noshay JM et al. Stability of DNA methylation and chromatin accessibility in structurally diverse maize genomes. G3 (Bethesda) 11 (2021).
- 93. Ge SX, Jung D. & Yao R. ShinyGO: a graphical gene-set enrichment tool for animals and plants. Bioinformatics 36, 2628–2629 (2020).
- 94. Xing HL et al. A CRISPR/Cas9 toolkit for multiplex genome editing in plants. BMC Plant Biol 14, 327 (2014).
- 95. Chen Z, Debernardi JM, Dubcovsky J. & Gallavotti A. The combination of morphogenic regulators BABY BOOM and GRF-GIF improves maize transformation efficiency. bioRxiv, 2022.09.02.506370 (2022).
- 96. Walley JW et al. Integration of omic networks in a developmental atlas of maize. Science 353, 814–818 (2016). doi:10.1126/science.aag1125.