Cell Reports Methods. 2025 Jan 27;5(1):100963. doi: 10.1016/j.crmeth.2024.100963

A network-enabled pipeline for gene discovery and validation in non-model plant species

Dae Kwan Ko 1,2,3, Federica Brandizzi 1,2,3,4
PMCID: PMC11840947  PMID: 39874949

Summary

Identifying key regulators of important genes in non-model crop species is challenging due to limited multi-omics resources. To address this, we introduce the network-enabled gene discovery pipeline NEEDLE, a user-friendly tool that systematically generates coexpression gene network modules, measures gene connectivity, and establishes network hierarchy to pinpoint key transcriptional regulators from dynamic transcriptome datasets. After validating its accuracy with two independent datasets, we applied NEEDLE to identify transcription factors (TFs) regulating the expression of cellulose synthase-like F6 (CSLF6), a crucial cell wall biosynthetic gene, in Brachypodium and sorghum. Our analyses uncover regulators of CSLF6 and also shed light on the evolutionary conservation or divergence of gene regulatory elements among grass species. These results highlight NEEDLE’s capability to provide biologically relevant TF predictions and demonstrate its value for non-model plant species with dynamic transcriptome datasets.

Keywords: gene discovery, crop genomics, transcriptomics, network modeling, transcriptional regulation, gene regulatory networks

Graphical abstract


Highlights

  • We develop NEEDLE, a gene discovery pipeline for non-model plant species

  • We apply the pipeline to maize UPR and soybean seed development

  • We identify transcription factors regulating CSLF6 genes in Brachypodium and sorghum

Motivation

Meeting the energy and food needs of a growing global population requires the swift development of climate-resilient crops, achievable through targeted gene reprogramming for growth, development, and stress responses. Accurate prediction of transcriptional regulators is crucial for accelerating crop improvement via targeted engineering and breeding approaches. However, such prediction is constrained by the scarcity of multi-omics datasets for most non-model species. To address this challenge, we present a robust network-based gene discovery pipeline that identifies transcription factors upstream of genes of interest by leveraging transcriptomic dynamics in non-model plants, followed by rapid in planta experimental validation.


Ko and Brandizzi introduce a gene discovery pipeline tailored for non-model plant species. The approach leverages network features from transcriptome datasets to accurately predict upstream transcription factors for target genes.

Introduction

In all living organisms, growth,1 development,2 and responses to environmental stress3 involve dynamic rearrangements of gene expression patterns in a spatiotemporal context. Gene regulation is a complex process involving interactions between cis-regulatory elements (CREs) near the transcriptional start site of genes and trans-regulatory elements, such as transcription factors (TFs), that bind to these CREs.4,5 Predicting TFs responsible for key agronomic traits could significantly improve crops through targeted molecular engineering6 and breeding,7 increasing crop productivity, providing higher yields, enhancing stress resilience, and improving agricultural practices.8 However, predicting gene regulation is challenging due to the complex interactions among DNA, RNA, and proteins, which form intricate networks within cells.5,9 Investigating gene reprogramming is often more effective than focusing on individual genes.10 A multi-omics approach, which measures multiple molecular characteristics globally, could address these challenges by constructing such networks.10,11 Despite the potential, this approach is feasible only for a few model species, such as Arabidopsis thaliana and rice (Oryza sativa). For example, investigating TF-DNA interactions in vivo often necessitates the generation of transgenic lines containing a protein tag fused with a TF of interest,12 a process that proves exceedingly challenging or infeasible in most crops. 
While generating transgenic lines for in vivo TF-DNA interaction studies is difficult in most crops, advancements in RNA sequencing (RNA-seq), including reduced costs,13 standardized working pipelines,14 and increased access to high-quality crop genomes,15 have made transcriptome profiling more accessible across various crop species.13 These developments underscore the potential to use transcriptomic resources in crops in conjunction with computational tools to construct biological networks and facilitate gene discovery even when extensive multi-omics data are not available.

Transcriptional networks depict the complex regulatory interactions between TFs and their target genes, with TFs and genes represented as nodes and their regulatory relationships as connecting edges.9,16 Gaining meaningful biological insights from these networks requires sophisticated processing, which can be daunting for researchers without expertise in network biology or bioinformatics.10,11 There are two primary computational strategies for predicting transcriptional networks: coexpression network analysis and prediction of gene regulatory networks (GRNs). Although both methods use transcriptome datasets, they rely on distinct computational algorithms, pursue distinct objectives, and yield different outcomes.10 Coexpression network analysis groups genes with statistically similar expression patterns into modules based on the principle of guilt by association.17 Genes within each coexpression module are expected to participate in common biological pathways, which facilitates downstream analyses. Owing to this straightforward concept, coexpression network analysis has gained popularity in recent years. However, this approach does not directly reveal the structure of the network, limiting its suitability for gene discovery. In contrast, tree-based ensemble techniques, such as random forest,18 effectively infer the hierarchical organization of GRNs from dynamic transcriptome data19 by predicting the strength of regulatory connections between TFs and their target genes. This method produces a ranked list of predicted regulator-target relationships, which can be illustrated as a network diagram. 
However, the precision of such GRN predictions can be compromised when handling numerous genes with diverse expression patterns, given the inherent characteristics of tree-based algorithms.18 Therefore, experimental validation is crucial to confirm these predictions, especially in crop research, where generating transgenic lines or mutants is labor intensive.

Studies such as Shanks et al.20 identified GRNs underlying nitrogen use efficiency in rice, validating six of 18 predicted TFs based on previous studies. However, as the prediction is designed to be validated only with published datasets, this study does not provide a complete workflow from prediction to validation, making it difficult to validate genes without existing validation data. Similarly, Cassan et al.21 introduced the Dashboard for the Inference and Analysis of Networks from Expression Data web platform, which combines expression clustering with GRN modeling but lacks a validation component. Meanwhile, Ueda et al.22 focused on the functional characterization of a rice nitrogen deficiency-responsive GRN built on coexpression modules of genes, providing in vivo validation of predicted genes but offering limited background and details of the pipeline, making it less accessible to non-experts. Thus, despite advancements in transcriptional network modeling for crop science, there is a need for a user-friendly workflow that integrates network predictions with rapid experimental validation in planta to enhance confidence and accuracy in gene discovery (Table S1).

In this study, we introduce a network-enabled gene discovery pipeline (NEEDLE), a streamlined platform that leverages dynamic transcriptomics data to construct GRNs with high precision. By integrating the complementary strengths of both the coexpression network and GRN prediction algorithms, NEEDLE provides exceptional accuracy in mapping regulatory interactions. Additionally, this tool facilitates straightforward experimental validation of regulatory connections between TFs and their target genes in planta using a transient reporter system.

As case studies, we applied NEEDLE to identify the key TF involved in the maize unfolded protein response (UPR), a critical defense mechanism against proteotoxic stress of the endoplasmic reticulum (ER), and to uncover a critical TF for soybean seed development, showcasing its effectiveness in both monocot and dicot non-model species. NEEDLE was also used to identify TFs regulating CSLF6 orthologs in Brachypodium distachyon and Sorghum bicolor. CSLF6, which synthesizes mixed-linkage glucan (MLG), is essential for cell wall formation in grasses and has significant implications for human nutrition and biofuels. Despite its importance, the regulators of CSLF6 expression are still unknown. NEEDLE not only identified these regulators but also revealed both functional divergence and conservation of TFs across species through cross-species prediction and validation. Thus, NEEDLE represents a major advancement in predicting TFs upstream of genes of interest in crops using transcriptomics data alone without the need for extensive omics datasets. As transcriptomic datasets become more widely available for non-model plant species, NEEDLE’s ability to predict and validate agronomically important regulators will greatly accelerate gene discovery in crops.

Results

Overview of how NEEDLE works

NEEDLE consists of two phases: prediction (steps 1–4) and validation (steps 5 and 6). To initiate NEEDLE (Figure 1A; Table 1), a standard RNA-seq analysis pipeline14 is applied to the transcriptome dataset (step 1). Highly dynamic datasets offer better resolution for downstream network analyses, allowing clearer separation among coexpression modules and more precise delineation of TF-gene relationships. In this context, “dynamics” is defined as encompassing a minimum of six samples (i.e., data points), based on both our empirical observations and a well-known limitation of coexpression network approaches: studies with low sample numbers, common in small- to medium-scale projects, are prone to false positives (i.e., statistically insignificant but seemingly high correlations).23 Importantly, although some coexpression network algorithms advise against limiting input genes, we found that restricting the input to “not lowly expressed genes” is necessary to minimize background noise in downstream NEEDLE analyses. Such genes are typically differentially expressed genes (DEGs) between controls and stress-exposed samples or among multiple developmental stages, or genes reaching a fragments per kilobase of transcript per million mapped reads (FPKM) value of 10 in at least one sample. Including lowly expressed genes can overwhelm the coexpression algorithm, potentially producing less meaningful or even meaningless coexpression modules. This filtering does not compromise the analysis, because the remaining genes (usually thousands) with diverse expression patterns across ≥6 samples still provide sufficient statistical power, as demonstrated in previous studies.24,25,26 Furthermore, the filtering aligns well with the main purpose of NEEDLE, which is to pinpoint specific TFs that are highly likely to regulate target genes of interest within a high-confidence network. 
A normalized gene expression matrix is then computed (e.g., log2-transformed fold change [log2FC] of FPKM values or the log2FC values output by DESeq2). Because this gene set (usually numbering in the thousands) provides the foundational information for all subsequent steps, ensuring a high-quality dataset at this stage is critical for success.
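The gene filtering and normalization described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not NEEDLE's actual implementation: the inclusive FPKM cutoff of 10, the pseudocount of 1, the use of the first sample as the control, and the function names `filter_expressed` and `log2fc_matrix` are all hypothetical choices made for this example.

```python
import math

MIN_SAMPLES = 6        # minimum number of samples for a "dynamic" dataset
FPKM_THRESHOLD = 10.0  # assumed cutoff for "not lowly expressed" genes


def filter_expressed(fpkm, threshold=FPKM_THRESHOLD):
    """Keep genes whose FPKM reaches `threshold` in at least one sample."""
    return {gene: values for gene, values in fpkm.items() if max(values) >= threshold}


def log2fc_matrix(fpkm, control_index=0, pseudocount=1.0):
    """Return log2((FPKM + pseudocount) / (control FPKM + pseudocount)) per gene.

    Raises ValueError unless every gene has the same number of samples
    and that number is at least MIN_SAMPLES.
    """
    sample_counts = {len(values) for values in fpkm.values()}
    if len(sample_counts) != 1 or min(sample_counts) < MIN_SAMPLES:
        raise ValueError(f"need a consistent set of >= {MIN_SAMPLES} samples per gene")
    matrix = {}
    for gene, values in fpkm.items():
        control = values[control_index] + pseudocount
        matrix[gene] = [math.log2((v + pseudocount) / control) for v in values]
    return matrix


# Hypothetical example: geneB never reaches 10 FPKM and is filtered out.
fpkm = {
    "geneA": [1, 3, 7, 15, 31, 63],
    "geneB": [0.5, 1, 2, 1, 0.5, 1],
}
expressed = filter_expressed(fpkm)
print(log2fc_matrix(expressed))  # prints {'geneA': [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]}
```

In practice, the DEG-based filter and the DESeq2 log2FC values mentioned in the text would replace these simple stand-ins; the sketch only makes the FPKM threshold and the six-sample requirement concrete.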

Figure 1.


NEEDLE is a streamlined workflow for gene discovery in non-model plant species

(A) A schematic overview of NEEDLE. This workflow leverages a diverse range of transcriptomics data to create a dynamic dataset and employs coexpression network analysis to pinpoint modules exhibiting shared expression patterns, which are subsequently annotated for a variety of biological processes. In selected modules, it constructs GRNs and experimentally validates the regulatory relationships. NEEDLE consists of two parts: prediction (steps 1–4, peach background) and validation (steps 5 and 6, sky blue background). In step 1, transcriptome profiles from different genotypes, tissue types, and/or time points, measured by microarray or RNA-seq, serve as the starting material. Here, biologically meaningful genes (i.e., DEGs) are identified using DESeq2. Step 2 involves establishing coexpression networks using WGCNA, where the colors of the circles, each representing an individual gene, indicate different expression patterns. In step 3, each module is annotated with GO terms using agriGO, linking each module to distinct biological pathways, as illustrated by the icons (root, shoot, and chemical structure). Step 4 focuses on GRN prediction for the selected module(s) (e.g., the one associated with shoot development), using GENIE3 with the annotated list of TF genes from the corresponding genome. The circle with a yellow border highlights the gene of interest, while the circle with a red border indicates its predicted upstream TF. In step 5, predicted TFs are further refined through cistrome analyses (e.g., STREME and mVISTA) to generate a list of candidate TFs. Experimental validation of TF-target gene regulation is then conducted using the dual-LUC assay, facilitating the identification of strong candidates for mutant generation or transgenic line development.

(B and C) Validation of NEEDLE’s accuracy using two datasets: maize UPR (B) and soybean seed development (C). The bar graphs represent the number of predicted regulatory interactions for each TF within the selected coexpression module, indicating that ZmbZIP60 (red) (B) and GmbZIP67 (red) (C) are the most influential TFs in the maize and soybean networks, respectively. Software algorithms used in NEEDLE are underlined.

Table 1.

Summary of input and output data for each step in the NEEDLE workflow

Step | Input | Output
1 | RNA-seq or microarray raw data | Gene expression matrix
2 | Gene expression matrix | Coexpression modules
3 | Coexpression modules | Functional annotations
4 | Coexpression module(s) of interest | Network hierarchy
5 | Promoter sequences of module genes | Conserved or significantly enriched CREs in module genes
6 | Identity of TF(s) and target gene(s) of interest | In vivo validation of transcriptional activity of TF(s) on target gene(s)

Each step corresponds to the process illustrated in Figure 1.

Subsequently, the gene expression matrix from step 1 is fed into weighted correlation network analysis (WGCNA), an unsupervised coexpression network analysis algorithm.27 WGCNA constructs weighted coexpression networks based on the similarity of expression profiles across samples (step 2). Depending on the number of input genes and the sample dynamics, this stage typically generates many coexpression modules (e.g., 15–30), each representing a group of coexpressed genes. Each coexpression module exhibits a distinct gene expression pattern across the samples analyzed, summarized by its module eigengene (the first principal component of the module's expression profiles).27 In step 3, each module is annotated for potential biological functions through Gene Ontology (GO) analyses using broadly accessible tools, including agriGO,28 PLAZA,29 and BINGO.30 After completing this step, researchers may select specific module(s) based on gene membership of interest, eigengene expression patterns, and functional annotations.
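For readers unfamiliar with weighted networks, the core idea of a WGCNA-style adjacency can be sketched as follows. This is a simplified, hypothetical illustration rather than the WGCNA package: it assumes an unsigned network in which the adjacency between two genes is the absolute Pearson correlation of their expression profiles raised to a soft-thresholding power β (6 here, an assumed value), and it computes each gene's connectivity as the sum of its adjacencies. Module detection (topological overlap and hierarchical clustering) and eigengene calculation are not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta for all gene pairs."""
    genes = list(expr)
    adjacency = {gene: {} for gene in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            weight = abs(pearson(expr[gi], expr[gj])) ** beta
            adjacency[gi][gj] = adjacency[gj][gi] = weight
    return adjacency


def connectivity(adjacency):
    """Whole-network connectivity: sum of each gene's adjacencies."""
    return {gene: sum(neighbors.values()) for gene, neighbors in adjacency.items()}


# Hypothetical log2-transformed profiles across six samples.
expr = {
    "g1": [1, 2, 3, 4, 5, 6],
    "g2": [2, 4, 6, 8, 10, 12],  # perfectly correlated with g1
    "g3": [6, 5, 4, 3, 2, 1],    # perfectly anti-correlated with g1
}
print(connectivity(soft_threshold_adjacency(expr)))
```

Raising correlations to a power suppresses weak correlations far more than strong ones (for example, 0.5 becomes about 0.016 at β = 6), which is why soft thresholding emphasizes tightly coexpressed gene groups.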

In step 4, NEEDLE employs Gene Network Inference with Ensemble of Trees 3 (GENIE3), a random forest-based approach, to construct comprehensive GRNs within the selected coexpression modules.31 GENIE3 requires a list of the TF genes in each coexpression module. Depending on the plant species, TF lists are available from genome databases, protein-DNA interactome databases, or network databases. Among these, we recommend the Plant Transcription Factor Database,32 which provides TF lists for 156 plant species through a user-friendly interface. Using these inputs, GENIE3 predicts the strength of potential regulatory links between TFs and their putative targets, thereby establishing the network hierarchy within the chosen coexpression module.
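Two bookkeeping steps around GENIE3 can be sketched in Python: restricting candidate regulators to the TFs present in a module, and ranking TFs by how many targets they have among the highest-weight predicted links, one plausible way to obtain per-TF interaction counts like those summarized in Figures 1B and 1C. The sketch assumes the GENIE3 output has already been exported as (regulator, target, weight) tuples; the `top_n` cutoff of 1,000 mirrors the top predictions examined in the soybean test study but is otherwise an assumption, and the function names are hypothetical.

```python
from collections import Counter


def module_tfs(module_genes, tf_list):
    """TFs that belong to the module; these are the candidate regulators for GENIE3."""
    return sorted(set(module_genes) & set(tf_list))


def rank_hub_tfs(links, top_n=1000):
    """Rank regulators by their number of targets among the top_n strongest links.

    `links` holds (regulator, target, weight) tuples, e.g. a GENIE3 link list
    exported to this shape (an assumed format).
    """
    strongest = sorted(links, key=lambda link: link[2], reverse=True)[:top_n]
    counts = Counter(regulator for regulator, _target, _weight in strongest)
    return counts.most_common()


# Hypothetical module, TF annotation, and link weights.
module = ["TF1", "TF2", "geneA", "geneB", "geneC"]
tf_annotation = ["TF1", "TF2", "TF9"]
links = [
    ("TF1", "geneA", 0.42),
    ("TF1", "geneB", 0.31),
    ("TF2", "geneC", 0.18),
    ("TF2", "geneA", 0.05),
]
print(module_tfs(module, tf_annotation))  # ['TF1', 'TF2']
print(rank_hub_tfs(links, top_n=3))       # [('TF1', 2), ('TF2', 1)]
```

The choice of `top_n` trades sensitivity for precision: a smaller cutoff keeps only the most confident links, so hub rankings become less noisy but may miss weaker true targets.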

In the first step of the validation phase (step 5), NEEDLE supports the gene regulatory information obtained in silico from GENIE3 with cistrome analysis, applying either promoter scanning or de novo motif analysis with user-friendly online tools such as mVISTA33 or Sensitive, Thorough, Rapid, Enriched Motif Elicitation (STREME),34 respectively. Finally, NEEDLE adopts Agrobacterium tumefaciens-mediated transient transformation of Nicotiana benthamiana leaves with the dual-luciferase (LUC) assay system35 to rapidly validate the predicted effect of TFs on the expression of their target genes (step 6). The dual-LUC assay system requires two types of DNA constructs: a reporter, in which the target gene promoter drives expression of the LUC gene and a Renilla luciferase gene in the same construct serves as an internal normalization control, and an effector, in which a TF gene is constitutively expressed under the cauliflower mosaic virus 35S promoter. Each construct is introduced into Agrobacterium cells by heat shock transformation, and Agrobacterium cells carrying the reporter and effector constructs are mixed before infiltration into N. benthamiana leaves. The normalized LUC activities are then measured using a luminometer. A comprehensive explanation of the NEEDLE methodology is available in the STAR Methods. Detailed information on input and output data at each step is presented in Table 1, and a comparison of NEEDLE with other network prediction and validation approaches in transcriptome-based gene discovery studies can be found in Table S1.
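The normalization underlying the dual-LUC readout reduces to simple arithmetic, sketched below. The sketch assumes that each luminometer reading is a (LUC, Renilla) pair from one infiltrated sample and that transcriptional activity is reported relative to a control effector (for example, an empty effector vector); both the control design and the function names are assumptions for illustration, and statistical testing is omitted.

```python
from statistics import mean


def luc_ren_ratios(readings):
    """Per-sample LUC/Renilla ratios from (luc, ren) luminometer readings."""
    return [luc / ren for luc, ren in readings]


def relative_activity(effector_readings, control_readings):
    """Mean normalized LUC activity with a TF effector, relative to a control effector."""
    return mean(luc_ren_ratios(effector_readings)) / mean(luc_ren_ratios(control_readings))


# Hypothetical readings for one promoter::LUC reporter.
tf_effector = [(400.0, 100.0), (600.0, 100.0), (500.0, 100.0)]
control_effector = [(100.0, 100.0), (120.0, 120.0), (90.0, 90.0)]
print(relative_activity(tf_effector, control_effector))  # 5.0
```

Dividing by the Renilla signal from the same construct corrects for sample-to-sample differences in transformation efficiency and leaf area, so values above 1 would suggest activation and values below 1 repression of the reporter promoter.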

Demonstration of the accuracy of NEEDLE’s prediction capability

To assess NEEDLE’s effectiveness, we conducted two independent test studies on non-model plant species representing both monocots and dicots. In this study, we define non-model dicots as species other than A. thaliana36 and non-model monocots as species other than O. sativa,37 because multi-omics data (beyond transcriptomics) remain limited for species outside these two.38 The first test study used a time-series transcriptome dataset39 in maize (Zea mays) under prolonged ER stress, a condition often induced by various environmental pressures that leads to substantial crop productivity losses.40 ER stress activates the UPR, a set of intracellular signaling pathways that help maintain ER homeostasis by regulating protein folding, cellular trafficking, and programmed cell death.41 UPR initiation is primarily driven by dynamic gene expression changes orchestrated by the master regulators basic leucine zipper 28 (bZIP28) and bZIP60.42 We chose this dataset for three reasons. First, extensive investigation has already identified these key transcriptional regulators within the maize UPR,39,43,44,45 making them robust benchmarks for our analysis. Second, the dataset includes six time points (3, 6, 12, 24, 36, and 48 h) in addition to control samples, offering rich temporal dynamics. Third, the quality of the data has been validated across multiple independent studies.39,45

Our analysis revealed a bimodal distribution of DEGs, peaking at 6 and 48 h, with a total of 3,967 DEGs identified compared to 0 h (Figure S1A; Table S2), reflecting the multi-phasic response to ER stress observed in previous research.39 Subsequent WGCNA identified 17 coexpression modules, each displaying distinct expression patterns (Figure S1B; Table S2). GO analysis revealed significant enrichment of ER stress-related pathways specifically in the brown module (Figure S1C; Table S2), which exhibited typical UPR-associated gene expression dynamics: a gradual increase peaking mid-stress followed by a decline (Figure S1B). We focused on the brown module to construct a GRN, aiming to predict the key TFs driving the temporal gene reprogramming within this module. Using the GENIE3 algorithm, we identified 893 regulatory connections between TFs and target genes. In this GRN, ZmbZIP60, a maize homolog of bZIP60,39,43,44 emerged as the most influential TF with the highest number of regulatory connections (n = 180) (Figure 1B; Table S2), corroborating its established role as a key UPR regulator.39,42,43 Notably, homologs of bZIP28 were not included in the brown module, likely due to bZIP28’s moderate transcriptional response to ER stress and its regulation via posttranslational modifications.42 Among the 180 predicted ZmbZIP60 target genes, 29 were directly associated with ER stress pathways, reinforcing the biological significance of our findings (Figure S1D).

For the second test study, we analyzed the Harada-Goldberg soybean seed development laser capture microdissection RNA-seq dataset (STAR Methods), which includes nine data points throughout seed development (Figure S2A). Soybean seed development is tightly regulated by transcriptional networks involving various TFs, including GmbZIP67.46 We selected this dataset for its rich transcriptional dynamics and the availability of chromatin immunoprecipitation sequencing (ChIP-seq) data, enabling validation of our predicted TF-target relationships. Differential gene expression analyses across all 36 pairwise comparisons of the nine tissue samples identified 23,583 DEGs (Figure S2B; Table S2). WGCNA then revealed 31 coexpression modules, each with a unique gene reprogramming pattern during seed development (Figure S2C; Table S2). We focused on the mediumpurple1 module because of its strong enrichment in the cellular ketone metabolic process (2.00 × 10⁻¹²), which is critical for seed development, and because it contains GmbZIP67 (Table S2). GENIE3 analysis predicted that, among the 17 TFs in the mediumpurple1 module, GmbZIP67 is the top-ranking TF (Figure 1C; Dataset S8). Of the 49 targets predicted among the top 1,000 TF-target predictions, 27 (55%) overlapped (2.42 × 10⁻²⁵) with the experimentally validated GmbZIP67 targets identified by ChIP-seq46 (Figure S2D), demonstrating the accuracy and biological relevance of our approach.
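The significance of an overlap such as 27 of 49 predicted targets being ChIP-seq-supported is commonly assessed with a one-sided hypergeometric test. This section does not state which test produced the reported significance value, so the sketch below is an assumption about method, and the function name `hypergeom_upper_tail` is hypothetical. It uses exact integer arithmetic via `math.comb` until the final division.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: genes in the background considered
    successes:  background genes with a ChIP-seq peak (validated targets)
    draws:      predicted targets of the TF
    k:          predicted targets that are also ChIP-seq-supported
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / total


# Toy check: drawing 3 of 10 genes, 5 of which are "targets", all 3 being targets.
print(hypergeom_upper_tail(3, population=10, successes=5, draws=3))  # 0.08333333333333333
print(round(27 / 49, 2))  # 0.55, the overlap fraction reported above
```

For the soybean case, k = 27 and draws = 49; the background size (`population`) and the number of ChIP-seq-supported genes in it (`successes`) depend on how the background was defined in the study and are not given in this section, so they are left for the reader to supply.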

Through these case studies, we demonstrate that NEEDLE effectively captures transcriptome dynamics and accurately identifies key TFs within specific GRNs. These TFs are crucial for responding to environmental stimuli and developmental processes linked to critical agronomic traits.

Selection of CSLF6 genes as the genes of interest for the NEEDLE application

MLG is a readily digestible cell wall polysaccharide found primarily in the grass family (Poaceae).47,48 MLG has high potential as an agricultural and industrial source of easily digestible carbohydrates owing to its unique structure: glucose units connected by unsubstituted β-(1,3) and β-(1,4) linkages.49 Notably, MLG is synthesized mainly by a single enzyme, CSLF6, but the molecular mechanisms underlying the precise transcriptional regulation of CSLF6 genes remain largely elusive.48 A better understanding of CSLF6 regulation could facilitate targeted engineering of bioenergy crops to enhance MLG levels without the unintended growth or yield penalties that often accompany genetic engineering in crops.50,51,52 Thus, building on the successful demonstration of NEEDLE in both monocot (maize) and dicot (soybean) systems (Figures 1B, 1C, S1, and S2), we applied NEEDLE to predict TFs that regulate CSLF6 expression and to validate their predicted transcriptional functions in planta concurrently in two grass species: Brachypodium, a well-established model cereal,53 and sorghum, an important source of grain and a key bioenergy crop.54 We reasoned that pursuing our search in these two species would allow us to assess NEEDLE’s broad applicability and to probe gene regulatory selection pressures among grasses more deeply. In line with previous findings that CSLF6 belongs to the multi-gene cellulose synthase family,49 our sequence homology analysis revealed 20 CSLF6 homologs in Brachypodium and 35 in sorghum (Figure 2A), as well as a high degree of conservation in protein sequence and structure between BdCSLF6 and SbCSLF6 (Figure 2B). 
Each Brachypodium homolog pairs with one of the sorghum homologs, suggesting that the additional sorghum homologs may have arisen from genome duplication events.55 This observation prompted us to score the conservation of the promoter sequences between BdCSLF6 and SbCSLF6 to determine whether they share identical or similar regulatory machinery. We therefore searched for conserved promoter sequences by aligning the 2-kb SbCSLF6 promoter sequence with the corresponding regions from related grass species, including maize, switchgrass (Panicum virgatum), Setaria italica, rice (O. sativa), and barley (Hordeum vulgare), in addition to Brachypodium, using mVISTA (Figure 2C). In this analysis, which spanned 80 million years of evolution,56 three regions (named R1, R2, and R3) were markedly conserved in maize, Setaria, switchgrass, and sorghum but were not found in rice, Brachypodium, or barley. Phylogenetic analysis of the CSLF6 promoter sequences further supports this finding (Figure 2D). Because maize, Setaria, switchgrass, and sorghum are C4 species, whereas rice, Brachypodium, and barley are C3 species,57 these conserved regions suggest a potential link between CSLF6 regulation and the photosynthetic pathway. This idea aligns with evidence that MLG accumulation is influenced by photosynthesis, with carbon-based metabolites accumulating during the day and being degraded at night.58 However, further investigation is needed to confirm this crosstalk. Interestingly, two of these conserved regions (R1 and R2) overlapped with open chromatin regions profiled by the assay for transposase-accessible chromatin using sequencing,59 strongly suggesting that they interact with TFs, which we aimed to identify using NEEDLE in this study.
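Conceptually, the conservation shading in mVISTA plots reflects percent identity computed in sliding windows along a pairwise alignment. The sketch below approximates that idea for two sequences that are already aligned, using the 30-bp window and 70% identity cutoff given in the Figure 2C legend; it is not mVISTA itself (which performs its own alignments and scoring), treating gaps as mismatches is a simplifying assumption, and the function names are hypothetical.

```python
def window_identity(aligned_a, aligned_b, window=30):
    """Fraction of identical, non-gap positions in each sliding window.

    Both inputs are rows of a pairwise alignment (same length, '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for start in range(len(aligned_a) - window + 1):
        pairs = zip(aligned_a[start:start + window], aligned_b[start:start + window])
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        scores.append(matches / window)
    return scores


def conserved_windows(scores, threshold=0.70):
    """Start positions (0-based) of windows whose identity exceeds the threshold."""
    return [start for start, score in enumerate(scores) if score > threshold]


# Toy alignment: identical first half, divergent second half.
seq_a = "A" * 30 + "C" * 30
seq_b = "A" * 30 + "G" * 30
scores = window_identity(seq_a, seq_b)
print(len(scores), conserved_windows(scores))  # 31 [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

Overlapping conserved windows would then be merged into contiguous regions such as R1, R2, and R3; that merging step is omitted here.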

Figure 2.


Selection of CSLF6 genes as the genes of interest for the NEEDLE application

(A) CSLF6 is a member of a large gene family exhibiting high protein sequence conservation in Brachypodium and sorghum. A neighbor-joining phylogenetic tree depicts CSLF6-like proteins in Brachypodium and sorghum. BdCSLF6 and SbCSLF6 are highlighted with red stars.

(B) Predicted protein structures of BdCSLF6 and SbCSLF6 using AlphaFold modeling.

(C) Conservation and divergence of the CSLF6 promoter sequences across grasses, illustrated through mVISTA plots. Regions shaded in salmon indicate >70% sequence similarity across 30-bp windows. The orange solid line indicates the promoter fragment used in dual-LUC assays (Figure 4E). Gray areas indicate regions (R1, R2, and R3) conserved in all of sorghum, maize, switchgrass, and Setaria.

(D) A phylogenetic tree was constructed based on the sequence variations illustrated in (C).

Initial transcriptome analysis

To proceed with the NEEDLE pipeline (Figure 1A), we obtained two extensive RNA-seq datasets, each encompassing 11 diverse tissues that span the developmental trajectory of shoot and stem in Brachypodium or sorghum (Figures 3A and 3B). While these datasets did not originate from identical tissue types in the grasses, they offered insights into the dynamic transcriptomic changes that enabled a comparative exploration of the transcriptional networks in the two species through NEEDLE. We performed differential gene expression analyses within all possible 55 pairs of the 11 distinct tissue samples in each species, displaying massive transcriptomic waves associated with hundreds or thousands of DEGs across juvenile (1–2), mature (3–8), and senescence (9–11) stages (Figures 3C and 3D). These results illustrated a high level of transcriptomic dynamics in each dataset, which is required for NEEDLE. Our analyses identified 23,536 DEGs (representing 72% of all genes) in Brachypodium and 20,642 DEGs (encompassing 60.4% of all genes) in sorghum that were differentially expressed in at least one of the 55 comparisons (Tables S3 and S4). These observations not only highlighted varying numbers of DEGs across tissues within each species but also underscored a clear segregation between earlier-development-stage tissues and those at later stages, covering the vast developmental trajectory (Figures 3C and 3D). Such a divergence was particularly prominent in sorghum. Thus, the results from our initial transcriptome profiling analysis demonstrate significant transcriptomic reprogramming as tissues progressed through development and approached senescence in both Brachypodium and sorghum, supporting the high suitability of these two datasets for NEEDLE.
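The comparison design described above is straightforward to reproduce: 11 samples yield C(11, 2) = 55 unordered pairs (and the nine soybean samples yield 36), and a gene counts toward the DEG total if it is called in at least one comparison. The sketch below illustrates this bookkeeping in Python; the per-pair DEG calls themselves (from DESeq2 in the study) are assumed to be available as input, and the function names and tissue labels are hypothetical.

```python
from itertools import combinations


def pairwise_comparisons(samples):
    """All unordered sample pairs, each tested once for differential expression."""
    return list(combinations(samples, 2))


def union_of_degs(degs_by_pair):
    """Genes called differentially expressed in at least one pairwise comparison."""
    union = set()
    for genes in degs_by_pair.values():
        union.update(genes)
    return union


tissues = [f"T{i}" for i in range(1, 12)]      # 11 hypothetical tissue labels
print(len(pairwise_comparisons(tissues)))      # 55
print(len(pairwise_comparisons(tissues[:9])))  # 36

# Hypothetical DEG calls for two of the comparisons.
calls = {("T1", "T2"): ["g1", "g2"], ("T1", "T3"): ["g2", "g3"]}
print(sorted(union_of_degs(calls)))            # ['g1', 'g2', 'g3']
```

Taking the union rather than the intersection is what makes the reported DEG totals large (72% and 60.4% of all genes): a gene only needs to change between one pair of tissues to be retained for WGCNA.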

Figure 3.


Transcriptome analyses of Brachypodium and sorghum datasets, establishing cross-species networks involving BdCSLF6 or SbCSLF6

(A and B) Sample details of the transcriptome datasets used for Brachypodium (A) and sorghum (B) in this study.

(C and D) Heatmaps depict dynamic changes in transcriptomic profiles in the Brachypodium (C) and sorghum (D) samples. Each square cell indicates the count of DEGs with upregulated and downregulated expression along the horizontal and vertical axes, respectively. Sample descriptions for Brachypodium and sorghum align with (A) and (B), respectively.

(E) Coexpression modules containing either BdCSLF6 or SbCSLF6, extracted from the 21 (Brachypodium) and 18 (sorghum) modules identified via WGCNA (see Figure S2 for the details). Eigengene expression levels, serving as representative expressions, are displayed along the y axis, with sample points within each genotype marked along the x axis.

(F) A schematic overview of the cross-species networks. Detailed procedures are described in Figure S5.

(G) Intersection between BdCSLF6- and SbCSLF6-associated coexpression networks. The genes (white circles) interacting with BdCSLF6 (blue diamond) and/or SbCSLF6 (red diamond) are connected by solid lines (edges).

Coexpression network modeling

In the subsequent stage of NEEDLE (Figure 1A), the expression data (log2-transformed FPKM + 1) for all DEGs (23,536 in Brachypodium and 20,642 in sorghum) across all tissues were used to conduct WGCNA within each species. We identified 21 modules in the Brachypodium dataset and 18 modules in the sorghum dataset (Figure S3; Tables S3 and S4). These color-coded modules varied in size, ranging from as few as 32 genes (the aliceblue module in Brachypodium) to as many as 7,100 genes (the salmon module in sorghum). Each module displayed a unique eigengene expression pattern during development (Figures S3A and S3B) and showed significant enrichment in distinct biological pathways (Figure S4; Tables S3 and S4). Notably, the darkorange2 module in Brachypodium and the saddlebrown module in sorghum included BdCSLF6 and SbCSLF6, respectively. We therefore focused our investigation on these two coexpression modules to identify TFs controlling CSLF6 gene expression using NEEDLE. Consistent with the documented expression profiles of BdCSLF6,60 and SbCSLF6,61 both modules contained genes whose expression increased during early development, peaked at maturity, and gradually decreased in later stages of development (Figure 3E). Notably, the darkorange2 module contained Trihelix1 (BdTHX1), which has been shown to be coexpressed with BdCSLF6 and to bind BdCSLF6’s intronic region,60 supporting the robust performance of the coexpression network modeling. Despite the substantial correlation in gene expression between the two modules, the number of DEGs in darkorange2 (Brachypodium, n = 4,808) was roughly four times that in saddlebrown (sorghum, n = 1,235) (Figure S2; Tables S3 and S4). 
This observation implies that these two coexpression modules may encompass divergent genes between the two grasses despite the similar expression pattern, a hypothesis supported by the distinct enrichment of GO terms in each module (Figure S5; Tables S3 and S4).
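WGCNA summarizes each module by its eigengene, which is the first principal component of the module's standardized expression matrix. The minimal sketch below illustrates that definition with plain power iteration. It is not WGCNA's own `moduleEigengenes()` code, which uses SVD and also handles missing values, and the function names are hypothetical.

```python
import math
import random

def standardize(profile):
    """Center one gene's profile across samples and scale it to unit variance."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in profile) / (n - 1))
    return [(x - mean) / sd for x in profile]

def module_eigengene(module_expr, n_iter=200, seed=0):
    """First principal component (over samples) of a module's standardized expression.

    module_expr: genes x samples, e.g. averaged log2(FPKM + 1) values.
    Returns one value per sample; the vector has unit length.
    """
    z = [standardize(g) for g in module_expr]
    n = len(z[0])
    # Sample-by-sample cross-product matrix C = Z^T Z. Its leading eigenvector
    # is the first right singular vector of Z.
    c = [[sum(g[i] * g[j] for g in z) for j in range(n)] for i in range(n)]
    # Power iteration from a random start. A constant start vector would be
    # mapped to zero, because every standardized profile sums to zero.
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(n_iter):
        w = [sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The sign of a principal component is arbitrary. Orient the eigengene so
    # that it follows the module's average standardized expression.
    avg = [sum(g[s] for g in z) / len(z) for s in range(n)]
    if sum(a * b for a, b in zip(avg, v)) < 0:
        v = [-x for x in v]
    return v
```

For a module whose genes all share a rise-peak-decline pattern, the eigengene closely tracks that pattern.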

Obtaining cross-species networks

Next, we mined the two coexpression modules to establish cross-species networks containing only genes conserved between both modules for comparative network analyses (Figures S6A and S6B). We reasoned that the resulting cross-species networks would allow comparative prediction of CSLF6 regulators and functional validation between the two plant species. We proceeded as follows. First, the protein sequences of module genes (either darkorange2 or saddlebrown) were compared against the other species’ genome using BLASTP, creating a non-redundant list of “sorghum homologs of darkorange2 module Brachypodium genes” or “Brachypodium homologs of saddlebrown module sorghum genes.” Second, the resulting homologs were intersected with the original genes in the other module (i.e., saddlebrown module sorghum genes or darkorange2 module Brachypodium genes). Third, the intersected genes were compared with the BLASTP results from the first step to identify the darkorange2 (or saddlebrown) module genes whose sorghum (or Brachypodium) homologs were included in saddlebrown (or darkorange2). This procedure reduced darkorange2 (Brachypodium) and saddlebrown (sorghum) to smaller modules of 273 and 298 genes, respectively, conserved in both species’ modules (Figure 3F). We termed these cross-species networks darkorange2.c and saddlebrown.c to distinguish them from their original counterparts. Consistent with their sequence conservation, but in contrast to the GO enrichment of the original modules (Figure S5), darkorange2.c and saddlebrown.c exhibited overlapping or highly similar enriched GO terms (Figures S6C and S6D). Interestingly, mapping the coexpression networks of BdCSLF6 and SbCSLF6 revealed only limited gene overlap between the two networks (Figure 3G).
This result suggests that, despite the similar expression phase and identical GO term annotations of darkorange2.c and saddlebrown.c, the regulatory landscape governing CSLF6 expression has diverged between Brachypodium and sorghum. This is consistent with our earlier finding that the CSLF6 promoter sequences have diversified between Brachypodium and sorghum (Figures 2C and 2D). Our coexpression network topology analyses further supported this observation (Figure S6E). BdCSLF6 displayed higher centrality (indicating its closeness to other nodes) and degree (the number of coexpression interactions) than the median values in darkorange2.c. Conversely, SbCSLF6 exhibited values close to the median for both network traits.

GRN construction

Next, we investigated the hierarchical structure of the resulting cross-species networks through unsupervised GRN modeling using GENIE3 (Figure 1A). Importantly, darkorange2.c and saddlebrown.c contained the same number of TFs (n = 11 each), mitigating potential bias from differences in TF pool size. GRN modeling uncovered 531 TF-target interactions within darkorange2.c and 389 within saddlebrown.c (Tables S3 and S4), indicating highly interconnected regulatory networks. In each GRN, two TFs were predicted to target the corresponding CSLF6 gene:
- BdCSLF6: basic-helix-loop-helix 62 (BdbHLH62) and KNOTTED-LIKE HOMEOBOX OF ARABIDOPSIS THALIANA 7 (BdKNAT7) (Figure 4A).
- SbCSLF6: SbbHLH63 and NAC SECONDARY WALL THICKENING PROMOTING FACTOR1 (SbNST1) (Figure 4B).

Intriguingly, the predicted CSLF6 regulators in each species showed no sequence homology with their counterparts in the other species. This supports our finding that the regulatory landscape surrounding BdCSLF6 differs from that of SbCSLF6 (Figures 2C, 2D, 3G, and S6E).

Figure 4.

Establishment of GRNs in each cross-species coexpression network and experimental validation of predicted regulatory relationships between TFs and CSLF6 genes

(A and B) Within each GRN of darkorange2.c (A) or saddlebrown.c (B), circles represent module genes that encode non-TFs, and squares represent those that encode TFs. BdCSLF6 and SbCSLF6 are shown in red, and their predicted regulating TFs are shown in blue. Arrows indicate the direction of each predicted regulatory relationship.

(C) A detailed view of R1 and R2 of the SbCSLF6 promoter at the nucleotide level. Sequences highlighted in yellow indicate either R1 or R2.

(D) A schematic of the locations of CREs on the BdCSLF6 promoter. “+1” denotes the transcription start site.

(E) Dual-LUC assays conducted in Agrobacterium-infiltrated tobacco leaves. Plus signs indicate the addition of the respective Agrobacterium cell cultures at an optical density at 600 nm (OD600) of 0.2. The y axis shows LUC signals normalized to the corresponding Renilla signals. Mean ± SEM, n = 7–8 biological replicates. One-way ANOVA followed by Duncan’s multiple-range test was used to assess the significance of differences (p < 0.05) among samples. Means with the same letter are not significantly different.

To further support the TF predictions within the GRNs, we revisited the mVISTA plots of CSLF6 promoter sequences across evolutionarily diverse grasses (Figure 2C), as this analysis allowed us to scan sequence conservation at the nucleotide level. We scrutinized two evolutionarily conserved promoter sequences of CSLF6s in C4 species (Figure 2C): R1 (−633 to −581 bp) and R2 (−94 to −76 bp). Within them, we uncovered multiple potential TF binding sites, including:
- a core G-box sequence (CACGTG), known to be a binding site for bHLH TFs;62
- a CTT-N9-ACG sequence, which the Arabidopsis homolog of SbNST1 recognizes as a binding site63 (Figures 4C and S7A).

This result suggests that these highly conserved promoter fragments may play a pivotal role in regulating CSLF6 expression in C4 species. Additionally, by scanning the BdCSLF6 promoter sequence, we identified multiple potential CREs (Figures 4D and S7B). These included a G-box within the 1-kb promoter and a TGAC sequence, a known binding site of KNOX TFs.64 Collectively, our promoter analyses suggest that the predicted TFs may regulate CSLF6 expression via the identified CREs.
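The promoter scanning described above amounts to locating short consensus sequences on both DNA strands. The minimal sketch below uses Python's `re` module and assumes exact consensus matches, with the promoter string ending immediately upstream of the transcription start site. The motif table and function names are illustrative. Published CRE analyses typically rely on curated motif databases or position weight matrices rather than plain regular expressions.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    return seq.upper().translate(_COMPLEMENT)[::-1]

# Illustrative consensus patterns (regular expressions) for the elements named above.
MOTIFS = {
    "G-box": "CACGTG",                 # bHLH binding site
    "CTT-N9-ACG": "CTT[ACGT]{9}ACG",   # NST1-homolog binding site
    "TGAC": "TGAC",                    # short core cited for KNOX TFs
}

def scan_promoter(promoter, motifs=MOTIFS):
    """Find motif matches on both strands of a promoter sequence.

    Assumes the last base of `promoter` lies immediately upstream of the TSS
    (+1), so positions run from -len(promoter) to -1. Returns sorted tuples
    (motif, strand, start, end) with inclusive TSS-relative coordinates.
    Palindromic motifs such as the G-box are reported once per strand.
    """
    seq = promoter.upper()
    rc = reverse_complement(seq)
    n = len(seq)
    hits = []
    for name, pattern in motifs.items():
        # A zero-width lookahead lets overlapping matches be found.
        regex = re.compile(f"(?=({pattern}))")
        for m in regex.finditer(seq):
            start, length = m.start(), len(m.group(1))
            hits.append((name, "+", start - n, start + length - 1 - n))
        for m in regex.finditer(rc):
            # Index j on the reverse complement is index n - 1 - j on the forward strand.
            rc_start, length = m.start(), len(m.group(1))
            fwd_start = n - rc_start - length
            fwd_end = n - rc_start - 1
            hits.append((name, "-", fwd_start - n, fwd_end - n))
    return sorted(hits, key=lambda h: (h[2], h[0], h[1]))
```

For example, `scan_promoter("GGCACGTGGG")` reports the G-box at −8 to −3 on both strands.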

Functional validation in planta

As the final step of NEEDLE (Figure 1A), we sought to validate in planta the transcriptional activities of the predicted CSLF6 regulators (BdbHLH62, BdKNAT7, SbbHLH63, and SbNST1) on the expression of both BdCSLF6 and SbCSLF6. To do so, we conducted dual-LUC assays in Agrobacterium-infiltrated tobacco leaves. The assay used two kinds of constructs:
- Reporter constructs, in which the 1.1-kb promoter of BdCSLF6 or SbCSLF6 drives the LUC gene (BdCSLF6pro:LUC and SbCSLF6pro:LUC, respectively).
- Four effector constructs, each constitutively expressing one predicted TF gene (BdKNAT7, BdbHLH62, SbbHLH63, or SbNST1).

We chose the 1.1 kb of the promoters from the transcription initiation sites because this region encompasses the R1 and R2 fragments containing the potential binding sites for the predicted TFs (Figures 4C, 4D, and S7). Beyond the species-specific predictions, we hypothesized that these TFs regulate CSLF6 expression across species. This hypothesis rested on the largely overlapping CSLF6 expression patterns observed during development in Brachypodium and sorghum (Figure 3E). The results were as follows (Figure 4E):
- Adding BdKNAT7 or BdbHLH62 effector cells significantly increased BdCSLF6pro:LUC activity compared with cells containing the empty effector vector.
- Similarly, SbNST1, but not SbbHLH63, effector cells significantly boosted SbCSLF6pro:LUC activity.
- Interestingly, BdKNAT7, but not BdbHLH62, effector cells also augmented SbCSLF6pro:LUC activity.
- SbbHLH63 or SbNST1 effector cells increased BdCSLF6pro:LUC activity.

These experiments validated the predicted functions of BdKNAT7, BdbHLH62, and SbNST1 in regulating the expression of their corresponding CSLF6 genes. They also revealed regulatory conservation of trans-acting elements between Brachypodium and sorghum.
The failure of SbbHLH63 to activate the SbCSLF6 promoter, and the cross-species activities of BdKNAT7 and SbNST1, may reflect additional transcriptional regulators and/or CREs outside the 1.1-kb promoter regions tested in our dual-LUC assays.

Discussion

Fast and precise identification of the causal genes and regulatory mechanisms behind important agronomic traits requires a coordinated effort between experimental and computational biology spanning various molecular levels.65 The recent influx of multi-omics data has revolutionized this intricate process, facilitating a comprehensive understanding at the systems level.10,16 However, the application of multi-omics approaches remains limited to a select few model plant species. Alternatively, simulating biological networks from transcriptome data, which are increasingly accessible in crops, has gained traction. Nonetheless, handling large, noise-prone transcriptome datasets poses a formidable challenge. It calls for robust computational pipelines that enable even non-expert crop researchers to leverage this potent method effectively. Furthermore, predicted TF functions must be validated before progressing to subsequent stages, such as generating transgenic lines or genetic mutants. This requirement highlights the complexity inherent in gene discovery in crops, including issues related to data quality and gene functional redundancy. Here, we present NEEDLE, a robust network-based gene discovery pipeline designed to identify upstream TFs of target genes by leveraging transcriptomic dynamics in non-model plants followed by rapid experimental validation in planta. NEEDLE streamlines data analysis by offering step-by-step instructions and guidance that help minimize noise and maximize prediction accuracy without requiring extensive expertise. In proof-of-concept studies, we validated the efficacy of NEEDLE by accurately predicting ZmbZIP60 and GmbZIP67 as the top-ranked TFs in the maize UPR-associated GRN and the soybean seed development GRN, respectively. This successful demonstration underscores the accuracy of the gene discovery process facilitated by NEEDLE.
Building on this proof of principle, we applied NEEDLE to uncover TFs potentially regulating the expression of CSLF6. CSLF6 is required for the synthesis of MLG, a polysaccharide beneficial for human health, agriculture, and the bioenergy industry.48,66 Our findings revealed two TFs upstream of BdCSLF6 and one TF upstream of SbCSLF6, suggesting bioengineering potential to enhance the growth, yield, and cell wall composition of sorghum. Several key considerations emerge from our study:
- First, NEEDLE’s minimal requirement of dynamic transcriptome datasets sets it apart from other bioinformatics pipelines.20,67 This makes it particularly suitable for crop research, where such data are increasingly abundant and accessible.
- Second, NEEDLE’s modeling procedures are conceptually separated (coexpression, gene regulatory hierarchy, and cistrome). As a result, NEEDLE can evolve by integrating new modeling algorithms, thereby improving its predictive precision.
- Last, beyond predictions, NEEDLE offers an efficient means to validate predicted regulatory relationships in planta using a robust transient reporter system.

In conclusion, the robustness and effectiveness of NEEDLE demonstrated in this study suggest its broad applicability across non-model organisms and diverse biological pathways. It provides an efficient framework that integrates big data, machine learning-based modeling, and streamlined experimental validation to advance crop improvement through network-enabled approaches.

Limitations of the study

NEEDLE, as outlined here, is a workflow designed to facilitate gene discovery in non-model plant species using solely transcriptome data. However, it has some limitations. One key challenge is that, while NEEDLE can manage a large number of input genes (e.g., DEGs), it struggles with large sample sizes (e.g., more than 30 samples). Running DESeq2 on such a large dataset can be impractical, and WGCNA might produce overly complex networks, potentially obscuring the biological relevance of gene expression modules. Additionally, these algorithms can be computationally intensive with large sample sizes. Another limitation is that, as NEEDLE relies on coexpression network algorithms, it may not identify upstream TFs that exhibit opposite expression patterns to their target genes, potentially missing negative regulators of the genes of interest. Nevertheless, the inherent adaptability of NEEDLE means that ongoing refinements and the integration of complementary approaches could address these challenges. As computational power continues to advance and new methodologies emerge, NEEDLE’s framework is well positioned to evolve and enhance its capabilities, further improving gene discovery and functional analysis in diverse plant species.

Resource availability

Lead contact

Requests for further information, resources, and reagents should be directed to and will be fulfilled by the lead contact, Federica Brandizzi (fb@msu.edu).

Materials availability

DNA constructs generated by this study are available under a materials transfer agreement.

Data and code availability

  • All data supporting the findings of this study are available within this paper and its supplemental information. The full results of WGCNA and GENIE3 are available in supplemental tables.

  • The code for NEEDLE is available on GitHub (https://github.com/DaeKwan-Ko/needle). An archival DOI is listed in the key resources table.

  • Any additional information required to reproduce the data reported in this paper is available from the lead contact upon request.

Acknowledgments

This study was supported primarily by the Great Lakes Bioenergy Research Center, US Department of Energy, Office of Science, Biological and Environmental Research Program under award DE-SC0018409 with contributing support from the National Institutes of Health (R35GM136637); the Chemical Sciences, Geoscience and Biosciences Division, Office of Basic Energy Sciences, Office of Science, US Department of Energy (DE-FG02-91ER20021); and MSU AgBioResearch (MICL02598).

Author contributions

D.K.K. and F.B. conceived the project and designed the experiments and research plan. D.K.K. performed experiments and data analysis. F.B. supervised the project. D.K.K. and F.B. interpreted the data. D.K.K. wrote the manuscript draft. D.K.K. and F.B. edited the manuscript.

Declaration of interests

The authors declare no competing interests.

STAR★Methods

Key resources table

REAGENT or RESOURCE | SOURCE | IDENTIFIER
Bacterial and virus strains

Agrobacterium tumefaciens strain GV3101 | This study | N/A

Biological samples

Tobacco (Nicotiana benthamiana) | This study | N/A

Deposited data

Maize RNA-seq dataset | Srivastava et al.39 | NCBI GEO: GSE111251
Soybean RNA-seq dataset | N/A | NCBI GEO: GSE57606, GSE46096, GSE99109
Brachypodium RNA-seq dataset | Fan et al.60 | NCBI SRA: PRJNA419776
Sorghum RNA-seq dataset | McKinley et al.61 | Joint Genome Institute Genome Portal: 1190875

Oligonucleotides

SbCSLF6 primer forward (SalI site, GTCGAC, at the 5′ end): GTCGACAATCCTTCTTGTTCCCGGAAC | This study | N/A
SbCSLF6 primer reverse (BamHI site, GGATCC, at the 5′ end): GGATCCGAAGGAAGAAGGTTGAGCTGT | This study | N/A
BdCSLF6 primer forward (SalI site, GTCGAC, at the 5′ end): GTCGACGCTTCTGCCTCAAGTGAATGCC | This study | N/A
BdCSLF6 primer reverse (BamHI site, GGATCC, at the 5′ end): GGATCCGGCCGCCGTCCCAATCTC | This study | N/A

Recombinant DNA

BdKNAT7 CDS cloned into pUC-GW-Kan | This study | N/A
BdbHLH62 CDS cloned into pUC-GW-Kan | This study | N/A
SbbHLH63 CDS cloned into pUC-GW-Kan | This study | N/A
SbNST1 CDS cloned into pUC-GW-Kan | This study | N/A
pGreenII 62-SK | Creative Biogene | VET1291
pGreenII 0800-LUC | Creative Biogene | VET1290

Software and algorithms

Cutadapt | Martin et al.68 | https://cutadapt.readthedocs.io/en/stable/
FastQC | N/A | https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Bowtie2 | Langmead and Salzberg69 | https://bowtie-bio.sourceforge.net/bowtie2/index.shtml
TopHat2 | Kim et al.70 | https://ccb.jhu.edu/software/tophat/index.shtml
STAR | Dobin et al.71 | https://github.com/alexdobin/STAR
Samtools | Li et al.72 | https://samtools.sourceforge.net/
HTSeq | Putri et al.73 | https://htseq.readthedocs.io/en/latest/
mVISTA | Frazer et al.33 | https://genome.lbl.gov/vista/index.shtml
DESeq2 | Love et al.74 | https://bioconductor.org/packages/release/bioc/html/DESeq2.html
WGCNA | Langfelder and Horvath27 | https://cran.r-project.org/web/packages/WGCNA/index.html
GENIE3 | Huynh-Thu et al.31 | http://github.com/vahuynh/GENIE3
R | N/A | https://www.r-project.org
RStudio | N/A | https://posit.co/download/rstudio-desktop/
R package - ComplexHeatmap | Gu et al.75 | https://bioconductor.org/packages/release/bioc/html/ComplexHeatmap.html
R package - clusterProfiler | Yu et al.76 | https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html
R package - enrichplot | N/A | https://www.bioconductor.org/packages/release/bioc/html/enrichplot.html
Cytoscape | Shannon et al.77 | https://cytoscape.org
AgriGo | Tian et al.28 | https://systemsbiology.cau.edu.cn/agriGOv2/
NEEDLE | This paper | https://doi.org/10.5281/zenodo.14560598

Experimental model and study participant details

Tobacco plantlets (Nicotiana benthamiana) were propagated through cuttings and cultivated in soil pots (3.5 × 3.5 × 3.5 inches, L × W × D) under controlled conditions of 22°C with 16 h light/8 h dark photoperiod. To minimize positional bias, the pots were randomly rotated every two days. One-month-old plants, typically having 3–5 young leaves optimal for infiltration, were selected for Dual-LUC assays.

Method details

Multiple sequence alignment and phylogenetic analysis

We searched for CSLF homologs in the Brachypodium (Brachypodium distachyon v3.2) and sorghum (Sorghum bicolor v3.1.1) genomes in the Phytozome database (version 13; https://phytozome-next.jgi.doe.gov). The search used BLASTP with the BdCSLF6 or SbCSLF6 protein sequence as the query and an E-value threshold of 1.0 × 10^−20. For genes with multiple isoforms, we kept only the longest. This search identified 20 proteins from Brachypodium and 35 proteins from sorghum. The sequences of all homologous proteins were aligned with the ClustalW algorithm. The alignment was then used to construct a neighbor-joining phylogenetic tree with the R packages msa and ape (version 5.6–2). The tree was visualized using the R packages ggplot2 (version 3.4.0) and ggtree (version 3.6.2).78
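The homolog filtering above can be scripted from BLAST+ tabular output. The minimal sketch below makes three assumptions:
- The hits come from the default 12-column `-outfmt 6` layout.
- "Passing the threshold" means an E-value at or below 1.0 × 10^−20.
- Isoform IDs follow a hypothetical `<gene>.<n>` scheme.

Phytozome naming differs between genomes, so the `gene_of` helper would need adjusting for real IDs.

```python
def blastp_hits(lines, evalue_cutoff=1e-20):
    """Subject IDs from tabular BLASTP output (-outfmt 6) passing an E-value cutoff."""
    hits = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        # Default 12-column layout: qseqid sseqid pident length mismatch gapopen
        # qstart qend sstart send evalue bitscore
        if float(cols[10]) <= evalue_cutoff:
            hits.add(cols[1])
    return hits

def longest_isoforms(protein_ids, lengths, gene_of=lambda pid: pid.rsplit(".", 1)[0]):
    """Keep one protein per gene: the longest isoform.

    lengths: protein ID -> amino-acid length.
    gene_of: maps a protein ID to its gene ID (hypothetical '<gene>.<n>' scheme).
    """
    best = {}
    for pid in protein_ids:
        gene = gene_of(pid)
        if gene not in best or lengths[pid] > lengths[best[gene]]:
            best[gene] = pid
    return set(best.values())
```

In this sketch, the surviving IDs would then be passed to a multiple-sequence aligner.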

RNA-seq analyses

For the test studies, we obtained two datasets:
- the maize UPR RNA-seq dataset39 from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database (GSE111251);
- the Harada-Goldberg Soybean Seed Development Laser Capture Microdissection RNA-seq Dataset from GEO (GSE57606, GSE46096, and GSE99109).

For the CSLF6 study, we obtained the RNA-seq datasets for stem development of Brachypodium60 from BioProject (accession ID PRJNA419776) and of sorghum61 from the Joint Genome Institute Genome Portal (accession ID 1190875). We processed all datasets through the same analysis pipeline, as described below.

The quality of raw reads was assessed using FastQC (version 0.11.5). Raw reads were trimmed of low-quality bases and adapters with Cutadapt (version 1.8.1)68 using a minimum base quality of 20, retaining reads with a minimum length of 30 nucleotides after trimming. Quality-filtered reads were aligned to the corresponding reference genome (Zea mays RefGen_V4, Brachypodium distachyon v3.2, or Sorghum bicolor v3.1.1) using Bowtie2 (version 2.2.4)69 and TopHat2 (version 2.0.14).70 The exception was the soybean reads, which were mapped to the Glycine max Wm82.a6.v1 genome using STAR (version 2.7).71 Both aligners are known to yield similar alignment outputs.71,79 Per-gene read counts were obtained using HTSeq (version 0.6.1p1)80 in union mode with a minimum mapping quality of 20 and reverse-stranded counting (stranded = reverse).

Differential gene expression analysis was performed with DESeq2 (version 1.16.1)74 in R (version 4.1.2). For the soybean, Brachypodium, and sorghum datasets, samples were compared pairwise. For the maize dataset, each ER stress time point was compared with the mock control. Genes with a total count of <100 across samples were excluded from the analysis.
DEGs were defined by an adjusted p value < 0.01 and an absolute Log2FC > 1. The soybean data were the exception: there, a Log2FC threshold of 2 was applied because of the large overall magnitude of gene expression changes. GO enrichment analysis was performed in one of two ways. It was run using clusterProfiler (version 4.2.2)81 and visualized using enrichplot (version 1.14.2) in R (version 4.1.2). Alternatively, it was run using AgriGo (version 2.0)28 and visualized using ComplexHeatmap75 in R (version 4.1.2).
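The DEG criteria above can be expressed as a simple filter over differential expression results. In this minimal sketch:
- The results are assumed to be already parsed into Python dictionaries.
- DEGs are pooled across pairwise comparisons by taking their union, which is an assumption; the text does not state how DEGs from multiple comparisons were combined.
- The count filter is applied to precomputed results for brevity. In the described workflow, low-count genes are removed before DESeq2 runs, which also affects the adjusted p values.

```python
import math

def log2_fpkm(fpkm):
    """log2(FPKM + 1) transform used for the expression inputs."""
    return math.log2(fpkm + 1)

def call_degs(results, total_counts, lfc_cutoff=1.0, padj_cutoff=0.01, min_total=100):
    """Select DEGs from one comparison.

    results: gene -> (log2FoldChange, padj), e.g. parsed from a DESeq2 results table.
    total_counts: gene -> summed raw counts across all samples.
    """
    degs = set()
    for gene, (lfc, padj) in results.items():
        if total_counts.get(gene, 0) < min_total:
            continue  # low-count genes are excluded
        if padj is None or math.isnan(padj):
            continue  # DESeq2 reports NA for some genes; treat as not significant
        if padj < padj_cutoff and abs(lfc) > lfc_cutoff:
            degs.add(gene)
    return degs

def union_of_degs(per_comparison, total_counts, **kwargs):
    """Pool DEGs across comparisons (assumed union)."""
    pooled = set()
    for results in per_comparison:
        pooled |= call_degs(results, total_counts, **kwargs)
    return pooled
```

Passing `lfc_cutoff=2.0` corresponds to the stricter soybean threshold.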

Coexpression network analyses

We considered all DEGs obtained from the DESeq2 analysis described above that met strict criteria: total read count across samples >100, adjusted p < 0.01, and absolute Log2-transformed fold change >1. Averaged Log2-transformed FPKM+1 values from the biological replicates served as inputs. Coexpression networks were built using the WGCNA R package27 as follows:
- Through empirical optimization, we selected a soft-power threshold of 18 (maize, soybean, and Brachypodium data) or 16 (sorghum data) to create a signed network from a Spearman correlation matrix.
- Topological overlap matrices (TOMs) were constructed using the TOMsimilarityFromExpr() function with default parameters. The TOM scores were used as edge weights.
- Coexpression modules were identified by hierarchical clustering of the TOM-based distance using the flashClust() function (method = “average”).
- Modules were derived using the cutreeDynamic() function with a minimum module size of 25. Similar modules were then merged using the mergeCloseModules() function (cutHeight = 0.05).
- Finally, we applied a weight cutoff of 0.2 to obtain high-confidence interaction sets for downstream analyses.

Detailed step-by-step instructions for WGCNA are available in our GitHub repository (https://github.com/DaeKwan-Ko/WGCNA).82
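The signed adjacency and topological overlap used above follow standard WGCNA definitions. Adjacency is a_ij = ((1 + ρ_ij)/2)^β, where ρ is the Spearman correlation and β is the soft power. TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 − a_ij), where l_ij = Σ_u a_iu·a_uj (u ≠ i, j) and k_i is connectivity. The minimal pure-Python sketch below shows those formulas on small inputs. It is not WGCNA's implementation, and the function names are hypothetical. Clustering would then operate on the distance 1 − TOM.

```python
import math

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

def signed_adjacency(expr, power):
    """a_ij = ((1 + rho_ij) / 2) ** power; the diagonal is set to 1."""
    n = len(expr)
    return [[1.0 if i == j else ((1 + spearman(expr[i], expr[j])) / 2) ** power
             for j in range(n)] for i in range(n)]

def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
                tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom
```

With a signed network, perfectly anti-correlated genes receive an adjacency of zero rather than being treated as strongly connected.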

Cross-species network construction

A visual summary of the entire cross-species network construction process is provided in Figure S5.

To create darkorange2.c, we performed BLASTP of proteins encoded by darkorange2 module genes (n = 4,808) against the sorghum genome in the Phytozome database (version 13; https://phytozome-next.jgi.doe.gov). This produced a non-redundant list of sorghum homologs of darkorange2 module genes (n = 4,170). We then found a significant overlap (p = 9.22 × 10^−18) between these sorghum homologs and the saddlebrown module genes (n = 1,235) and extracted the overlapping sorghum genes (n = 255). From the original list of darkorange2 module genes (n = 4,808), we identified the genes whose sorghum homologs were among these 255 overlapping genes. This yielded 273 of the 4,808 genes.

To create saddlebrown.c, we performed BLASTP of proteins encoded by saddlebrown module genes (n = 804) against the Brachypodium genome in the Phytozome database (version 13; https://phytozome-next.jgi.doe.gov). This produced a non-redundant list of Brachypodium homologs of saddlebrown module genes (n = 4,170). As described above, we found a significant overlap (p = 4.40 × 10^−24) between these Brachypodium homologs and the darkorange2 module genes (n = 4,521) and extracted the overlapping Brachypodium genes (n = 287). From the original list of saddlebrown module genes (n = 1,235), we identified the genes whose Brachypodium homologs were among these 287 overlapping genes. This yielded 298 of the 1,235 genes.

Following the creation of darkorange2.c and saddlebrown.c, we retrieved the pairwise coexpression interactions among only the cross-species module genes (273 genes for darkorange2.c; 298 genes for saddlebrown.c) from the original coexpression list.
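The three-step intersection described above reduces to set operations once BLASTP hits are parsed into a homolog map. The minimal sketch below assumes such a map (a dictionary from each gene to its set of homologs in the other species) and uses hypothetical gene IDs. The same function is run once in each direction. The final helper then keeps only coexpression edges between retained genes.

```python
def cross_species_module(module_a, module_b, a_to_b):
    """Genes of module A whose homologs in species B fall inside module B.

    module_a, module_b: sets of gene IDs in the two coexpression modules.
    a_to_b: species-A gene -> set of species-B homologs (assumed BLASTP-derived map).
    Returns (retained module-A genes, overlapping module-B genes).
    """
    # Step 1: non-redundant list of species-B homologs of module-A genes.
    homologs_b = set()
    for gene in module_a:
        homologs_b |= a_to_b.get(gene, set())
    # Step 2: keep only the homologs that are also members of module B.
    shared_b = homologs_b & module_b
    # Step 3: module-A genes with at least one homolog in that overlap.
    retained = {g for g in module_a if a_to_b.get(g, set()) & shared_b}
    return retained, shared_b

def subnetwork(edges, genes):
    """Keep only (gene_a, gene_b, weight) edges whose endpoints are both in `genes`."""
    return [(a, b, w) for a, b, w in edges if a in genes and b in genes]
```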
To analyze the network traits, we loaded coexpression network data into the Cytoscape software (version 3.6.1)77 and performed the network analyses using the built-in tool of Analyze Network. After analyses, we downloaded the outputs directly from Cytoscape and visualized the data using ggplot2 in R (version 4.1.2).
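Cytoscape's Analyze Network reports, among other traits, node degree and closeness centrality. Closeness is commonly defined as the reciprocal of the average shortest-path length to the other reachable nodes. The minimal sketch below computes both traits for an unweighted, undirected edge list and compares a gene with the network medians. It is intended to mirror those definitions rather than reproduce Cytoscape's code, it ignores edge weights, and the function names are hypothetical.

```python
from collections import deque
from statistics import median

def build_graph(edges):
    """Undirected adjacency sets from (gene_a, gene_b) pairs; weights are ignored."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def closeness(graph, node):
    """Reciprocal of the mean shortest-path length from `node` to every node it can reach."""
    dist = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search yields hop distances in an unweighted graph
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    lengths = [d for other, d in dist.items() if other != node]
    return len(lengths) / sum(lengths) if lengths else 0.0

def network_traits(edges):
    graph = build_graph(edges)
    return {n: {"degree": len(graph[n]), "closeness": closeness(graph, n)} for n in graph}

def compare_to_median(traits, gene):
    """(value, network median, value > median) for degree and closeness."""
    out = {}
    for key in ("degree", "closeness"):
        med = median(t[key] for t in traits.values())
        out[key] = (traits[gene][key], med, traits[gene][key] > med)
    return out
```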

GRN construction

For GRN construction in darkorange2.c and saddlebrown.c, we used averaged log2-transformed FPKM+1 values from the biological replicates as inputs. We downloaded the full lists of TFs in Brachypodium and sorghum from the Plant TF Database (http://planttfdb.gao-lab.org/index.php) and found 11 TFs in each of darkorange2.c and saddlebrown.c. With the expression profiles and the TF lists, we constructed GRNs using GENIE3,31 with a weight threshold of 0.15.
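GENIE3 scores each potential TF-to-target link with a tree-ensemble importance weight, and a threshold then keeps the strongest links. The R package provides `getLinkList()` for this step. The minimal sketch below shows only this post-processing on a precomputed weight table and does not reimplement GENIE3's regression. It rests on three assumptions:
- The dictionary input format is hypothetical.
- Links with weights at or above the cutoff are retained.
- Self-links are dropped.

```python
def link_list(weights, threshold=0.15):
    """Ranked (tf, target, weight) links from GENIE3-style importance scores.

    weights: (tf, target) -> importance score (assumed input format).
    """
    links = [(tf, tgt, w) for (tf, tgt), w in weights.items()
             if tf != tgt and w >= threshold]
    return sorted(links, key=lambda link: link[2], reverse=True)

def regulators_of(links, target):
    """TFs predicted to regulate `target`, strongest first."""
    return [tf for tf, tgt, _ in links if tgt == target]
```

Counting the retained links gives the number of TF-target interactions in a GRN, and `regulators_of` lists the predicted regulators of a gene of interest.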

Network visualization

We visualized all networks constructed in this study using the Cytoscape software (version 3.6.1).77 Detailed step-by-step instructions for Cytoscape are provided in our previously published methods paper.82

Promoter analyses

We searched for homologs of SbCSLF6 using BLASTP in Zea mays (RefGen_V4), Setaria italica (version 2.2), Panicum virgatum (version 5.1), Hordeum vulgare (r1), and Brachypodium distachyon (version 3.2), which also served as a positive control. We kept the hits with the lowest E-value and, among isoforms, the longest sequence. We then obtained the 2-kb promoter sequence of each homolog from BioMart (https://phytozome-next.jgi.doe.gov/biomart). These promoter sequences were aligned to that of SbCSLF6 using mVISTA LAGAN alignment (https://genome.lbl.gov/vista/mvista/submit.shtml).33 The plot shows sequence conservation in 30-bp windows, with regions above a 70% similarity threshold highlighted in salmon. We obtained the sorghum ATAC-seq data from the Plant Epigenome Browser (http://epigenome.genetics.uga.edu/PlantEpigenome/)59 and visualized leaf replicate 1 in the Integrative Genomics Viewer (version 2.5.0).83
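The conservation plot described above reports percent identity along the reference promoter in fixed-size windows. The minimal sketch below computes a comparable profile from a precomputed pairwise alignment and assumes equal-length gapped strings with `-` for gaps. It is a simplification: mVISTA's own scoring and the LAGAN alignment step are not reproduced, and the function names are hypothetical.

```python
def window_identity(ref_aln, other_aln, window=30):
    """Percent identity in sliding windows along the ungapped reference.

    Returns (ref_start, percent_identity) pairs with 0-based reference coordinates.
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    matches = []  # one entry per reference base: 1 if identical, else 0
    for r, o in zip(ref_aln.upper(), other_aln.upper()):
        if r == "-":
            continue  # insertion in the other sequence: no reference coordinate
        matches.append(1 if r == o else 0)
    return [(start, 100.0 * sum(matches[start:start + window]) / window)
            for start in range(len(matches) - window + 1)]

def conserved_windows(profile, cutoff=70.0):
    """Window start positions meeting the similarity cutoff."""
    return [start for start, identity in profile if identity >= cutoff]
```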

Plasmid construction for Dual-LUC assay

Gene accession IDs used in dual-LUC assays are as follows: BdCSLF6 (Bradi3g16307), BdKNAT7 (Bradi1g76970), BdbHLH62 (Bradi5g06620), SbCSLF6 (Sobic.007G050600), SbNST1 (Sobic.010G022400), and SbbHLH63 (Sobic.005G110490).

For the reporter constructs, the promoter fragments of BdCSLF6 (−840 to +317 bp) and SbCSLF6 (−1,145 to −1 bp) were amplified from Bd21-3 and BTx623 genomic DNA, respectively. Amplification used Phusion High-Fidelity DNA Polymerase (New England BioLabs) with primers tailed with SalI (forward) and BamHI (reverse) restriction sites. Each promoter fragment was cloned into the pGEM-T Easy vector (Promega). After sequence verification, each fragment was subcloned into pGreenII 0800-LUC via the SalI (forward) and BamHI (reverse) sites, generating the promoter-reporter constructs BdCSLF6pro:LUC and SbCSLF6pro:LUC, respectively.

For the effector constructs, the CDSs of BdKNAT7, BdbHLH62, SbbHLH63, and SbNST1 were synthesized (GENEWIZ) based on the reference genome sequences with primers tailed with NotI (forward) and EcoRI (reverse) restriction sites. After sequence verification, each CDS was subcloned into pGreenII 62-SK via the NotI (forward) and EcoRI (reverse) sites, generating the effector constructs 35Spro:BdKNAT7, 35Spro:BdbHLH62, 35Spro:SbbHLH63, and 35Spro:SbNST1.

The reporter and effector plasmids were introduced into Agrobacterium tumefaciens strain GV3101 together with pSOUP. Transformed cells were plated on LB agar medium with rifampicin (25 μg/mL), kanamycin (50 μg/mL), and gentamicin (25 μg/mL). The primers used for promoter cloning are listed in the key resources table.
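Directional cloning through restriction sites added by primer tails assumes that the insert itself carries no internal copy of those sites. The minimal sketch below shows a pre-cloning check of that kind, using the standard recognition sequences of the four enzymes named above. The helper names and example sequences are hypothetical. The 5′-tail check assumes no extra clamp bases precede the site.

```python
# Recognition sequences of the enzymes named above. All four are palindromic,
# so scanning one strand finds sites on both strands.
SITES = {"SalI": "GTCGAC", "BamHI": "GGATCC", "NotI": "GCGGCCGC", "EcoRI": "GAATTC"}

def internal_sites(fragment, enzymes=("SalI", "BamHI")):
    """0-based positions of each enzyme's recognition sequence inside a fragment."""
    seq = fragment.upper()
    found = {}
    for enzyme in enzymes:
        site = SITES[enzyme]
        positions = [i for i in range(len(seq) - len(site) + 1)
                     if seq[i:i + len(site)] == site]
        if positions:
            found[enzyme] = positions
    return found

def has_5prime_site(primer, enzyme):
    """True if the primer starts with the enzyme's site (no clamp bases assumed)."""
    return primer.upper().startswith(SITES[enzyme])
```

An empty result from `internal_sites` for a promoter or CDS suggests that the chosen enzymes will not cut inside the insert.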

Dual-LUC assay

A single colony of A. tumefaciens cells transformed with either a reporter (BdCSLF6pro:LUC or SbCSLF6pro:LUC) or an effector (35Spro:BdKNAT7, 35Spro:BdbHLH62, 35Spro:SbbHLH63, or 35Spro:SbNST1) was processed as follows:
- The colony was inoculated into 5 mL LB medium with kanamycin (50 μg/mL) and gentamicin (100 μg/mL) and grown at 28°C overnight.
- One milliliter of this culture was inoculated into 25 mL LB medium with kanamycin (50 μg/mL) and gentamicin (100 μg/mL) and grown at 28°C overnight.
- The culture was centrifuged at 1,500 × g for 10 min, and the pellet was resuspended in 10 mL resuspension solution (10 mM MgCl2 and 10 mM MES-K, pH 5.6). The centrifugation step was repeated to remove traces of antibiotics.
- Cultures carrying each construct were mixed and adjusted to a final OD600 of 0.1 according to the experimental design.
- The suspension was incubated in the dark at room temperature for 24 h and then infiltrated into the abaxial surface of 4-week-old tobacco leaves using disposable 1-mL syringes.
- After 72 h of growth, 1-cm leaf discs (one disc per replicate) were harvested, snap-frozen, and ground to a fine powder in liquid nitrogen using a Retsch MM400 Mixer Mill with zirconium oxide balls.
- The ground tissue of each leaf disc was homogenized in 300 μL of the Passive Lysis Buffer (PLB) provided in the Dual-Luciferase Assay kit (Promega).
- Homogenized samples were centrifuged at 7,500 × g for 1 min, and the supernatant was diluted 5-fold in PLB for the assay.
- Thirty microliters of each sample was loaded into a well of a Nunc F96 MicroWell White Polystyrene Plate (Thermo Fisher Scientific), and the plate was loaded into a GloMax Navigator luminometer (Promega).
- The dual injectors dispensed 75 μL of luciferase assay reagent and of Stop & Glo reagent into each well.

Firefly luciferase activity was normalized to that of the internal control, Renilla luciferase.
Each biological replicate was measured in three technical replicates.
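The normalization and summary statistics above can be sketched in a few lines. The text does not say whether firefly/Renilla ratios are formed per technical replicate and then averaged, or formed from averaged readings. This minimal sketch assumes the former. It then reports the mean ± SEM across biological replicates (leaf discs), as shown in Figure 4E. The function names and the input layout are hypothetical, and Duncan's multiple-range test is not shown.

```python
import math
from statistics import mean, stdev

def luc_ratio(technical_reps):
    """Mean firefly/Renilla ratio over the technical replicates of one leaf disc.

    technical_reps: list of (firefly, renilla) luminometer readings.
    """
    return mean(firefly / renilla for firefly, renilla in technical_reps)

def summarize(biological_reps):
    """Mean and SEM of normalized LUC activity across biological replicates."""
    ratios = [luc_ratio(reps) for reps in biological_reps]
    return mean(ratios), stdev(ratios) / math.sqrt(len(ratios))
```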

Quantification and statistical analysis

Statistical analyses

Statistical calculations were conducted using R (version 4.1.2). Statistical analyses were performed using a two-tailed Student’s t test, one-way ANOVA followed by Duncan’s multiple range test, and hypergeometric probability test. The exact sample sizes (n) and all raw data for each experimental group/condition are given as discrete numbers in each figure panel.
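For gene-set overlaps, the hypergeometric test gives the probability of observing at least k shared genes by chance. This applies when two sets of sizes n_a and n_b are drawn from a background of N genes. The minimal sketch below computes that upper tail exactly with integer arithmetic. The background size used in this study is not stated, so the values in the checks are toy numbers. The same quantity is available as `scipy.stats.hypergeom.sf(k - 1, N, n_a, n_b)`.

```python
from math import comb

def overlap_p_value(k, n_a, n_b, universe):
    """P(X >= k), where X is the overlap of random subsets of sizes n_a and n_b
    drawn from `universe` items (hypergeometric upper tail)."""
    upper = min(n_a, n_b)
    numerator = sum(comb(n_a, i) * comb(universe - n_a, n_b - i)
                    for i in range(k, upper + 1))
    return numerator / comb(universe, n_b)
```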

Published: January 27, 2025

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.crmeth.2024.100963.

Supplemental information

Document S1. Figures S1–S7 and Table S1
mmc1.pdf (447.5KB, pdf)
Table S2. Processed results of soybean and maize datasets, related to Figure 1
mmc2.xlsx (4MB, xlsx)
Table S3. Processed results of the Brachypodium dataset, related to Figures 2, 3, and 4
mmc3.xlsx (3.5MB, xlsx)
Table S4. Processed results of the sorghum dataset, related to Figures 2, 3, and 4
mmc4.xlsx (3.2MB, xlsx)
Document S2. Article plus supplemental information
mmc5.pdf (4.7MB, pdf)

References

  • 1. Rosin F.M., Kramer E.M. Old dogs, new tricks: regulatory evolution in conserved genetic modules leads to novel morphologies in plants. Dev. Biol. 2009;332:25–35. doi: 10.1016/j.ydbio.2009.05.542.
  • 2. Kaufmann K., Pajoro A., Angenent G.C. Regulation of transcription in plants: mechanisms controlling developmental switches. Nat. Rev. Genet. 2010;11:830–842. doi: 10.1038/nrg2885.
  • 3. Zhu J.K. Abiotic Stress Signaling and Responses in Plants. Cell. 2016;167:313–324. doi: 10.1016/j.cell.2016.08.029.
  • 4. Wittkopp P.J., Kalay G. Cis-regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nat. Rev. Genet. 2011;13:59–69. doi: 10.1038/nrg3095.
  • 5. Lemon B., Tjian R. Orchestrated response: a symphony of transcription factors for gene control. Genes Dev. 2000;14:2551–2569. doi: 10.1101/gad.831000.
  • 6. Gao C. Genome engineering for crop improvement and future agriculture. Cell. 2021;184:1621–1635. doi: 10.1016/j.cell.2021.01.005.
  • 7. Allwright M.R., Taylor G. Molecular Breeding for Improved Second Generation Bioenergy Crops. Trends Plant Sci. 2016;21:43–54. doi: 10.1016/j.tplants.2015.10.002.
  • 8. Bailey-Serres J., Parker J.E., Ainsworth E.A., Oldroyd G.E.D., Schroeder J.I. Genetic strategies for improving crop yields. Nature. 2019;575:109–118. doi: 10.1038/s41586-019-1679-0.
  • 9. Bonneau R. Learning biological networks: from modules to dynamics. Nat. Chem. Biol. 2008;4:658–664. doi: 10.1038/nchembio.122.
  • 10. Ko D.K., Brandizzi F. Network-based approaches for understanding gene regulation and function in plants. Plant J. 2020;104:302–317. doi: 10.1111/tpj.14940.
  • 11. Li Y., Pearl S.A., Jackson S.A. Gene Networks in Plant Biology: Approaches in Reconstruction and Analysis. Trends Plant Sci. 2015;20:664–675. doi: 10.1016/j.tplants.2015.06.013.
  • 12. Meyer C.A., Liu X.S. Identifying and mitigating bias in next-generation sequencing methods for chromatin biology. Nat. Rev. Genet. 2014;15:709–721. doi: 10.1038/nrg3788.
  • 13. Stark R., Grzelak M., Hadfield J. RNA sequencing: the teenage years. Nat. Rev. Genet. 2019;20:631–656. doi: 10.1038/s41576-019-0150-2.
  • 14. Conesa A., Madrigal P., Tarazona S., Gomez-Cabrero D., Cervera A., McPherson A., Szcześniak M.W., Gaffney D.J., Elo L.L., Zhang X., Mortazavi A. A survey of best practices for RNA-seq data analysis. Genome Biol. 2016;17:13. doi: 10.1186/s13059-016-0881-8.
  • 15. Michael T.P., VanBuren R. Building near-complete plant genomes. Curr. Opin. Plant Biol. 2020;54:26–33. doi: 10.1016/j.pbi.2019.12.009.
  • 16. Komili S., Silver P.A. Coupling and coordination in gene expression processes: a systems biology view. Nat. Rev. Genet. 2008;9:38–48. doi: 10.1038/nrg2223.
  • 17. Usadel B., Obayashi T., Mutwil M., Giorgi F.M., Bassel G.W., Tanimoto M., Chow A., Steinhauser D., Persson S., Provart N.J. Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ. 2009;32:1633–1651. doi: 10.1111/j.1365-3040.2009.02040.x.
  • 18. Ho T.K. Random Decision Forests. IEEE; 1995. pp. 278–282.
  • 19. Huynh-Thu V.A., Sanguinetti G. Tree-Based Learning of Regulatory Network Topologies and Dynamics with Jump3. Methods Mol. Biol. 2019;1883:217–233. doi: 10.1007/978-1-4939-8882-2_9.
  • 20. Shanks C.M., Huang J., Cheng C.Y., Shih H.J.S., Brooks M.D., Alvarez J.M., Araus V., Swift J., Henry A., Coruzzi G.M. Validation of a high-confidence regulatory network for gene-to-NUE phenotype in field-grown rice. Front. Plant Sci. 2022;13 doi: 10.3389/fpls.2022.1006044.
  • 21. Cassan O., Lèbre S., Martin A. Inferring and analyzing gene regulatory networks from multi-factorial expression data: a complete and interactive suite. BMC Genom. 2021;22:387. doi: 10.1186/s12864-021-07659-2.
  • 22. Ueda Y., Ohtsuki N., Kadota K., Tezuka A., Nagano A.J., Kadowaki T., Kim Y., Miyao M., Yanagisawa S. Gene regulatory network and its constituent transcription factors that control nitrogen-deficiency responses in rice. New Phytol. 2020;227:1434–1452. doi: 10.1111/nph.16627.
  • 23. Huynh-Thu V.A., Sanguinetti G. Gene Regulatory Network Inference: An Introductory Survey. Methods Mol. Biol. 2019;1883:1–23. doi: 10.1007/978-1-4939-8882-2_1.
  • 24. Ko D.K., Brandizzi F. Advanced genomics identifies growth effectors for proteotoxic ER stress recovery in Arabidopsis thaliana. Commun. Biol. 2022;5:16. doi: 10.1038/s42003-021-02964-8.
  • 25. Lanver D., Müller A.N., Happel P., Schweizer G., Haas F.B., Franitza M., Pellegrin C., Reissmann S., Altmüller J., Rensing S.A., Kahmann R. The biotrophic development of Ustilago maydis studied by RNA-seq analysis. Plant Cell. 2018;30:300–323. doi: 10.1105/tpc.17.00764.
  • 26. Greenham K., Guadagno C.R., Gehan M.A., Mockler T.C., Weinig C., Ewers B.E., McClung C.R. Temporal network analysis identifies early physiological and transcriptomic indicators of mild drought in. Elife. 2017;6 doi: 10.7554/eLife.29655.
  • 27. Langfelder P., Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinf. 2008;9:559. doi: 10.1186/1471-2105-9-559.
  • 28. Tian T., Liu Y., Yan H., You Q., Yi X., Du Z., Xu W., Su Z. agriGO v2.0: a GO analysis toolkit for the agricultural community, 2017 update. Nucleic Acids Res. 2017;45:W122–W129. doi: 10.1093/nar/gkx382.
  • 29. Van Bel M., Diels T., Vancaester E., Kreft L., Botzki A., Van de Peer Y., Coppens F., Vandepoele K. PLAZA 4.0: an integrative resource for functional, evolutionary and comparative plant genomics. Nucleic Acids Res. 2018;46:D1190–D1196. doi: 10.1093/nar/gkx1002.
  • 30. Maere S., Heymans K., Kuiper M. BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics. 2005;21:3448–3449. doi: 10.1093/bioinformatics/bti551.
  • 31. Huynh-Thu V.A., Irrthum A., Wehenkel L., Geurts P. Inferring regulatory networks from expression data using tree-based methods. PLoS One. 2010;5 doi: 10.1371/journal.pone.0012776.
  • 32. Tian F., Yang D.C., Meng Y.Q., Jin J., Gao G. PlantRegMap: charting functional regulatory maps in plants. Nucleic Acids Res. 2020;48:D1104–D1113. doi: 10.1093/nar/gkz1020.
  • 33. Frazer K.A., Pachter L., Poliakov A., Rubin E.M., Dubchak I. VISTA: computational tools for comparative genomics. Nucleic Acids Res. 2004;32:W273–W279. doi: 10.1093/nar/gkh458.
  • 34. Bailey T.L. STREME: Accurate and versatile sequence motif discovery. Bioinformatics. 2021;37:2834–2840. doi: 10.1093/bioinformatics/btab203.
  • 35. Sherf B.A., Navarro S.L., Hannah R.R., Wood K.V. Dual-luciferase reporter assay: an advanced co-reporter technology integrating firefly and Renilla luciferase assays. Promega Notes. 1996;57:2–8.
  • 36. Page D.R., Grossniklaus U. The art and design of genetic screens: Arabidopsis thaliana. Nat. Rev. Genet. 2002;3:124–136. doi: 10.1038/nrg730.
  • 37. Shimamoto K., Kyozuka J. Rice as a model for comparative genomics of plants. Annu. Rev. Plant Biol. 2002;53:399–419. doi: 10.1146/annurev.arplant.53.092401.134447.
  • 38. Rai A., Yamazaki M., Saito K. A new era in plant functional genomics. Curr. Opin. Syst. Biol. 2019;15:58–67.
  • 39. Srivastava R., Li Z., Russo G., Tang J., Bi R., Muppirala U., Chudalayandi S., Severin A., He M., Vaitkevicius S.I., et al. Response to Persistent ER Stress in Plants: A Multiphasic Process That Transitions Cells from Prosurvival Activities to Cell Death. Plant Cell. 2018;30:1220–1242. doi: 10.1105/tpc.18.00153.
  • 40. Park C.J., Park J.M. Endoplasmic Reticulum Plays a Critical Role in Integrating Signals Generated by Both Biotic and Abiotic Stress in Plants. Front. Plant Sci. 2019;10:399. doi: 10.3389/fpls.2019.00399.
  • 41. Walter P., Ron D. The unfolded protein response: from stress pathway to homeostatic regulation. Science. 2011;334:1081–1086. doi: 10.1126/science.1209038.
  • 42. Ko D.K., Brandizzi F. Dynamics of ER stress-induced gene regulation in plants. Nat. Rev. Genet. 2024;25:513–525. doi: 10.1038/s41576-024-00710-4.
  • 43. Li Z., Tang J., Srivastava R., Bassham D.C., Howell S.H. The Transcription Factor bZIP60 Links the Unfolded Protein Response to the Heat Stress Response in Maize. Plant Cell. 2020;32:3559–3575. doi: 10.1105/tpc.20.00260.
  • 44. Li Y., Humbert S., Howell S.H. ZmbZIP60 mRNA is spliced in maize in response to ER stress. BMC Res. Notes. 2012;5:144. doi: 10.1186/1756-0500-5-144.
  • 45. Ko D.K., Brandizzi F. A temporal hierarchy underpins the transcription factor-DNA interactome of the maize UPR. Plant J. 2021;105:254–270. doi: 10.1111/tpj.15044.
  • 46. Jo L., Pelletier J.M., Hsu S.W., Baden R., Goldberg R.B., Harada J.J. Combinatorial interactions of the LEC1 transcription factor specify diverse developmental programs during soybean seed development. Proc. Natl. Acad. Sci. USA. 2020;117:1223–1232. doi: 10.1073/pnas.1918441117.
  • 47. Pauly M., Keegstra K. Cell-wall carbohydrates and their modification as a resource for biofuels. Plant J. 2008;54:559–568. doi: 10.1111/j.1365-313X.2008.03463.x.
  • 48. Kim S.J., Brandizzi F. Advances in Cell Wall Matrix Research with a Focus on Mixed-Linkage Glucan. Plant Cell Physiol. 2021;62:1839–1846. doi: 10.1093/pcp/pcab106.
  • 49. Burton R.A., Wilson S.M., Hrmova M., Harvey A.J., Shirley N.J., Medhurst A., Stone B.A., Newbigin E.J., Bacic A., Fincher G.B. Cellulose synthase-like CslF genes mediate the synthesis of cell wall (1,3;1,4)-beta-D-glucans. Science. 2006;311:1940–1942. doi: 10.1126/science.1122975.
  • 50. Kim S.J., Zemelis-Durfee S., Jensen J.K., Wilkerson C.G., Keegstra K., Brandizzi F. In the grass species Brachypodium distachyon, the production of mixed-linkage (1,3;1,4)-β-glucan (MLG) occurs in the Golgi apparatus. Plant J. 2018;93:1062–1075. doi: 10.1111/tpj.13830.
  • 51. Ko D.K., Nadakuduti S.S., Douches D.S., Buell C.R. Transcriptome profiling of transgenic potato plants provides insights into variability caused by plant transformation. PLoS One. 2018;13 doi: 10.1371/journal.pone.0206055.
  • 52. Kim P., Xue C.Y., Song H.D., Gao Y., Feng L., Li Y., Xuan Y.H. Tissue-specific activation of DOF11 promotes rice resistance to sheath blight disease and increases grain weight via activation of SWEET14. Plant Biotechnol. J. 2021;19:409–411. doi: 10.1111/pbi.13489.
  • 53. Girin T., David L.C., Chardin C., Sibout R., Krapp A., Ferrario-Méry S., Daniel-Vedele F. Brachypodium: a promising hub between model species and cereals. J. Exp. Bot. 2014;65:5683–5696. doi: 10.1093/jxb/eru376.
  • 54. Rooney W.L., Blumenthal J., Bean B., Mullet J.E. Designing sorghum as a dedicated bioenergy feedstock. Biofuel. Bioprod. Biorefin. 2007;1:147–157.
  • 55. Swigonova Z., Lai J., Ma J., Ramakrishna W., Llaca V., Bennetzen J.L., Messing J. On the tetraploid origin of the maize genome. Comp. Funct. Genomics. 2004;5:281–284. doi: 10.1002/cfg.395.
  • 56. McSteen P., Kellogg E.A. Molecular, cellular, and developmental foundations of grass diversity. Science. 2022;377:599–602. doi: 10.1126/science.abo5035.
  • 57. Schuler M.L., Mantegazza O., Weber A.P.M. Engineering C4 photosynthesis into C3 chassis in the synthetic biology age. Plant J. 2016;87:51–65. doi: 10.1111/tpj.13155.
  • 58. Fan M., Jensen J.K., Zemelis-Durfee S., Kim S.J., Chan J.Y., Beaudry C.M., Brandizzi F., Wilkerson C.G. Disruption of Brachypodium lichenase alters metabolism of mixed-linkage glucan and starch. Plant J. 2022;109:927–939. doi: 10.1111/tpj.15603.
  • 59. Lu Z., Marand A.P., Ricci W.A., Ethridge C.L., Zhang X., Schmitz R.J. The prevalence, evolution and chromatin signatures of plant regulatory elements. Nat. Plants. 2019;5:1250–1259. doi: 10.1038/s41477-019-0548-z.
  • 60. Fan M., Herburger K., Jensen J.K., Zemelis-Durfee S., Brandizzi F., Fry S.C., Wilkerson C.G. A Trihelix Family Transcription Factor Is Associated with Key Genes in Mixed-Linkage Glucan Accumulation. Plant Physiol. 2018;178:1207–1221. doi: 10.1104/pp.18.00978.
  • 61. McKinley B., Rooney W., Wilkerson C., Mullet J. Dynamics of biomass partitioning, stem gene expression, cell wall biosynthesis, and sucrose accumulation during development of Sorghum bicolor. Plant J. 2016;88:662–680. doi: 10.1111/tpj.13269.
  • 62. Ezer D., Shepherd S.J.K., Brestovitsky A., Dickinson P., Cortijo S., Charoensawan V., Box M.S., Biswas S., Jaeger K.E., Wigge P.A. The G-Box Transcriptional Regulatory Code in Arabidopsis. Plant Physiol. 2017;175:628–640. doi: 10.1104/pp.17.01086.
  • 63. O'Malley R.C., Huang S.S.C., Song L., Lewsey M.G., Bartlett A., Nery J.R., Galli M., Gallavotti A., Ecker J.R. Cistrome and Epicistrome Features Shape the Regulatory DNA Landscape. Cell. 2016;165:1280–1292. doi: 10.1016/j.cell.2016.04.038.
  • 64. Bolduc N., Yilmaz A., Mejia-Guerra M.K., Morohashi K., O'Connor D., Grotewold E., Hake S. Unraveling the KNOTTED1 regulatory network in maize meristems. Genes Dev. 2012;26:1685–1690. doi: 10.1101/gad.193433.112.
  • 65. Alvarez J.M., Brooks M.D., Swift J., Coruzzi G.M. Time-Based Systems Biology Approaches to Capture and Model Dynamic Gene Regulatory Networks. Annu. Rev. Plant Biol. 2021;72:105–131. doi: 10.1146/annurev-arplant-081320-090914.
  • 66. Collins H.M., Burton R.A., Topping D.L., Liao M.L., Bacic A., Fincher G.B. Variability in fine structures of noncellulosic cell wall polysaccharides from cereal grains: potential importance in human health and nutrition. Cereal Chem. 2010;87:272–282.
  • 67. Wilkins O., Hafemeister C., Plessis A., Holloway-Phillips M.M., Pham G.M., Nicotra A.B., Gregorio G.B., Jagadish S.V.K., Septiningsih E.M., Bonneau R., Purugganan M. EGRINs (Environmental Gene Regulatory Influence Networks) in Rice That Function in the Response to Water Deficit, High Temperature, and Agricultural Environments. Plant Cell. 2016;28:2365–2384. doi: 10.1105/tpc.16.00158.
  • 68. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011;17:10–12. doi: 10.14806/ej.17.1.200.
  • 69. Langmead B., Salzberg S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
  • 70. Kim D., Pertea G., Trapnell C., Pimentel H., Kelley R., Salzberg S.L. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14:R36. doi: 10.1186/gb-2013-14-4-r36.
  • 71. Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T.R. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
  • 72. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R., 1000 Genome Project Data Processing Subgroup. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
  • 73. Putri G.H., Anders S., Pyl P.T., Pimanda J.E., Zanini F. Analysing high-throughput sequencing data in Python with HTSeq 2.0. Bioinformatics. 2022;38:2943–2945. doi: 10.1093/bioinformatics/btac166.
  • 74. Love M.I., Huber W., Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8.
  • 75. Gu Z., Eils R., Schlesner M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics. 2016;32:2847–2849. doi: 10.1093/bioinformatics/btw313.
  • 76. Yu G., Wang L.G., Han Y., He Q.Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS. 2012;16:284–287. doi: 10.1089/omi.2011.0118.
  • 77. Shannon P., Markiel A., Ozier O., Baliga N.S., Wang J.T., Ramage D., Amin N., Schwikowski B., Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–2504. doi: 10.1101/gr.1239303.
  • 78. Yu G. Using ggtree to Visualize Data on Tree-Like Structures. Curr. Protoc. Bioinformatics. 2020;69:e96. doi: 10.1002/cpbi.96.
  • 79. Corchete L.A., Rojas E.A., Alonso-López D., De Las Rivas J., Gutiérrez N.C., Burguillo F.J. Systematic comparison and assessment of RNA-seq procedures for gene expression quantitative analysis. Sci. Rep. 2020;10 doi: 10.1038/s41598-020-76881-x.
  • 80. Anders S., Pyl P.T., Huber W. HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166–169. doi: 10.1093/bioinformatics/btu638.
  • 81. Wu T., Hu E., Xu S., Chen M., Guo P., Dai Z., Feng T., Zhou L., Tang W., Zhan L., et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation. 2021;2 doi: 10.1016/j.xinn.2021.100141.
  • 82. Ko D.K., Brandizzi F. Coexpression Network Construction and Visualization from Transcriptomes Underlying ER Stress Responses. Methods Mol. Biol. 2023;2581:385–401. doi: 10.1007/978-1-0716-2784-6_27.
  • 83. Robinson J.T., Thorvaldsdóttir H., Winckler W., Guttman M., Lander E.S., Getz G., Mesirov J.P. Integrative genomics viewer. Nat. Biotechnol. 2011;29:24–26. doi: 10.1038/nbt.1754.

Data Availability Statement

  • All data supporting the findings of this study are available within this paper and its supplemental information. The full results of the WGCNA and GENIE3 analyses are available in the supplemental tables.

  • The code for NEEDLE is available on GitHub (https://github.com/DaeKwan-Ko/needle). An archival DOI is listed in the key resources table.

  • Any additional information required to reproduce the data reported in this paper is available from the lead contact upon request.


Articles from Cell Reports Methods are provided here courtesy of Elsevier
