Abstract
Maize (Zea mays) seeds are a good source of protein, despite being deficient in several essential amino acids. However, eliminating the highly abundant but poorly balanced seed storage proteins has revealed that the regulation of seed amino acids is complex and does not rely on only a handful of proteins. In this study, we used two complementary omics-based approaches to shed light on the genes and biological processes that underlie the regulation of seed amino acid composition. We first conducted a genome-wide association study to identify candidate genes involved in the natural variation of seed protein-bound amino acids. We then used weighted gene correlation network analysis to associate protein expression with seed amino acid composition dynamics during kernel development and maturation. We found that almost half of the proteome was significantly reduced during kernel development and maturation, including several translational machinery components such as ribosomal proteins, which strongly suggests translational reprogramming. The reduction was significantly associated with a decrease in several amino acids, including lysine and methionine, pointing to their role in shaping the seed amino acid composition. When we compared the candidate gene lists generated from both approaches, we found a nonrandom overlap of 80 genes. A functional analysis of these genes showed a tight interconnected cluster dominated by translational machinery genes, especially ribosomal proteins, further supporting the role of translation dynamics in shaping seed amino acid composition. These findings strongly suggest that seed biofortification strategies that target the translation machinery dynamics should be considered and explored further.
An integrated approach reveals the key role of translational machinery in maize kernel amino acid natural variation and homeostasis, highlighting targets for seed amino acid biofortification.
Introduction
Cereal grains are an important food source for humans and livestock worldwide. Maize (Zea mays ssp. mays), wheat (Triticum sp.), and rice (Oryza sativa) account for ∼70% of total cereal production (Shewry and Halford, 2002; Shewry, 2007). Of these three cereals, maize is the most productive crop in terms of yield per acreage; the United States alone produced around 400 million tons in 2018 (FAOSTAT, 2018). Maize seeds, or kernels, consist of a large endosperm (∼90% of the total dry seed weight) and a small embryo (∼10% of the total dry seed weight) (Watson, 2003; Flint-Garcia et al., 2009). Although maize kernels are dominated by carbohydrates—roughly 70% of their composition is starch and ∼10% is protein (Watson, 2003; Flint-Garcia et al., 2009; Wu and Messing, 2014)—both humans and livestock, especially in developing countries, rely heavily on maize as a protein source (Shiferaw et al., 2011; Shen and Roesler, 2017). This dietary reliance is problematic since maize kernels are deficient in several essential amino acids (EAA), that is, those amino acids that humans and livestock cannot synthesize in their bodies and must obtain from their diet.
Previous studies have posited that this deficiency in EAAs in seed is due to the abundance of seed storage proteins (SSPs). The reason is that SSPs can contribute to >60% of a seed’s total amino acid composition but are very poor in several EAAs (Messing, 1983; Larkins et al., 1984; Shewry and Halford, 2002; Shewry, 2007; Larkins, 2017). The most abundant SSPs in maize kernels are the zeins, which are members of the prolamin group (Larkins and Hurkman, 1978; Boston and Larkins, 2009; Larkins, 2017). During seed development, zeins are deposited in the endosperm in specific protein bodies (Lending and Larkins, 1989). While rich in proline (Pro) and glutamine (Gln), zeins lack several EAAs including lysine (Lys), methionine (Met), and tryptophan (Trp; Shewry, 2007; Larkins, 2017). Attempts have been made to boost the EAA composition of kernels by greatly reducing or completely eliminating the majority of the zeins, but these efforts have not yielded any significant improvements. On the contrary, studies of various zein mutants have revealed that seeds largely maintain their protein levels and protein-bound amino acid (PBAA) composition in response to large perturbations to their proteome (Schmidt et al., 2011; Wu et al., 2012; Morton et al., 2015). This natural phenomenon, which is termed proteomic rebalancing, is a highly conserved mechanism in seeds, having been reported not only in maize (Morton et al., 2015) but also soybean (Glycine max; Schmidt et al., 2011), Arabidopsis thaliana (Withana-Gamage et al., 2013), camelina (Schmidt and Pendarvis, 2017), and wheat (Altenbach et al., 2014). Neither the underlying molecular mechanism nor the natural function of proteomic rebalancing is understood. Nonetheless, the phenomenon strongly implies a greater complexity to the metabolic regulation of a seed’s PBAA levels and composition than just the SSP levels.
Quantitative genetic approaches have proven to be efficacious for interrogating the genetic architecture of complex traits, including PBAAs and their regulatory genetic mechanisms. Genome-wide association studies (GWASs) have proven to be a powerful tool for uncovering metabolic quantitative trait loci (mQTL) that underlie the natural variation of metabolic traits (Angelovici et al., 2013; Wu et al., 2016; Deng et al., 2017; Slaten et al., 2020a, 2020b) and, consequently, facilitating the identification of key regulatory genes as well as targets for marker-assisted selection for breeding. A GWAS of seed PBAAs measured from 79 wild soybean accessions, for example, elucidated the genetic basis that underlies the deficiency of essential sulfur amino acids and led to identification of several QTLs involved in the natural variation of aspartic acid (Asp) and Gln (La et al., 2019). Likewise, a large meta-analysis of soybean seed composition identified 156 QTLs related to PBAAs, including a number of potential candidate genes related to their metabolic pathways (Van and McHale, 2017). Surprisingly, very few GWASs of PBAAs measured from maize kernels have been conducted. Deng et al. (2017) conducted an extensive association and linkage mapping study of maize PBAAs measured from 513 lines of a Chinese diversity panel that led to the identification of 247 and 281 significant QTLs in two different environments. The authors subsequently identified additional QTLs via linkage mapping of three RIL populations, but an in-depth analysis was performed on only three candidate genes.
A large number of candidate genes can be one of the drawbacks of GWAS. Often, it is impractical to validate or prioritize the candidate genes using classical genetic approaches. Furthermore, the candidate genes often are involved in the natural variation of traits in a very specific combination of environments and timepoints, and/or developmental stages. While some candidate genes identified by GWAS are key regulatory genes that belong to a specific relevant biological process, it can be hard to differentiate these genes from other candidate genes relevant only to the specific conditions under which the plants were grown. An integrative multiomics approach can help overcome this challenge.
Approaches that integrate GWAS with orthogonal genome-scale data, such as transcript profiling or proteomic datasets, have been shown to help detect gene-trait associations and reduce false-positive associations (Hawkins et al., 2010; Chan et al., 2011). The first reported network-assisted GWAS study focused on human multiple sclerosis (Baranzini et al., 2009), followed by several other studies that have taken a similar approach (Akula et al., 2011; Jia et al., 2011; Garcia-Alonso et al., 2012). Networks also have been used to successfully map causal genes with GWAS or QTL studies and to help with candidate gene prioritization (see Kliebenstein, 2020, for a review), including in plants. Chan et al. (2011), for example, combined GWAS with transcriptional networks to identify genes controlling glucosinolates in Arabidopsis, which allowed the authors to rapidly identify and validate a large number of unexpected genes that affect defense metabolism. Similarly, by combining GWAS with correlation networks of transcripts and metabolites taken from a stress time-course study in Arabidopsis, Wu et al. (2016) identified ∼90 candidate associations between structural genes and primary metabolites, some of which had been previously characterized, and validated two genes. Similar integrative omics strategies also have been successfully used to prioritize and validate genes related to senescence and ionone in maize (Schaefer et al., 2018; Sekhon et al., 2019). In soybean, a meta-analysis that integrated metabolic and transcriptomic data, GWAS, and association mapping was used to identify genes responsible for seed compositional traits, including amino acids (Qi et al., 2018). In this example, the authors used a weighted gene coexpression network analysis (WGCNA) to find an association between coexpression modules from RNA sequencing data taken during seed development and a seed compositional trait. The highly connected genes from the associated coexpression modules (i.e. hub genes) were then compared to previously identified mQTLs for the traits. Several of the hub genes matched the identified mQTLs, supporting their key role in the compositional trait of interest.
It stands to reason that coupling GWAS with a proteomic expression network analysis would be biologically more relevant than gene expression for a trait such as seed PBAA composition, which ultimately reflects proteome dynamics. Such an approach is rarely used, however, since proteomic datasets can be expensive and hard to generate. Here, we took advantage of a relatively affordable, high-throughput shotgun proteomic methodology to overcome this limitation. We performed a GWAS on PBAAs measured from mature, dry maize kernels taken from a 282-line association panel and generated a candidate gene list. We then performed an association analysis between the protein coexpression trend and the PBAA composition dynamics quantified from a developmental series of B73 plants to generate an orthogonal candidate gene list. From a comparison of these two gene lists, we identified 80 high-confidence candidate genes (HCCGs) that were both strongly associated with seed PBAA composition during development/maturation and involved in the natural variation of these traits in dry kernels. A downstream functional analysis and comprehensive ranking of these 80 genes showed that our approach was very efficient in identifying both characterized and, more importantly, previously uncharacterized biological processes involved in seed PBAA composition. Our analysis revealed surprising insights about the role that the translational machinery genes, especially ribosomal proteins, play in the seed amino acid composition. We suggest these proteins as avenues to explore for seed PBAA biofortification.
Results
Seed PBAA absolute levels and relative composition displayed different relationships and heritability
The PBAA content and composition of dry maize kernels were measured from 279 inbred lines that belonged to a 282-line maize diversity panel (Flint-Garcia et al., 2005). Two replicates of this panel were grown in 2017 and again in 2018. Fifteen PBAAs were quantified from the dry kernels. Due to the extraction method, these 15 measurements represent 17 amino acids: asparagine (Asn) and Gln hydrolyze to Asp and glutamic acid (Glu), so the combined pools are denoted as Asx and Glx, respectively. In addition, Trp and cysteine (Cys) were destroyed, and glycine (Gly) was difficult to detect (see “Materials and Methods” and Yobi and Angelovici, 2018; Supplemental Data Set S1).
We found considerable natural variation in most PBAAs, both in terms of absolute levels (e.g. alanine [Ala], isoleucine [Ile], serine [Ser], threonine [Thr], tyrosine [Tyr]), and in relative composition (Supplemental Table S1, A and B; Figure 1, A and B). A relative composition trait is defined as the ratio of an individual amino acid to the sum of the 15 measured amino acids (e.g. Ala/Total, Ile/Total). We categorized the natural variation of these traits into three groups: Group 1 were those with PBAA levels >10%, Group 2 were those with PBAA levels between 2% and 10%, and Group 3 were those with PBAA levels <2% (Supplemental Table S1B; Figure 1B). In general, the broad-sense heritabilities of the PBAA absolute traits were moderate to high (>0.5); the exceptions were arginine (Arg) and Glx (Glu + Gln), which each showed low heritability (0.05 and 0.33, respectively; Supplemental Table S1A). Interestingly, the broad-sense heritability values of many PBAA relative compositions were substantially lower than those of their absolute levels (Supplemental Table S1, A and B).
Figure 1.
The natural variation and relationships of PBAA traits measured from the diversity panel. Boxplot showing the PBAA absolute levels (A) and relative compositional distribution (B) in the 279 taxa from the Goodman–Buckler maize association panel; T stands for sum total of 15 PBAA. The bold line in the center of the boxplot represents the median, while the lower and upper edges represent the 25th and 75th percentiles, respectively. The whiskers extend from the edges to the most extreme data points that are no more than 1.5× the length of the upper and lower quartiles. Pairwise Pearson correlation analysis between the absolute PBAA levels (C) and relative composition (D) using back-transformed BLUPs of 279 taxa from the Goodman–Buckler maize association panel. The correlation matrix was visualized in R version 3.4.3 (R Core Team). Each dot represents a significant correlation coefficient (r) at qFDR values <0.05. Blue dots indicate positive correlations, and red dots indicate negative correlations. Asx denotes Asn + Asp; Glx denotes Gln + Glu. Bracketed numbers on (A) and (B) represent groups based on PBAA absolute levels (A) and PBAA/TPBAA ratios (B) where 1 is for PBAA levels >10%; 2 is for PBAA levels between 2% and 10%; and 3 is for PBAA levels <2%.
We performed a pairwise Pearson correlation analysis to evaluate the relationship between PBAA absolute levels and relative composition. For absolute PBAA, all pairwise correlations were significant at false discovery rate (qFDR)-values <0.05 and were exclusively positive (Figure 1C; Supplemental Table S2, A and B). The strongest correlation was between Ala and leucine (Leu; Pearson correlation coefficient [r] = 0.96) absolute levels, whereas the weakest correlation was between Lys and Leu (r = 0.27) absolute levels (Figure 1C; Supplemental Table S2A). Notably, both Met and Lys demonstrated a relatively larger number of weaker pairwise correlations with the other PBAAs (Figure 1C; Supplemental Table S2A). The pairwise correlation analysis showed a very different pattern for PBAA relative compositions. For this trait, we found both positive and negative correlations and numerous nonsignificant correlations among the relative PBAAs (Figure 1D; Supplemental Table S2, C and D). The strongest positive correlation was between Lys/Total and Arg/Total (r = 0.74), and the strongest negative correlation was between Lys/Total and Leu/Total (r = −0.83; Figure 1D; Supplemental Table S2C). In general, only a few PBAA relative compositions, including those of Leu, Lys, Arg, histidine (His), and valine (Val), had multiple moderate to strong correlations. Overall, we found that PBAA absolute levels were largely heritable and had strong positive correlations and that PBAA relative compositions were less heritable and had low to moderate negative and positive correlations.
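The pairwise correlation coefficient used above can be illustrated with a minimal sketch. It assumes two equal-length lists of trait values (the numbers in the usage note are hypothetical, not taken from the panel) and computes only r; the significance testing and qFDR correction described in the text are not reproduced here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

For example, `pearson_r([1, 2, 3, 4], [1, 3, 2, 4])` evaluates to 0.8 (up to floating-point rounding), whereas two perfectly opposed series give −1.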
To assess the relationship between the PBAA-related traits and seed weight, we performed a Pearson correlation between the 50-kernel weight and the 76 PBAA traits (Supplemental Data Set S2). Of the 76 traits, 48 were significantly correlated with seed weight at a significance level of 0.05, with 32 traits being negatively correlated and the remaining 12 positively correlated (Supplemental Data Set S3). The correlations were low, ranging from an absolute value of 0.07–0.24 (Supplemental Data Set S3), which is consistent with protein comprising a small (∼10%) proportion of kernel composition (Watson, 2003; Flint-Garcia et al., 2009; Wu and Messing, 2014).
GWAS of PBAA related traits resulted in 1,399 potential candidate genes
To capture the breadth of the genetic architecture underlying this natural variation in PBAAs, we performed a GWAS on 76 PBAA-related traits. These traits included: (1) the absolute levels in nanomole per milligram; (2) the relative composition of each trait represented as the ratio of each PBAA to the total sum of the PBAAs measured (e.g. Lys/Total); and (3) metabolic ratios based on the potential relationship within each amino acid metabolic family (e.g. Lys/the Asp family pathway: Ile + Met + Thr + Asx + Lys). A full list of the 76 traits is provided in Supplemental Table S3, and the calculated values are elaborated in Supplemental Data Set S1. A similar use of amino acid-derived traits for GWAS has been described previously (Angelovici et al., 2016; Deng et al., 2017). For simplicity, we use the one-letter code to describe the metabolic family ratio-related traits (Supplemental Table S3).
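As an illustration of how the second and third trait types are derived from absolute levels, the following minimal sketch computes a relative composition and an Asp family ratio. The function names and the input values are hypothetical; only the definitions (amino acid over total, and amino acid over the sum of Ile + Met + Thr + Asx + Lys) come from the text.

```python
# Members of the Asp family pathway as listed in the text.
ASP_FAMILY = ("Ile", "Met", "Thr", "Asx", "Lys")

def relative_composition(levels):
    """Ratio of each amino acid to the sum of all measured amino acids."""
    total = sum(levels.values())
    return {f"{aa}/Total": value / total for aa, value in levels.items()}

def family_ratio(levels, amino_acid, family):
    """Ratio of one amino acid to the summed levels of its metabolic family."""
    family_sum = sum(levels[member] for member in family)
    return levels[amino_acid] / family_sum

# Hypothetical absolute levels (nmol/mg) for a single inbred line.
example_levels = {"Lys": 2.0, "Ile": 3.0, "Met": 1.0,
                  "Thr": 4.0, "Asx": 10.0, "Leu": 20.0}
example_relative = relative_composition(example_levels)
example_lys_ratio = family_ratio(example_levels, "Lys", ASP_FAMILY)
```

With these made-up values, Lys/Total is 2/40 = 0.05 and the Lys to Asp family ratio is 2/20 = 0.1.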
GWAS was performed using both the FarmCPU model (Liu et al., 2016) and the GAPIT MLM model (Lipka et al., 2013). Results from the latter model were not significant; therefore, we present only the results from FarmCPU. We used FarmCPU previously to successfully identify key genes involved in free amino acid (FAA) metabolism in Arabidopsis (Slaten et al., 2020a, 2020b). Overall, 40 traits (out of the 76) yielded significant single-nucleotide polymorphism (SNP)-trait associations at 5% false discovery rate (FDR) correction (Benjamini and Hochberg, 1995) and 277 unique (i.e. nonredundant) SNPs (Supplemental Data Set S4). Figure 2A shows the partitioning of the 277 unique SNPs across the 10 maize chromosomes. A visualization of the distribution of the significant SNP-trait associations, using a 1 Mb window, shows several potential hotspots on chromosomes 1, 2, 6, 7, and 9 (Supplemental Figure S1B). Since the SNPs tend to be clustered at the tail ends rather than being evenly distributed across the chromosomes (Supplemental Figure S1A), this could have led to the unequal distribution of significant GWAS SNPs across the maize genome (Supplemental Figure S1B). A PBAA trait categorical summary of the GWAS results by amino acid family is in Table 1. Associations were detected at a comparable rate between the various trait categories except for the shikimate family, which was significantly smaller than the other ones (Table 1).
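The 5% FDR threshold refers to the Benjamini and Hochberg (1995) procedure. A minimal sketch of the adjustment is given below; it is a generic implementation for illustration, not the code used by FarmCPU or GAPIT, and the P-values in the usage note are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

For instance, `benjamini_hochberg([0.01, 0.04, 0.03, 0.005])` yields approximately `[0.02, 0.04, 0.04, 0.02]`; an association would be declared significant at 5% FDR when its adjusted value is below 0.05.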
Figure 2.
The genomic distribution of the significant unique SNPs found by GWAS and the functional categorization of the extracted candidate genes. A, The partition of the significant unique SNPs across the 10 chromosomes in maize. B, Pie chart represents the functional categorization of the 1,399 GWAS candidate genes using MapMan version 3.6. The percentage in parentheses represents the proportion of genes that falls into a functional category. The top four categories are highlighted in dark orange and include protein (metabolism), RNA, signaling, and transport.
Table 1.
A summary of the 76 PBAA GWAS results
| PBAA Trait Category | Traits Analyzed (n) | Traits Analyzed (% of total) | Significant Traits (n) | Significant Traits (% of total) | Unique SNPs (n) | Unique SNPs (% of total) | Average SNPs per Trait (n) | Unique Candidate Genes (n) | Unique Candidate Genes (% of total) |
|---|---|---|---|---|---|---|---|---|---|
| Absolute Levels | 15 | 20 | 5 | 12.5 | 36 | 12.9 | 7 | 219 | 16 |
| Relative Composition | 14 | 18 | 8 | 20.0 | 60 | 21.6 | 8 | 317 | 23 |
| Aspartate Family Related Ratios | 15 | 20 | 8 | 20.0 | 72 | 25.9 | 9 | 304 | 22 |
| BCAA Family-Related Ratios | 14 | 18 | 7 | 17.5 | 38 | 13.7 | 5 | 210 | 15 |
| Glutamate Family-Related Ratios | 13 | 17 | 9 | 22.5 | 61 | 22.3 | 7 | 296 | 21 |
| Shikimate Family-Related Ratios | 5 | 7 | 3 | 7.5 | 10 | 3.6 | 3 | 53 | 4 |
| Total Sum of the PBAA | 76 | 100 | 40 | 100 | 277 | 100 | 39 | 1,399 | 100 |
Data are summarized by PBAA trait category: PBAA absolute levels, relative composition, aspartate family-related ratios, BCAA family-related ratios, glutamate family-related ratios, and shikimate family-related ratio traits. The table presents the absolute number (n) and the percentage (%) per family out of the sum of the PBAA for the following trait parameters: total number of traits analyzed for the GWAS, number of significant traits, number of unique (nonredundant) SNPs, average SNP per trait per family, and number of unique candidate genes from a 200 kb window of the peak SNP.
We identified 1,399 unique candidate genes from 200 kb intervals centered around the significant SNPs identified for the 40 traits (100 kb upstream and 100 kb downstream) (Supplemental Data Set S4), following previous studies and to compensate for low marker coverage (Ching et al., 2002; Flint-Garcia et al., 2003; Yan et al., 2009; Deng et al., 2017). We detected six candidate genes that were associated with the highest number of traits and that were also the most significant associations. These genes were extracted from the same SNP and included two SSPs, GRMZM2G138689 (50 kD γ-zein) and GRMZM2G138727 (27 kD γ-zein) (Supplemental Data Set S4).
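The window-based extraction of candidate genes can be sketched as follows. The `Gene` record, the function name, and the rule that any gene overlapping the window counts as a candidate are assumptions made for illustration; the text specifies only the 100 kb flank on each side of a significant SNP.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int  # 1-based start coordinate (bp)
    end: int    # inclusive end coordinate (bp)

def genes_near_snps(snps, genes, flank=100_000):
    """Collect IDs of genes overlapping a +/- flank window around each SNP.

    snps  -- iterable of (chromosome, position) tuples
    genes -- iterable of Gene records
    """
    hits = set()
    for chrom, position in snps:
        window_start = position - flank
        window_end = position + flank
        for gene in genes:
            # Assumption: partial overlap with the window is sufficient.
            if (gene.chrom == chrom
                    and gene.start <= window_end
                    and gene.end >= window_start):
                hits.add(gene.gene_id)
    return hits
```

Because the result is a set, a gene that falls within the windows of several SNPs is counted once, which matches the notion of unique candidate genes.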
We next asked whether any specific biological processes or pathways were enriched across the 1,399 candidate genes. An enrichment analysis using agriGO (Tian et al., 2017) found no enrichments. We next used MapMan version 3.6 (Lohse et al., 2014) to assess the functional categorization of the genes (Figure 2B) and found that the four top functional categories were protein (13.6%), RNA (10.5%), signaling (4.6%), and transport (4.0%; Figure 2B).
In sum, the GWAS identified 1,399 candidate genes. To whittle this list down, we chose to generate an orthogonal candidate gene list with which to compare these genes.
The dynamics of PBAA relative compositions during kernel development and maturation highlighted opposing patterns
We generated an orthogonal candidate gene list associated with seed PBAA regulation and homeostasis by performing a correlation analysis of PBAA composition with protein coexpression modules during seed development and maturation. We collected developing B73 kernels from three biological replicates at 10 time points; kernels were collected every 4 d, starting 10 d after pollination (DAP) through desiccation (46 DAP; dry kernels). At 10 DAP, kernels transition to storage compound accumulation, especially SSPs (Sabelli and Larkins, 2009; Larkins, 2017).
For each sample, we quantified the PBAA levels and calculated the relative composition using the same methods described above (Supplemental Data Set S5). Hierarchical clustering was used to cluster the trends of the relative compositions across the 10 developmental stages (Figure 3). We chose to analyze only the trends of relative composition because we reasoned that they are the most relevant to compare and associate with protein expression levels, which are measured per equal protein content.
Figure 3.
Seed PBAA composition dynamics during maturation. A, Heatmap and (B) expression trends of the PBAA relative compositions across 10 seed filling time points of maize inbred B73. The average values of three biological replicates from each time point were scaled and used to create the heatmap (n = 3) using hierarchical clustering. Blue indicates low values for PBAA accumulation, and red indicates high accumulations. The red line in (B) indicates the average expression pattern of individual PBAA accumulation within a cluster.
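The averaging and scaling step that precedes the clustering can be sketched as below. This assumes z-score scaling with the sample standard deviation, which is a common default but is not stated in the text, and the replicate values in the usage note are hypothetical. The hierarchical clustering itself is not reproduced.

```python
from statistics import mean, stdev

def scale_time_course(replicates_by_timepoint):
    """Average replicates at each time point, then z-score across time points.

    replicates_by_timepoint -- list of lists, one inner list per time point.
    A completely flat profile has zero spread and cannot be scaled.
    """
    averages = [mean(replicates) for replicates in replicates_by_timepoint]
    centre = mean(averages)
    spread = stdev(averages)  # sample standard deviation (n - 1)
    if spread == 0:
        raise ValueError("cannot scale a constant profile")
    return [(value - centre) / spread for value in averages]
```

For example, `scale_time_course([[1, 2, 3], [4, 5, 6], [7, 8, 9]])` returns values of −1, 0, and 1, so that each amino acid contributes a comparable range to the heatmap regardless of its abundance.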
The patterns of PBAA relative composition aligned well with the known SSP composition and accumulation trends, which resulted in elevation of the branched-chain amino acid (BCAA) content and reductions in Lys and Met (Gorissen et al., 2018). Our analysis of PBAA relative composition trends resulted in four clusters. Cluster 1 (all the BCAAs and Tyr) was characterized by a gradual but continual elevation in relative composition with a peak at seed desiccation. Cluster 2 (Ser, Pro, and Thr) was characterized by an initial reduction in relative composition followed by an increase after 18 DAP. Cluster 3 (Met, Ala, Asx, and Glx) was characterized by a continual but gradual decrease in relative composition. And, finally, Cluster 4 (Arg, phenylalanine, Lys, Gly, and His) showed an initial increase (until 18 DAP) followed by a decrease in relative composition (Figure 3, A and B).
Five protein coexpression modules were highly associated with PBAA compositional dynamic patterns
In addition to PBAA quantification, we analyzed protein expression levels from each sample using a shotgun proteomic approach (see “Experimental Procedures”). We identified 6,361 proteins and then removed those with low spectral counts and poor reproducibility (see “Experimental Procedures”), leaving 2,648 proteins with good quality data (Supplemental Data Set S6). We performed a WGCNA (Langfelder and Horvath, 2008) on these filtered proteins and then constructed an undirected and weighted protein coexpression network using the optimum soft threshold (Supplemental Figure S2). Notably, our digestion and detection method filtered out most of the highly abundant alpha (α) and beta (β) zeins, allowing us both to get a deeper coverage of the remaining seed proteome and to skip the exclusion of these proteins using protein fractionation. The 2,648 proteins were assigned to eight modules: blue, turquoise, brown, green, yellow, black, red, and gray (Figure 4A). A list of proteins in each module with their respective annotations is provided in Supplemental Data Set S7. The turquoise module had the most proteins (853 proteins), and the black module had the fewest (66 proteins; Figure 4A; Supplemental Data Set S7). Visualization of the expression pattern of the eigen protein of each module indicated that, as the kernel matured, proteins from the blue and turquoise modules decreased (45% of all proteins detected) and proteins from the brown and green modules increased (Figure 4B). Thus, almost half of all proteins analyzed declined during seed development/maturation (Figure 4). We calculated the eigengene-based module connectivity, or module membership (kME), for each protein within a given module. A full list of proteins and their kME within their respective modules is provided in Supplemental Data Set S7.
Figure 4.
Relationships among protein coexpression modules and PBAA compositional dynamics during seed maturation. A, Module–trait relationships from the WGCNA analysis. Module names and number of proteins in the module are displayed on the left y-axis (e.g. MEblue denotes module eigen protein for the blue module comprising 356 proteins). The relative PBAA composition traits (e.g. Ala/T, which is the ratio of Ala/Sum total of all PBAA levels) are displayed on the x-axis. Each cell shows the correlation coefficients between modules Eigen protein (ME)-PBAA traits (top number) and the corresponding P-value (bottom number in parentheses). The module–trait relationships are colored based on their correlation: red is a strong positive correlation, and green is a strong negative correlation. B, Expression trend of Eigen protein found in the corresponding modules across the seed development time points. The x-axis is the 10 time-points, in DAP. The y-axis is the expression of module eigen protein using the scaled spectral count of protein (scaled SP).
Next, we assessed the functional relationships between the seed PBAA relative composition trends and the protein expression patterns across the seed developmental stages. Here, we used WGCNA to perform a correlation analysis between the relative PBAA composition traits and the protein expression modules. To create a module–PBAA composition association, we calculated the module eigen protein (ME), the first principal component of a given module. The correlation between the ME of each module and the PBAA composition trait is shown in Figure 4A. Five modules (blue, turquoise, brown, green, and black) correlated highly (r = ∼0.56–0.94) with multiple PBAA compositional traits. The gray, yellow, and red modules had mostly low and nonsignificant correlations with the PBAA traits (Figure 4A).
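The idea of a module eigen protein can be illustrated with a minimal sketch that extracts the first principal component of standardized protein profiles by power iteration. This is not the WGCNA implementation: the function names, the iteration count, the choice of starting vector, and the convention of orienting the component along the average profile are assumptions made here, and the spectral counts in the usage note are hypothetical.

```python
import math

def standardize(profile):
    """Centre a profile and scale it by its sample standard deviation."""
    n = len(profile)
    centre = sum(profile) / n
    spread = math.sqrt(sum((v - centre) ** 2 for v in profile) / (n - 1))
    return [(v - centre) / spread for v in profile]  # fails on constant rows

def module_eigen_protein(profiles, n_iter=200):
    """First principal component (unit length) across samples for one module.

    profiles -- list of per-protein abundance lists (same samples, same order).
    """
    z = [standardize(p) for p in profiles]
    n_samples = len(z[0])
    vector = z[0][:]  # non-zero start; assumed not orthogonal to the answer
    for _ in range(n_iter):
        # Multiply by Z^T Z: project every protein on the vector, then recombine.
        scores = [sum(a * b for a, b in zip(row, vector)) for row in z]
        vector = [sum(s * row[j] for s, row in zip(scores, z))
                  for j in range(n_samples)]
        norm = math.sqrt(sum(v * v for v in vector))
        vector = [v / norm for v in vector]
    # Orient the component along the average standardized profile.
    average = [sum(row[j] for row in z) / len(z) for j in range(n_samples)]
    if sum(a * b for a, b in zip(vector, average)) < 0:
        vector = [-v for v in vector]
    return vector

def pearson_r(x, y):
    """Pearson correlation, used to relate an eigen protein to a trait."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Given three hypothetical proteins that all decline over five time points, for example `[[5, 4, 3, 2, 1], [10, 8, 6, 4, 2], [9, 7, 6, 3, 1]]`, the resulting eigen protein also declines, and `pearson_r` between it and any trait series measured at the same time points gives the kind of module–trait correlation shown in Figure 4A.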
We used AgriGO_V2 (Tian et al., 2017) to carry out a GO enrichment analysis on the proteins identified from the five modules that were highly correlated with the PBAA relative traits (Supplemental Table S4); we did not find significant GO enrichment for proteins in the green module. The most significant enrichments were found for proteins in the turquoise module, and the terms included structural molecule activity, structural constituent of ribosome, ribosomal subunit, and translation (corrected P-values ranged from 1.80E-28 to 2.30E-19). The blue module was enriched for the term purine ribonucleoside metabolic process (corrected P-value 0.013; Supplemental Table S4). The most enriched categories for the brown and black modules were response to light intensity (corrected P-value 3.90E-05) and carbohydrate biosynthetic process (corrected P-value 0.00017), respectively (Supplemental Table S4). Altogether, we identified 1,583 candidate proteins that were highly correlated with the PBAA compositional dynamics during kernel development and maturation. We converted these candidate proteins into their respective gene IDs using the STRING Z. mays database (Szklarczyk et al., 2019), providing us with an orthogonal gene list to compare with the gene list generated by GWAS.
80 HCCGs were identified by intersecting the GWAS and WGCNA candidate gene lists
We compared our two orthogonal candidate gene lists and searched for genes that were highly associated with our PBAA traits in both approaches. The logic of this approach is as follows: if overlap in the lists is not random, then genes that are both highly associated with PBAA across development and are also part of the genetic architecture of PBAA in the dry kernel are key regulatory genes. A comparison between the GWAS and WGCNA candidate gene lists yielded 80 overlapping genes (Figure 5). We refer to these as HCCGs (Supplemental Table S5).
Figure 5.

Comparison between the candidate genes lists generated by WGCNA and GWAS. Venn diagram depicting the 80 genes that overlap with both the GWAS candidate gene list and the relevant proteomic coexpression modules (turquoise, brown, green, black, and blue). We refer to the overlapping genes as HCCGs.
We tested our assumption that these genes resulted from a nonrandom overlap between the GWAS and WGCNA analyses by performing 10,000 overlap simulations in GeneOverlap (Shen, 2014). We extracted a random subset of genes without replacement with the same number of genes as the GWAS candidate gene list (i.e. 1,399 genes) and overlapped them with the WGCNA gene list. We used the AGP_V2 maize genome annotation, which consists of 39,656 genes. This analysis showed that an overlap of 80 genes is larger than would be expected by chance (P = 7e-04, Fisher’s exact test), which supports our assumption that the overlap was not random.
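The logic of the overlap test can be illustrated with a one-sided hypergeometric tail probability, which is the quantity a one-sided Fisher’s exact test evaluates for a 2 × 2 overlap table. The sketch below is a plain Python stand-in, not the GeneOverlap procedure used in the study, and the function name is hypothetical. The size of the WGCNA gene list is taken here as 1,583, the number of candidate proteins; the number of gene IDs after conversion is not stated, so this value is an assumption and the result should not be expected to reproduce the reported P-value exactly.

```python
from math import comb

def overlap_p_value(universe, size_a, size_b, overlap):
    """P(X >= overlap) when size_a genes are drawn without replacement
    from a universe that contains size_b 'marked' genes."""
    upper = min(size_a, size_b)
    # Count the draws that share k genes with list B, for every k >= overlap.
    favourable = sum(
        comb(size_b, k) * comb(universe - size_b, size_a - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(universe, size_a)

# Values from the text, except size_b, which is an assumption (see above).
p_overlap = overlap_p_value(universe=39_656, size_a=1_399,
                            size_b=1_583, overlap=80)
```

Under these assumptions, the expected overlap by chance is about 1,399 × 1,583 / 39,656 ≈ 56 genes, so observing 80 shared genes falls in the upper tail of the null distribution.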
A protein–protein interaction network analysis of the 80 HCCG showed a large, tightly interconnected functional cluster containing mostly protein synthesis-related genes
We were interested in the potential functional interaction between the 80 HCCG proteins. We constructed and visualized a protein–protein interaction (PPI) network associated with the 80 HCCGs by using the Search Tool for the Retrieval of Interacting Genes/Proteins database, STRING (V11.0; Szklarczyk et al., 2019). The PPI network generated by STRING consisted of 80 nodes (42 connected to at least one other protein and 38 unconnected) and 91 edges, with an average node degree of 2.27 (Figure 6A). Each node represents an HCCG, and each edge represents the interaction between nodes/HCCGs. The number of edges is larger than expected for a random network of the same size (P-value 6.58e-05), indicating that these interactions are not random.
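The average node degree follows directly from the node and edge counts, because every edge contributes to the degree of two nodes. A minimal sketch (the function names and the toy edge list are hypothetical):

```python
def average_degree_from_counts(n_nodes, n_edges):
    """Average node degree of an undirected network: 2 * edges / nodes."""
    return 2 * n_edges / n_nodes

def average_degree(nodes, edges):
    """Same quantity computed from an explicit edge list.

    Unconnected nodes are included with a degree of zero.
    """
    degree = {node: 0 for node in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sum(degree.values()) / len(degree)
```

With 80 nodes and 91 edges, `average_degree_from_counts(80, 91)` gives 2 × 91 / 80 = 2.275, in line with the value of 2.27 reported by STRING; the 38 unconnected nodes lower the average even though the connected proteins have a higher typical degree.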
Figure 6.
PPI of the 80 HCCG list. A, A PPI of the 80 HCCG was created using STRING version 11.0. HCCG are indicated by nodes labeled with the encoding protein symbol from STRING. Interactions between nodes are indicated by edges. Smooth line edges indicate intracluster interactions, and the dotted edges indicate intercluster interactions (turquoise were based on curated database, pink were based on experimentally determined and black were based on coexpression). Cluster analysis using MCL algorithm resulted in 11 distinct clusters. B, Table representation of cluster numbers, cluster color, gene count within each cluster from STRING, and the bin name/functional category of the clusters from MapMan version 3.6.0.
To further investigate the functional interaction among the nodes, we used the MCL clustering algorithm in STRING, which resulted in 11 clusters (Figure 6A). The genes and clusters are summarized in Supplemental Table S5. Cluster 1, which contained 19 genes, was the largest and most interconnected functional cluster. The other clusters had only two to three genes. Using MapMan (Lohse et al., 2014) to assess the functional categorization of the 11 clusters (Figure 6, A and B; Supplemental Table S5), we found that the majority of genes in Cluster 1 belong to protein synthesis, degradation, or folding. Two genes are categorized as stress genes but, because they are chaperones, could be involved in protein folding. Strikingly, nine of the 19 genes in Cluster 1 encode ribosomal proteins. Cluster 2 (yellowish-green) included three zein storage proteins; Cluster 3 (green) included TCA and electron transport chain genes; and the remaining clusters included genes related to carbohydrate, amino acid, and lipid metabolism, cell wall degradation, and vesicle transport (Figure 6B; Supplemental Table S5).
We wanted to investigate the relationship between protein and gene expression for genes from Cluster 1, so we visualized the expression of both dataset types (Figure 7, A and B). The proteomic data were the same data used for the WGCNA (Figure 7A), and the gene expression data (Figure 7B) were extracted from a public dataset comprising developing B73 kernels from 0 to 38 DAP, from whole kernels and dissected embryo and endosperm tissues (Chen et al., 2014). The majority of the Cluster 1 proteins were highly expressed at 10 DAP and then decreased gradually but continually throughout the remainder of seed development (Figure 7A). The exceptions were two heat shock proteins that demonstrated the opposite trend (Gene Numbers 1 and 2 from Figure 7E). When we visualized the gene expression patterns of the same genes using the Chen et al. (2014) dataset (Figure 7B), we observed that the expression patterns of most proteins diverged from the gene expression patterns, especially toward late maturation and desiccation. While proteins showed a consistent decrease, gene expression was elevated toward these late stages (Figure 7, A and B). The latter observation suggests that the transcript elevation is not reflected in the translation of these proteins (Figure 7, A and B).
Figure 7.
Protein and gene expression for the HCCG from Cluster 1. A, Heatmap of 19 genes in Cluster 1 (red) obtained from the 80 HCCG PPIs. B, Average protein expression pattern across the 10 seed developmental stages of B73 obtained from shotgun proteomic sequencing from the current study. Y-axis is the scaled data for spectral counts of proteins (scaled SP) and X-axis the DAP. C, Heatmap of the same 19 genes created using the gene expression data across eight seed developmental stages of B73 obtained from Chen et al. (2014) in the same order as (A). D, Average gene expression patterns of the 19 genes across the eight seed developmental stages of B73 and created using the scaled FPKM gene values. Red indicates high expression, and blue indicates low expression. E, Names and annotations of the 19 proteins/genes in Cluster 1, labeled as rows in (A) and (C).
To further investigate expression levels of the Cluster 1 genes, we used a heatmap to visualize the gene expression dataset from Stelpflug et al. (2016) at various developmental stages of B73, including whole seed, endosperm, and embryo up to 24 DAP, as well as ear, tassel, internode, and leaf (Supplemental Figure S3). We found that gene expression was higher overall in seeds and reproductive tissues than in vegetative tissues. We also found high gene expression in the embryo (in most of the stages), endosperm (particularly in the early stages from 12 to 14 DAP), whole seed, ear, and tassel. Gene expression was relatively low in the different leaf stages; however, expression in the internode was higher relative to the leaf (Supplemental Figure S3). The latter pattern indicates that these ribosomal proteins are not housekeeping genes and are extensively differentially expressed across the various tissues, especially in seeds.
eQTL-mQTL analysis indicates that only a few HCCG are driven by expression
We evaluated whether our identified mQTLs (the 80 HCCG) are driven, at least in part, by significantly different expression levels in the seeds. We used 3′ RNA-sequencing data from maize developing seeds (350 growing degree DAP) collected from 255 inbred lines of the Goodman–Buckler association panel, as previously published in Kremling et al. (2018). Out of the 80 HCCG, 18 (22.5%) had significant associations between their expression levels and their corresponding GWAS tag SNP locus and, therefore, are potential expression QTL (eQTL) candidates (Supplemental Table S6A). We were then interested in whether these eQTLs explained the variation in the corresponding PBAA-related traits. We performed a Pearson correlation test between the normalized gene expression levels of those 18 genes and their corresponding PBAA traits from the GWAS (Supplemental Table S6B). Out of these 18, only four genes showed a significant correlation at a 5% level of significance with 11 PBAA traits (Supplemental Table S6B). Interestingly, one gene, Glutelin-2 (27kD γ-zein; GRMZM2G138727), was associated with eight PBAA traits (i.e. H/M, H/Z, Z/ZHPR, H/Total, L/IVL, V/A, V/LAV, and V/Total). Notably, most correlations, although significant, were low, which we infer is a consequence of using data from two independent experiments or of additional biological factors at play. However, since only four eQTL/mQTL were uncovered, this analysis suggests that most of the mQTLs we identified are not driven by expression variation.
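The second step of this analysis, correlating expression with trait values across lines, can be sketched as follows. This is an illustrative example only: the function names and the toy values are hypothetical, and the P-value is obtained by permutation rather than by the parametric Pearson correlation test used in the study, so that the example depends only on the Python standard library.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation P-value for the correlation between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the pairing between lines
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data: expression of one gene and one trait in eight lines (made up).
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
trait = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
print(pearson_r(expression, trait), permutation_p(expression, trait))
```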
Ranking the 80 HCCG by comprehensively integrating all performed analyses
Finally, we prioritized and ranked the 80 HCCG by integrating data from all analyses performed. We interrogated each of the 80 genes using the following five criteria: (1) Does it have a significant association (based on the GWAS analysis) with multiple traits? (2) Does it have a high WGCNA-kME, which represents the connectivity of a given protein with a module eigengene (kME > 0.7)? (3) Does it have an mQTL that is driven by gene expression (mQTL/eQTL)? (4) Does it have a STRING analysis connectivity >0.4, which represents inclusion in the PPI network? (5) Does it have a STRING analysis connectivity >0.7, which represents high connectivity within the PPI network? Each gene was given a score that reflected how many criteria it fulfilled; because criteria (4) and (5) are not independent, criterion (5) contributed a partial score of 0.5 (Table 2). We then ranked the 80 genes according to their score (high to low). Genes with the same score were further ranked by the number of traits associated with them in the GWAS (the more traits, the higher the rank; Supplemental Data Set S8). We used the following logic as the basis for the five criteria: genes that are associated with multiple traits in the GWAS and/or are highly connected in either the coexpression (WGCNA) analysis or the functional analysis (STRING) and/or show an eQTL/mQTL are key genes that are involved in the PBAA composition in seeds.
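The scoring and ranking scheme can be expressed compactly in code. The following Python sketch is one possible reading of the procedure, not the authors' script: the class and function names are hypothetical, the "multiple traits" criterion is assumed to mean more than one associated trait, and the 0.5 weight for the STRING >0.7 criterion follows the footnote to Table 2. The four example entries reproduce the criterion values shown for ranks 1 to 4 in Table 2.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene_id: str
    traits: list             # GWAS traits significantly associated with the gene
    high_kme: bool           # WGCNA kME > 0.7
    expression_driven: bool  # mQTL supported by an eQTL
    in_ppi_04: bool          # in the STRING network at confidence > 0.4
    in_ppi_07: bool          # in the STRING network at confidence > 0.7

    def score(self):
        # Assumption: "multiple traits" means more than one trait.
        return (
            (len(self.traits) > 1)
            + self.high_kme
            + self.expression_driven
            + self.in_ppi_04
            + 0.5 * self.in_ppi_07  # partial score, as in the Table 2 footnote
        )


def rank_candidates(candidates):
    """Sort by score (high to low), then by number of GWAS traits."""
    return sorted(candidates, key=lambda c: (-c.score(), -len(c.traits)))


genes = [
    Candidate("GRMZM2G044800", ["K/IMTXK", "V/LAV"], True, False, True, True),
    Candidate("GRMZM2G138689",
              ["Z/ZHPR", "H/M", "H/Z", "H/TOTAL", "L/IVL", "V/A", "V/LAV", "V/TOTAL"],
              True, False, True, True),
    Candidate("GRMZM2G058760", ["H/Z", "H/ZHPR", "H/TOTAL"], True, True, True, False),
    Candidate("GRMZM2G138727",
              ["H/M", "H/Z", "Z/ZHPR", "H/TOTAL", "L/IVL", "V/A", "V/LAV", "V/TOTAL"],
              True, True, True, True),
]
for rank, gene in enumerate(rank_candidates(genes), start=1):
    print(rank, gene.gene_id, gene.score())
```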
The top 15 genes are summarized in Table 2. GRMZM2G138727 (27kD γ-zein) met all five criteria and was the top-ranked gene, followed by GRMZM2G058760 (ferredoxin NADP reductase1-fnr1) and GRMZM2G138689 (50 kD γ-zein). The 27 kD and 50 kD γ-zeins were previously reported to be involved in PBAA composition in seeds (Guo et al., 2013; Deng et al., 2017), which supports the effectiveness of our ranking approach. Four ribosomal genes were also ranked among the top 15 HCCG, as were eukaryotic initiation factor 3 (eIF3) and a 26S protease regulatory subunit, suggesting that the protein synthesis and degradation machinery, especially the translational machinery, plays a key role in shaping and regulating PBAAs in maize kernels. Three genes, which included eIF3, Zein 2 (16 kD zein), and a 40S ribosomal subunit, were associated with Met-related traits, and a different ribosomal protein was associated with Lys-related traits. Both Lys and Met are amino acids that are deficient in maize. Many of the top-ranked genes were associated with Arg, Met, His, and Lys (minor amino acids) or BCAA (major amino acids) related traits.
Table 2.
Ranking of the top 15 HCCG list
| Rank | Protein ID (HCCP) | Annotations | GWAS: Significant Traits | GWAS: Multiple Associations (a) | WGCNA: HC Gene (b) | eQTL-mQTL: Expression Driven (c) | STRING: >0.4 (d) | STRING: >0.7 (e) | Combined Score (f) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GRMZM2G138727_P01 | Glutelin-2 Precursor (zp27; Zein-gamma-27 kDa zein) | H/M, H/Z, Z/ZHPR, H/TOTAL, L/IVL, V/A, V/LAV, V/TOTAL | 1 | 1 | 1 | 1 | 0.5 | 4.5 |
| 2 | GRMZM2G058760_P01 | Ferredoxin NADP reductase1-fnr1 | H/Z, H/ZHPR, H/TOTAL | 1 | 1 | 1 | 1 | 0 | 4 |
| 3 | GRMZM2G138689_P01 | 50 kD gamma zein- gz50 | Z/ZHPR, H/M, H/Z, H/TOTAL, L/IVL, V/A, V/LAV, V/TOTAL | 1 | 1 | 0 | 1 | 0.5 | 3.5 |
| 4 | GRMZM2G044800_P01 | 40S ribosomal protein S11 | K/IMTXK, V/LAV | 1 | 1 | 0 | 1 | 0.5 | 3.5 |
| 5 | GRMZM2G440208_P01 | 6-Phosphogluconate dehydrogenase | L/IVL, M/I | 1 | 1 | 0 | 1 | 0 | 3 |
| 6 | GRMZM2G090338_P01 | sulfite reductase1- sir1 | L/V, V/LAV | 1 | 0 | 1 | 1 | 0 | 3 |
| 7 | GRMZM2G176396_P02 | Pro iminopeptidase | L/LAV, V/IVL | 1 | 0 | 0 | 1 | 0.5 | 2.5 |
| 8 | GRMZM2G024354_P01 | Ribosomal protein l15 | T/K, R/ZHPR | 1 | 0 | 0 | 1 | 0.5 | 2.5 |
| 9 | GRMZM2G125300_P02 | 40S ribosomal subunit protein S21 | M/TOTAL | 0 | 1 | 0 | 1 | 0.5 | 2.5 |
| 10 | GRMZM2G360681_P01 | Heat-shock protein 101 | V/A | 0 | 1 | 0 | 1 | 0.5 | 2.5 |
| 11 | GRMZM2G110185_P02 | 26s protease regulatory subunit 7 | R/ZHPR | 0 | 1 | 0 | 1 | 0.5 | 2.5 |
| 12 | GRMZM2G060429_P01 | Zein-β Precursor (Zein-2)(16 kDa zein) (Zein Zc1) | M | 0 | 1 | 0 | 1 | 0.5 | 2.5 |
| 13 | GRMZM2G102779_P01 | eIF3 | M | 0 | 1 | 0 | 1 | 0.5 | 2.5 |
| 14 | GRMZM2G587327_P01 | Hypothetical protein LOC100382060 | V/A | 0 | 1 | 0 | 1 | 0.5 | 2.5 |
| 15 | GRMZM2G140116_P01 | 60 Ribosomal protein l14 | H/ZHPR | 0 | 1 | 0 | 1 | 0.5 | 2.5 |
Data from the GWAS, WGCNA, eQTL/mQTL (correlation of expression variation with trait variation), and STRING analyses were integrated, and the combined score and ranking were based on the cumulative number of criteria that each gene fulfilled. “1” means that the condition is satisfied, while “0” means that the condition is not satisfied. The criteria are: (a) significant association with multiple traits based on the GWAS analysis; (b) high WGCNA-kME, which represents the connectivity of a given protein with the module eigengene (kME > 0.7); (c) an mQTL that is driven by gene expression (mQTL/eQTL); (d) STRING analysis connectivity >0.4, which represents inclusion in the PPI network; and (e) STRING analysis connectivity >0.7, which represents high connectivity within the PPI network. Since the STRING analyses at confidence scores >0.7 and >0.4 are not independent, a partial score (0.5) is given instead of a full score (1) if the gene is included in the PPI network. (f) Combined score of all five criteria. Significant Traits under GWAS are the traits that were significantly associated with that SNP in our GWAS.
Haplotype and pairwise LD analysis of the genomic regions associated with the top 15 HCCGs
We further characterized the polymorphisms in the genomic regions associated with the top 15 HCCGs. We used Haploview version 4.2 (Barrett et al., 2004) to perform a haploblock/haplotype analysis on the 200 kb region (100 kb upstream and downstream) spanning each significant SNP associated with the 15 genes. This haploblock analysis was performed for each GWAS peak SNP using the relevant genotypic dataset (see “Materials and methods”). Next, we identified the haploblocks and the corresponding haplotypes that spanned both the identified GWAS SNP and the SNPs that spanned at least part of the relevant candidate gene. Using ANOVA and a post hoc Duncan comparison analysis, we tested whether the relevant PBAA traits (the ones identified in the GWAS) segregated among the different haplotypes. Our goals were to characterize the polymorphisms in the region of the peak SNP and to test whether the haplotype analysis would support the identification of the top 15 HCCGs. We also performed a pairwise LD analysis using squared allele-frequency correlations (r²) between the GWAS SNP and the SNPs located within the candidate genes to understand their relationships to one another. Supplemental Data Set S9 includes the identified haploblock positions, their corresponding haplotypes, the haplotypes’ proportion within the population, the ANOVA P-value for the relevant PBAA-related traits, and the maximum pairwise LD r² estimates.
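For readers unfamiliar with the pairwise LD statistic, the squared allele-frequency correlation between two biallelic SNPs can be computed as r² = D² / [p_A(1 − p_A) p_B(1 − p_B)], where D = p_AB − p_A p_B. The Python sketch below illustrates this under simplifying assumptions that are ours, not the authors': inbred lines are treated as homozygous, alleles are coded 0/1 per line, and lines with a missing call (`None`) at either SNP are dropped. The function name and toy genotypes are hypothetical, and the calculation is not the Haploview implementation.

```python
def ld_r2(snp_a, snp_b):
    """Squared allele-frequency correlation (r^2) between two biallelic SNPs.

    Each argument holds one 0/1 allele per inbred line (None = missing call).
    """
    pairs = [(a, b) for a, b in zip(snp_a, snp_b)
             if a is not None and b is not None]
    n = len(pairs)
    p_a = sum(a for a, _ in pairs) / n                       # freq. of allele 1 at SNP A
    p_b = sum(b for _, b in pairs) / n                       # freq. of allele 1 at SNP B
    p_ab = sum(1 for a, b in pairs if a == 1 and b == 1) / n  # freq. of the 1-1 haplotype
    d = p_ab - p_a * p_b                                     # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")  # undefined when either SNP is monomorphic
    return d * d / denom


# Toy genotypes for six lines (made up for illustration).
gwas_snp = [0, 0, 1, 1, 0, 1]
gene_snp = [0, 0, 1, 1, 1, None]
print(ld_r2(gwas_snp, gene_snp))
```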
Sixteen unique SNPs were associated with the top 15 HCCGs. Of these, 6 SNPs did not reside within any haploblock: rank 4 gene SNP ss196517349; rank 5 gene SNP S4_18573783; rank 6 gene SNP S6_157019045; rank 8 gene SNP S7_8465599; rank 13 gene SNP S6_164900150; and rank 15 gene SNP ss196440772 (Supplemental Data Set S9, Columns b, i, and m). ANOVA and Duncan post hoc comparison analysis revealed that in 24 (out of 30) haploblock analyses, the relevant PBAA-related traits segregated significantly (α = 0.05) among the haplotypes that spanned the identified GWAS SNPs (Supplemental Data Set S9, Columns g, i, and p). These results align well with our GWAS.
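The haplotype test amounts to a one-way ANOVA in which the lines are grouped by haplotype and the trait is the response. A minimal Python sketch of the F statistic is given below. The function name and the toy trait values are hypothetical; the P-value, which requires the F distribution, and the Duncan post hoc comparison are omitted, so this illustrates only the variance partitioning behind the test and not the full analysis.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists: trait values of the lines carrying each
    haplotype. Requires at least two groups and some within-group variance.
    """
    groups = [list(g) for g in groups if len(g) > 0]
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy trait values for three haplotypes (made up for illustration).
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```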
Among the top 15 HCCGs, the following 6 genes were not included in any haploblock: rank 1, GRMZM2G138727; rank 2, GRMZM2G058760; rank 3, GRMZM2G138689; rank 4, GRMZM2G044800; rank 6, GRMZM2G090338; and rank 15, GRMZM2G140116 (Supplemental Data Set S9, Column q; see also “Materials and methods”). This was due either to the absence of SNP calls within the gene open reading frame or to very low LD. Therefore, we identified haploblocks and analyzed haplotypes for only 9 of the 15 HCCGs (Supplemental Data Set S9, Columns a, b, i, and q). Notably, some genes had more than one haploblock spanning at least part of them. An ANOVA of the 11 haplotype-trait associations found that 5 PBAA-related traits associated with 4 HCCGs (rank 7, L/LAV; rank 8, R/ZHPR; rank 8, T/K; rank 9, M/Total; rank 13, M) segregated at α = 0.1 along at least one haplotype within the HCCG reading frame (Supplemental Data Set S9, Column u). Interestingly, three of these genes (ranks 8, 9, and 13) are part of the translational machinery: Ribosomal protein l15, 40S ribosomal subunit, and eIF3 (Supplemental Data Set S9, Columns a and c). For example, M/Total segregates across haplotypes spanning the 40S ribosomal subunit S21 (P-value = 0.07; Rank Gene 9; Supplemental Data Set S9, Column u), while the M absolute level trait segregates across haplotypes spanning/including eIF3 (P-value = 0.000979; Rank Gene 13; Supplemental Data Set S9, Column u). Analyses of two Met-related traits are elaborated in Supplemental Figures S4 and S5.
We performed a pairwise LD analysis of the 16 unique GWAS SNPs (associated with the top 15 HCCGs) with the SNPs residing within 9 of the 15 HCCGs (Supplemental Data Set S9, Column v). SNPs within 8 HCCGs had low to perfect LD with their corresponding GWAS SNP, with r² estimates ranging from 0.13 to 1, and one SNP was not in LD at all (r² = 0.03) (Supplemental Data Set S9, Column v). The highest LD estimate (r² = 1) was between S4_18573783 and an SNP within 6-phosphogluconate dehydrogenase, decarboxylating (EC 1.1.1.44) (Rank gene 5; Supplemental Data Set S9, Column v). The second-highest estimate was between S1_285140826 and an SNP within the ferredoxin NADP reductase1-fnr1 gene (r² = 0.59) (Rank gene 2; Supplemental Data Set S9, Column v). We suggest the low pairwise LD may be due to the low SNP coverage in these genomic regions. Altogether, the haplotype and pairwise LD analyses lend additional support to 6 of the top 15 HCCGs (gene ranks 2, 5, 7, 8, 9, and 13), three of which (gene ranks 8, 9, and 13) are part of the translational machinery.
Comparative analysis of previously identified FAA QTLs with the 80 PBAA HCCG list
FAAs are the precursors of PBAAs. Hence, we tested whether any of the 80 HCCGs, and more particularly the top 15 HCCGs, had been previously identified in studies of FAAs. We mined the genomic positions of FAA QTLs identified in three key studies that focused specifically on seed FAA genetic architecture (Wang and Larkins, 2001; Wu et al., 2002; Pineda-Hidalgo et al., 2011). All three studies performed QTL analysis on the population derived from a cross between Oh545o2 (high FAA) and Oh51Ao2 (low FAA); the FAA content in Oh545o2 is 3 times greater than that in Oh51Ao2 (Wang and Larkins, 2001). We also mined QTLs for three major EAAs (Lys, Met, and Trp) in maize kernels from a QTL analysis (Gutiérrez-Rojas et al., 2010) of a population of recombinant inbred lines derived from a cross between an o2 line (B73o2) and a quality protein maize line (CML161). The latter line is a modified o2 genotype with a hard endosperm that retains higher levels of Lys and Trp than normal maize lines (Vasal, 2000; Gutiérrez-Rojas et al., 2010). We note that the o2 background mutation used in all these FAA studies has a major impact on the α-zein SSPs and may bias the overall comparison with our results. Nevertheless, these populations are ideal for investigations of seed FAAs since the o2 mutation has a different impact on FAA levels when present in different backgrounds (Wang and Larkins, 2001; Gutiérrez-Rojas et al., 2010; Pineda-Hidalgo et al., 2011). These studies were also chosen because multiple QTLs were identified in at least two of them (Supplemental Data Set S10), lending support to their role in FAA regulation. The flanking marker, bin number, and genomic position for each significant FAA QTL are listed in Supplemental Data Set S10.
From this comparative analysis, we identified 32 nonredundant genes that overlapped both with our 80 PBAA HCCGs and the genomic intervals of the previously identified FAA QTLs (Supplemental Data Set S10). This suggests that the genetic architecture for FAA levels may overlap partly with some PBAA-related traits. This potential overlap needs further study and confirmation since the QTL intervals are large and the resolution low. Still, it is intriguing that 9 (ranked genes 1, 3, 4, 6, 7, 10, 13, 14, and 15) of our top 15 HCCGs, four of which are translational machinery genes (3 ribosomal genes and one eIF3), were located within the FAA genomic intervals (Supplemental Data Set S10).
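The comparison described above reduces to asking which candidate genes have genomic coordinates that fall within a QTL interval on the same chromosome. A minimal Python sketch of that interval check follows. The function name, gene identifiers, and coordinates are invented for illustration, and the study's actual intervals were defined from flanking markers and bins (Supplemental Data Set S10).

```python
def genes_in_qtl_intervals(genes, qtls):
    """Return the sorted IDs of genes overlapping at least one QTL interval.

    `genes` maps gene ID -> (chromosome, start, end);
    `qtls` is a list of (chromosome, start, end) intervals.
    """
    hits = set()
    for gene_id, (chrom, start, end) in genes.items():
        for q_chrom, q_start, q_end in qtls:
            # Two intervals overlap if each starts before the other ends.
            if chrom == q_chrom and start <= q_end and end >= q_start:
                hits.add(gene_id)
                break
    return sorted(hits)


# Hypothetical genes and QTL intervals (coordinates in bp, made up).
toy_genes = {
    "geneA": ("chr1", 1_000_000, 1_004_000),
    "geneB": ("chr1", 9_000_000, 9_002_500),
    "geneC": ("chr7", 1_500_000, 1_503_000),
}
toy_qtls = [("chr1", 500_000, 2_000_000), ("chr7", 3_000_000, 8_000_000)]
print(genes_in_qtl_intervals(toy_genes, toy_qtls))
```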
Discussion
The development of effective seed amino acid biofortification strategies requires a fundamental understanding of the mechanism that underlies PBAA composition. We have learned from multiple studies that have targeted SSPs in various plant species that the genetic and metabolic bases of seed composition are interconnected and complex, and do not rely merely on the expression of specific proteins (Larkins and Hurkman, 1978; Messing, 1983; Morton et al., 2015). In this study, we used and integrated two omics approaches to analyze the genetic basis of PBAA natural variation and its association with the compositional dynamics during seed development and maturation. Our results provide strong evidence that the structural components of the translational machinery play a key role in seed PBAA regulation.
Functional categorization of candidate genes resulting from GWAS of PBAA-related traits highlights proteins and RNA metabolism
The phenomenon of proteomic rebalancing makes altering seed PBAA composition via mutation difficult (Wu and Messing, 2014; Morton et al., 2015). Nevertheless, we know that seed PBAA composition is a complex trait that varies across natural and artificial populations of maize and other crops (Deng et al., 2017; La et al., 2019). Indeed, our study found that both PBAA absolute levels (nmol/mg) and PBAA relative composition (PBAA/TPBAA) show significant natural variation (Figure 1, A and B). Interestingly, a correlation analysis of both these traits across a large association panel showed a strong and significant positive correlation between PBAA absolute levels, but significant correlations, both positive and negative, for only a handful of PBAA relative composition levels (Figure 1, C and D). We interpret this result to mean that the high coordination observed in the overall amino acid absolute levels results from a general change in the overall protein levels in the different genotypes, as opposed to the relative composition, which itself more likely reflects the natural variation of the proteomic composition in the seeds. For example, the relative compositions of Lys and Leu demonstrate a strong negative correlation, which very likely reflects variation in the abundance of SSPs, which are high in Leu but poor in Lys (Gorissen et al., 2018). Our analysis also found a high positive correlation between the natural variation of Lys and Asx, which aligns with a previous study where proteomic perturbations that led to high levels of Lys also led to high levels of Asx (Hunter et al., 2002).
These findings support the notion that different PBAA-related ratios represent different aspects of the relationship between seed PBAAs. Therefore, we included in our GWAS all relevant metabolic ratios of PBAA based on their biochemical affiliations and interpreted the resultant candidate genes to be part of the comprehensive genetic architecture of PBAAs. A similar approach has proven to be very effective in other metabolic studies in maize and Arabidopsis (Angelovici et al., 2013; Deng et al., 2017; Slaten et al., 2020a, 2020b). Our GWAS and candidate gene extraction approach yielded a relatively large number of unique candidate genes for all PBAA-related traits, most likely due to the relatively permissive statistical correction used for our analysis. However, since we intended to intersect the GWAS candidate gene list with an orthogonal candidate gene list, the length of the list did not pose a concern. Nevertheless, we conducted a functional analysis of these candidate genes and found that both “proteins” and “RNA” were the largest identified categories (Figure 2B). This finding suggests that both protein metabolism and gene expression may lie at the heart of seed PBAA.
Notably, our model did not include a taxa by year (Genotype × Environment interaction [G×E]) effect or seed weight as a covariate and thus may have missed candidate genes (see “Materials and methods” for rationale). Integrating these interactions in future studies will help to unravel more fully the breadth of the genetic basis of PBAA-related traits in seeds and, crucially, how PBAAs are affected by variations in environment and during seed development. Different ratios between endosperm and embryo, for example, may affect PBAA composition, but collecting such data for large panels may prove challenging.
An increase in the relative composition of several PBAAs during seed development and maturation is negatively correlated with multiple translational machinery components
We intersected our GWAS with proteomic expression data. This combination is rarely implemented because of the cost and difficulty associated with proteomic analyses. Similar multiomics studies have used transcriptomic expression data. However, we reasoned that the association between overall seed PBAA composition dynamics during development and maturation and proteome expression patterns would yield more direct and biologically relevant findings. From our analysis, we learned that the relative composition levels of Leu, Ile, Val, Tyr, and Pro were elevated toward seed desiccation (Figure 3) and were positively associated with three proteomic coexpression modules (brown, green, and black) that also gradually increased during kernel development and maturation (Figure 4, A and B). In contrast, Ala, Asx, Glx, Gly, His, and Lys relative compositions decreased during kernel development and maturation and were strongly associated with two protein coexpression modules (blue and turquoise) that also gradually decreased (Figure 4, A and B). These PBAA composition dynamics are consistent with the elevation of SSPs that are poor in Lys and rich in Leu and Pro (Hunter et al., 2002; Gorissen et al., 2018). Our analysis further revealed that 45% of the detected proteins were reduced during kernel development and maturation, while only 14% increased (Figure 4, A and B). This large reduction in proteins undoubtedly contributed to the PBAA composition in the dry seed. The effect of this reduction on PBAA composition might be on par with the elevation of SSPs; however, since our proteomic analysis was semi-quantitative, a future study is needed to confirm this claim. Nonetheless, this finding should be of great interest to seed amino acid biofortification efforts since it suggests that the proteins to target may be the reduced ones, and not the abundant SSPs.
The large reduction of proteins during kernel development and maturation coincided with two major processes. The first is the accumulation of SSPs, which starts around 10 DAP and increases all the way through seed maturity (Sabelli and Larkins, 2009; Larkins, 2017). The second is a programmed cell death (PCD) that is initiated around 12–16 DAP and expands to engulf the entire starchy endosperm by late development (Young et al., 1997), and which leads to the degradation of many unnecessary proteins in the endosperm. It is interesting then that the proteins included in the two protein expression modules (turquoise and blue modules) that showed reductions toward desiccation were highly significantly enriched in biological processes related to translation and gene expression (Supplemental Table S4). It stands to reason that many gene expression-related proteins are downregulated during development and maturation as a result of PCD and in preparation for desiccation and dormancy. The large reduction in proteins related to translation is, nonetheless, surprising since this period is characterized by massive SSP synthesis and accumulation (Prioul et al., 2008; Sabelli and Larkins, 2009; Larkins, 2017). Even more striking is that a large portion of these proteins are ribosomal proteins, which are associated with protein synthesis (Supplemental Data Set S7; turquoise module). These results could indicate that specific types of ribosomes are associated with the translation of SSPs and that this reduction may be part of a global translational reprogramming or some sort of translational switch. Global translational control has been reported for stresses that produce energy deficits, including hypoxia (Branco-Price et al., 2005, 2008), heat (Yángüez et al., 2013), and drought (Kawaguchi and Bailey-Serres, 2002; Kawaguchi et al., 2004).
Selective translational regulation also has been associated with dark/light transition (Bailey-Serres and Juntawong, 2012), photomorphogenesis (Liu et al., 2013), daily clock cycle (Missra et al., 2015), and symbiosis with nitrogen-fixing bacteria (Reynoso et al., 2013). Consistent with this, Shamimuzzaman and Vodkin (2014) found an increase of 64 ribosomal proteins during the transition of soybean cotyledons from storage organs to photosynthetic organs (4 d after seed imbibition and germination), supporting an involvement of specific ribosomal changes in developmental transitions. In sum, we infer from the results of our association analysis of proteomic expression and seed PBAA that translational reprogramming likely occurs during seed development and maturation and plays a key role in shaping the PBAA composition.
An analysis of HCCG relationships reveals a tightly connected cluster dominated by translational machinery components
We used two orthogonal approaches to pinpoint the genetic basis of PBAA composition in seed. We reasoned that (1) genes that are associated in both analyses are key regulatory genes determining PBAA composition in seeds; (2) population structure-related issues typical in GWAS do not pose a problem since we used only one genotype for the developmental analysis; and (3) the overlap in genes between the two approaches is not random. Our multiomics approach yielded 80 HCCGs, and our statistical analyses support the assumption that this overlap is nonrandom.
A PPI analysis of the 80 HCCG found several biological processes previously implicated in seed PBAA composition, as well as biological processes not previously reported. The previously characterized processes were storage proteins, transport, and amino acid, lipid, energy, and carbohydrate metabolism. The unreported processes were related to translational machinery, especially ribosomal proteins (Figure 6A). Lipid, carbohydrate, and energy metabolism affect absolute levels of PBAA in crop seeds (Miclaus et al., 2011; Jia et al., 2013; Deng et al., 2017), while cell transport, select amino acid metabolic pathways, and storage proteins affect PBAA relative composition (Wang and Larkins, 2001; Hunter et al., 2002; Pandurangan et al., 2012; Morton et al., 2015). Interestingly, our GWAS identified only three γ-zeins (Supplemental Table S2), notably the same three zeins also identified as part of our top-ranked HCCG list (Table 2). Among these three zeins, Glutelin-2 ranked highest. This ranking was consistent with its previous detection in a GWAS of PBAAs measured from a different maize association panel (Deng et al., 2017), highlighting its general importance in the natural variation of seed PBAA. This protein is also involved in the initiation of SSP protein bodies (Guo et al., 2013). Together, these results strongly support the validity of our multiomics approach as well as the integrative scoring method we used to rank our HCCG list.
Surprisingly, the functional analysis of the 80 HCCG highlighted the role that core protein metabolism, especially ribosomal proteins, plays in seed PBAA composition. The PPI analysis showed that the 80 HCCG included only one large tightly connected group dominated by genes related to protein metabolism, where more than half were components of the translational machinery, with nine ribosomal proteins and one translational initiation factor eIF3. The remaining proteins included four protein chaperones, two involved in protein degradation, and one vesicle-fusing ATPase (Supplemental Table S5). Four of the ribosomal genes and eIF3 are in our top 15 HCCG list (Table 2). An analysis of publicly available transcriptional data of the genes in Cluster 1 indicated that these genes are differentially expressed and have relatively high expression levels in reproductive tissues, mainly endosperm and embryo (Supplemental Figure S3). Hence, we reason that a key factor shaping seed PBAA is specific translational attenuation, most probably driven by alteration to specific components of the translational machinery, namely the ribosomal proteins. Studies over the past two decades have shed light on the adaptive nature of the ribosomal proteins and their potential selective regulation of translation. Multiple ribosomal protein mutants, for example, have revealed a functional role for these proteins in developmental and stress-related processes (Ma and Dooner, 2004; Tzafrir et al., 2004; Yan et al., 2016; Zhang et al., 2016). Many ribosomal proteins are transcriptionally regulated during stress, such as in sucrose feeding (Gamm et al., 2014), cold, heat, and UV-B (Sormani et al., 2011; Sáez-Vásquez and Delseny, 2019), as reviewed in Martinez-Seidel et al. (2020). We also know from ribosome profiling of mouse embryonic stem cells that ribosomal heterogeneity at the level of core ribosomal proteins facilitates preferential translation of specific mRNAs (Shi et al., 2017).
Most recently, it was proposed that plant evolution has directed the high divergence of ribosomal protein paralogs toward functional heterogeneity (Martinez-Seidel et al., 2020).
Previous studies have established that proteomic reprogramming and rebalancing are due, in large part, to translational regulation rather than transcriptional regulation (Schmidt et al., 2011; Morton et al., 2015). For example, proteomic characterization from endosperm of the RNA-Directed DNA Methylation 4 mutant which exhibits proteomic reprogramming and rebalancing, uncovered enrichment in protein biosynthesis, especially ribosomal biogenesis (Jia et al., 2020). These previous observations align with a key role of ribosomal proteins in seed proteome and seed PBAA composition. An additional finding that supports this hypothesis is the identification of an eIF3 component as an HCCG. eIF3, which consists of 12 subunits, is the most complex and largest initiation factor, participating in nearly all major steps of translation initiation (Browning and Bailey-Serres, 2015; Merchante et al., 2017). Our haploblock and haplotype analysis also supports a role for eIF3 in the regulation of PBAAs (Supplemental Figure S5; Supplemental Data Set S9). Alterations to eIFs can have drastic effects on translation, either by global repression or upregulation (Merchante et al., 2017). Mutants of several subunits of the eIF3 complex in Arabidopsis have been shown to affect the translation of genes in specific processes (Merchante et al., 2017). Notably, eIF protein levels, especially eIF2, are enhanced in several maize opaque mutants that undergo rebalancing but still display elevated levels of Lys (Habben et al., 1993, 1995; Jia et al., 2013).
Interestingly, 32 genes from our 80 HCCG list overlapped with QTLs identified in previous studies focused on FAAs in maize kernels (Supplemental Data Set S10). Nine out of the top 15 HCCGs are included in this overlap list, and three of these genes are ribosomal genes and one is eIF3. These findings suggest that the genetic architectures for FAA and PBAA may partially overlap. While these findings are very intriguing, they should be interpreted with “a grain of salt” as they are based on data drawn from studies at different genomic scales and mapping populations with an o2 mutation in their background, which affects seed SSPs. We expect that future validation of the HCCGs will clarify and provide context for this particular finding.
Conclusion
This study provides further support for the effectiveness of using an integrative multiomics approach to shed light on the biological processes involved in the regulation of complex metabolic traits in plants, here seed PBAA composition in maize. With this approach and by using additional publicly available datasets, we uncovered two uncharacterized factors associated with seed PBAA: translational reprogramming during seed development and maturation, and the complex dynamics and heterogeneity of the translational machinery, especially ribosomal proteins. We propose that the roles of ribosomal proteins and translational dynamics in seed PBAA composition represent exciting avenues for seed amino acid biofortification that might be the key to overcoming proteomic reprogramming and rebalancing.
Materials and methods
GWAS field trial
We collected phenotypic data from 279 of the 282 lines of the Goodman–Buckler maize association panel (Flint-Garcia et al., 2005); the remaining three lines did not grow out. These lines were grown at Genetics Farm near Columbia, MO over the summers of 2017 and 2018 with two replications each, using a randomized complete block design. Briefly, 13 kernels per inbred line from each replication in both 2017 and 2018 were planted in 10-foot rows, so that each row consisted of 13 plants. Every standing plant in a row was self-pollinated, and measures were taken to avoid cross-contamination. Pollinated ears from every row were harvested at full maturity, and cob husks were removed. Ears from each inbred line row were dried at 105 degrees Fahrenheit and <20% relative humidity for 5 d and then shelled and bulked to form representative composite grain samples. This resulted in four composite grain samples (two replications × 2 years) for each of the 279 inbred lines. Twenty-five random seeds from each bulked sample were ground into fine powder for PBAA quantification.
PBAA quantification
Seed PBAAs were extracted and detected using a high-throughput absolute quantification protocol. The protocol combines a microscale protein hydrolysis step and an absolute quantification step using multiple reaction monitoring-based liquid chromatography–tandem mass spectrometry detection, as described in Yobi and Angelovici (2018). Due to acid hydrolysis, Gln and Asn were converted to Glu and Asp, respectively; therefore, Glx denotes Gln and Glu, and Asx denotes Asn and Asp. In addition, Trp and Cys were destroyed, and Gly was poorly detected (especially in the dry kernels). In total, 15 PBAAs were measured, representing 17 amino acids because Glx and Asx each combine two.
Phenotypic data analysis and GWAS
We evaluated a total of 76 PBAA absolute, relative composition, and biochemical traits. All traits were treated independently. Metabolic ratio traits were derived prior to calculation of the best linear unbiased predictions (BLUP) to minimize noise. For each trait, outlier removal, optimal transformation, and BLUP calculation were performed as previously described (Slaten et al., 2020a, 2020b). Variance components were estimated from a mixed linear model (MLM), where taxa, replication, and year are fitted as random effects, and were used to estimate broad sense heritability on a line mean basis as previously described (Holland et al., 2003).
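Before generating the BLUPs, we compared the fit of an additive model:
$$Y_{ijk} = \mu + \mathrm{taxa}_i + \mathrm{block}_j + \mathrm{year}_k + \varepsilon_{ijk} \tag{1}$$
with that of a taxa × year interaction model:
$$Y_{ijk} = \mu + \mathrm{taxa}_i + \mathrm{year}_k + (\mathrm{taxa} \times \mathrm{year})_{ik} + \mathrm{block(year)}_{kj} + \mathrm{taxa(block} \times \mathrm{year)}_{kji} + \varepsilon_{ijk} \tag{2}$$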
In both of these models, $Y_{ijk}$ represents an individual amino acid phenotypic observation; $\mu$ represents the grand mean; $\mathrm{taxa}_i$ is the effect of taxa $i$; $\mathrm{block}_j$ is the effect of block $j$; $\mathrm{year}_k$ is the effect of year $k$; $(\mathrm{taxa} \times \mathrm{year})_{ik}$ is the effect of the interaction between taxa $i$ and year $k$; $\mathrm{block(year)}_{kj}$ is the effect of block $j$ within year $k$; $\mathrm{taxa(block} \times \mathrm{year)}_{kji}$ is the effect of taxa $i$ within block $j$ within year $k$; and $\varepsilon_{ijk}$ is the residual or random error term for the individual phenotypic observation $Y_{ijk}$. The residuals were assumed to be independent, normal random variables with mean zero and variance $\sigma_{\varepsilon}^{2}$. All terms except the grand mean were fitted as random effects.
We used the same location to grow our diversity panel across the 2 years. To assess the importance of the taxa × year interaction term, we calculated the R2 contribution of the taxa × year term in Model (2) for the 76 traits and found that the contribution of the taxa × year variance to the total phenotypic variance was very low (Supplemental Data Set S11). In addition, the taxa × year interaction term in Model (2) was statistically significant for only 11 of the 76 PBAA-related traits (Supplemental Data Set S11). Hence, we chose the simple additive model (i.e. Model (1)) over the more complex interaction model (Model (2)). Nevertheless, it is worth noting that adding more environments and including a G×E interaction in our model could have revealed a more elaborate genetic architecture of the PBAA. It is also important to note that including other covariates such as seed weight in our model could help clarify the relationship between seed weight and the AA genetic architecture. Variation in seed weight could potentially arise from differences in the relative sizes of the embryo and endosperm at the end of desiccation, as these two seed components are known to have different PBAA makeup (Rabideau, 1954; Gibbon and Larkins, 2005; Zheng et al., 2019). We removed Ser/Total from the downstream GWAS analysis because, when this trait was set as the response variable for Model (1), the model fitting algorithm did not converge.
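To make the model comparison concrete, the following Python sketch shows how the share of phenotypic variance attributable to one term, and a line-mean heritability, could be computed from estimated variance components. It is an illustration only: the study used R, the function names and variance values are hypothetical, and the heritability formula assumes an additive model in which only the residual variance is divided by the number of observations per line.

```python
def variance_fraction(components, term):
    """Share of the total phenotypic variance attributed to one model term."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total variance must be positive")
    return components[term] / total


def line_mean_heritability(var_taxa, var_residual, n_obs_per_line):
    """Broad-sense heritability on a line-mean basis.

    Assumed form for an additive model:
    var_taxa / (var_taxa + var_residual / n_obs_per_line).
    """
    return var_taxa / (var_taxa + var_residual / n_obs_per_line)


# Hypothetical variance components for one trait (not values from the study).
interaction_model = {
    "taxa": 4.0,
    "year": 1.0,
    "block_in_year": 0.5,
    "taxa_x_year": 0.5,
    "residual": 2.0,
}
interaction_share = variance_fraction(interaction_model, "taxa_x_year")

# Four observations per line: two replications in each of 2 years.
h2 = line_mean_heritability(var_taxa=4.0, var_residual=2.0, n_obs_per_line=4)

print(f"taxa x year share of variance: {interaction_share:.4f}")
print(f"line-mean heritability: {h2:.4f}")
```

A small interaction share, as in this toy example, is the kind of evidence that would favor the simpler additive model.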
The association panel was previously genotyped with the Illumina MaizeSNP50 BeadChip (Cook et al., 2012) and with genotyping-by-sequencing (Elshire et al., 2011) as described in Lipka et al. (2013). SNPs were filtered at a minor allele frequency >0.05, and a total of 458,775 SNPs from both genotype datasets were used for the GWAS analysis. We used GAPIT (Lipka et al., 2012) MLM and FarmCPU (Liu et al., 2016) to conduct the univariate GWAS. The FDR procedure (Benjamini and Hochberg, 1995) was used to correct for multiple hypothesis testing at the 5% level. Candidate gene lists were obtained using a 200 kb window (100 kb on either side) around each significant SNP. The physical locations and annotations of genes were based on Maize AGP_V2 (http://ftp.maizesequence.org/release-5b/filtered-set/).
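The two post-GWAS steps described above, FDR control and window-based candidate gene selection, can be sketched as follows. This is a minimal Python illustration rather than the GAPIT or FarmCPU implementation; the gene names, coordinates, and P-values are hypothetical, and treating any gene that overlaps the window as a candidate is an assumption.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return booleans marking the P-values rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


def genes_in_window(snp, genes, flank=100_000):
    """Names of genes on the SNP's chromosome overlapping pos +/- flank."""
    chrom, pos = snp
    lo, hi = pos - flank, pos + flank
    return [
        name
        for name, (g_chrom, start, end) in genes.items()
        if g_chrom == chrom and start <= hi and end >= lo
    ]


# Hypothetical P-values: the second is rejected only because of the step-up rule.
significant = benjamini_hochberg([0.001, 0.03, 0.035, 0.2])

# Hypothetical gene models: name -> (chromosome, start, end).
genes = {
    "geneA": (1, 950_000, 955_000),
    "geneB": (1, 1_150_000, 1_160_000),
    "geneC": (2, 1_000_000, 1_005_000),
}
candidates = genes_in_window((1, 1_000_000), genes)

print(significant)
print(candidates)
```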
Gene functional categorization using MapMan
MapMan version 3.6 (Lohse et al., 2014) was used for the functional categorization of the candidate genes generated by GWAS. The genes were mapped to corresponding bins using the Z. mays mapping database Zm_B73_5b_FGS_cds_2012.m02 obtained from https://mapman.gabipd.org/mapmanstore. A total of 35 functional gene categories were in the mapping database.
PBAA and proteome quantification of a B73 kernel developmental/maturation time course
The B73 inbred line was grown during the summer of 2018 at Genetics Farm near Columbia, MO following the same protocols as above. Samples were collected at 10 seed filling stages, every 4 d from 10 to 46 DAP: 10, 14, 18, 22, 26, 30, 34, 38, 42, and 46 DAP. Three biological replicates were collected from each developmental stage. The whole ear from each biological replicate at each time point (3 × 10 = 30 ears) was harvested and frozen in liquid nitrogen, and 15 randomly chosen developing seeds were sampled and stored immediately in liquid nitrogen. The seeds were lyophilized for 5 d and then finely ground for PBAA quantification as described above. The three data points from each stage were averaged. The averaged data were scaled using the scale function in R version 3.4.3 and used to create a heatmap with hierarchical clustering using the pheatmap package (Kolde and Kolde, 2015). The average PBAA pattern across seed development was plotted using the geom_line “mean” function in the ggplot2 package (Wickham et al., 2016) in R version 3.4.3.
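The averaging and scaling steps can be illustrated with a short Python sketch. The analysis itself was done with the scale function in R; the version below assumes that each amino acid was centered and divided by its sample standard deviation across stages, and the values are hypothetical.

```python
from statistics import mean, stdev


def stage_means(replicates_by_stage):
    """Average the biological replicates of each stage, in DAP order."""
    return [mean(replicates_by_stage[dap]) for dap in sorted(replicates_by_stage)]


def zscore(values):
    """Center on the mean and divide by the sample SD (n - 1 denominator)."""
    center = mean(values)
    spread = stdev(values)
    return [(value - center) / spread for value in values]


# Hypothetical levels of one amino acid: DAP -> three biological replicates.
lys_by_dap = {
    10: [1.0, 2.0, 3.0],
    14: [3.0, 4.0, 5.0],
    18: [5.0, 6.0, 7.0],
}
lys_means = stage_means(lys_by_dap)
lys_scaled = zscore(lys_means)

print(lys_means)
print(lys_scaled)
```

Scaling each amino acid separately puts all of them on a common scale, so the heatmap reflects temporal patterns rather than differences in absolute abundance.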
Proteomic analysis
Protein extraction was performed based on the method of Hurkman and Tanaka (1986) as described in Yobi et al. (2020). Briefly, 5 mg of finely ground seed powder was weighed and extracted with Tris–HCl buffered phenol and an SDS extraction buffer. After trypsin digestion and purification, the peptides were analyzed on a Bruker timsTOF PRO using the PASEF (1) method. The acquired data were submitted to the PEAKS DB search engine (version 8.5; Bioinformatics Solutions Inc., Waterloo, Canada) for peak picking. Protein identification was completed using the MaizeGDB database (Lawrence et al., 2004). Proteins with spectral counts ≥4 were retained for analysis.
WGCNA analysis
We performed a weighted protein correlation network analysis (Langfelder and Horvath, 2007) using the R package WGCNA (Langfelder and Horvath, 2008). We chose a soft threshold power of β = 12 to construct the coexpression network because the scale-free fit index (R2) reached around 80% at this power, indicating that the network approximated a scale-free topology. We used Pearson correlation, the blockwiseModules function, and the dynamic tree cut algorithm (Langfelder et al., 2007) with a height of 0.20 and a minimum module size of 30. Modules were defined as the branches of the dendrogram.
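The role of the soft threshold power can be illustrated with a small Python sketch of the adjacency calculation. This is not the WGCNA package code; it assumes an unsigned network, in which the adjacency between two proteins is the absolute Pearson correlation raised to the power β, and the protein names and abundance profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def soft_threshold_adjacency(profiles, beta=12):
    """Unsigned adjacency |cor(x_i, x_j)| ** beta for every protein pair."""
    names = sorted(profiles)
    return {
        (a, b): abs(pearson(profiles[a], profiles[b])) ** beta
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical abundance profiles across four time points.
profiles = {
    "protA": [1.0, 2.0, 3.0, 4.0],
    "protB": [2.0, 4.0, 6.0, 8.0],
    "protC": [1.0, 3.0, 2.0, 4.0],
}
adjacency = soft_threshold_adjacency(profiles)

for pair, weight in adjacency.items():
    print(pair, round(weight, 4))
```

Raising correlations to a high power keeps strong correlations near 1 while shrinking moderate ones toward 0 (here a correlation of 0.8 becomes roughly 0.07), which is what pushes the network toward a scale-free topology.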
GO enrichment analysis
GO enrichment analysis was performed using AgriGO_V2 (Tian et al., 2017). The following parameters were used to determine the GO biological process, cellular component, and molecular function terms that were overrepresented (P < 0.05): a hypergeometric test with a 5% FDR correction, a custom reference that consisted of 2,648 proteins detected in our proteomics study, Z. mays as the select organism, and GO full plant ontologies.
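For readers unfamiliar with the hypergeometric test used in this kind of enrichment analysis, the following Python sketch computes the one-sided P-value for a single GO term. It is an illustration and not the AgriGO implementation; only the reference size of 2,648 proteins comes from the text, and the other counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) when n proteins are drawn from a reference of N proteins,
    K of which carry the GO term, and k of the drawn proteins carry it."""
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


# Small case that can be checked by hand: (36 + 4) / 120 = 1/3.
p_small = hypergeom_enrichment_p(k=2, n=3, K=4, N=10)

# Hypothetical module of 100 proteins tested against the 2,648-protein reference.
p_module = hypergeom_enrichment_p(k=12, n=100, K=80, N=2648)

print(round(p_small, 4))
print(p_module)
```

The resulting P-values for all tested terms would then be adjusted for multiple testing, for example with a 5% FDR correction as stated above.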
eQTL analysis of the 80 HCCG
We used the gene expression dataset from Kremling et al. (2018). The authors collected 3′ RNA-seq data from seven different tissues, including developing kernels at 350 growing degree days (equivalent to ∼20–30 DAP in time course experiments), from 255 inbred lines of the same Goodman–Buckler association panel. We restricted our eQTL analysis to the seed expression dataset and to the 80 HCCG. We first extracted the expression levels of the 80 HCCG from the seed-specific dataset and tested their association with the corresponding GWAS tag SNPs using the Student t test followed by an FDR correction at a 0.05 significance level. However, we defined an eQTL as significant only if it also explained the variation of the corresponding PBAA-related trait. Hence, we performed a Pearson correlation test between the normalized gene expression level and the corresponding PBAA trait from the GWAS.
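The first step of the eQTL test can be sketched in Python as follows: lines are grouped by the allele they carry at a tag SNP, and a two-sample Student t statistic is computed on the expression of one gene. The line names, genotype coding (0/1 for the two homozygous classes), and expression values are hypothetical; converting the statistic to a P-value, the FDR correction, and the follow-up Pearson correlation with the PBAA trait are not shown.

```python
from math import sqrt
from statistics import mean, variance


def split_by_allele(expression, genotypes):
    """Group expression values by the allele (0 or 1) each line carries."""
    groups = {0: [], 1: []}
    for line, value in expression.items():
        groups[genotypes[line]].append(value)
    return groups[0], groups[1]


def student_t(group_a, group_b):
    """Two-sample Student t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical normalized expression of one gene in six inbred lines.
expression = {"L1": 1.0, "L2": 2.0, "L3": 3.0, "L4": 4.0, "L5": 5.0, "L6": 6.0}
# Hypothetical allele calls at the GWAS tag SNP for the same lines.
genotypes = {"L1": 0, "L2": 0, "L3": 0, "L4": 1, "L5": 1, "L6": 1}

ref_group, alt_group = split_by_allele(expression, genotypes)
t_stat, df = student_t(ref_group, alt_group)

print(round(t_stat, 4), df)
```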
Haploblock/haplotype analysis and pairwise LD calculation
We characterized the polymorphisms in the genomic regions associated with the top 15 HCCGs by performing a haploblock/haplotype analysis on the 200 kb region (100 kb upstream and downstream) spanning the respective GWAS peak SNP using Haploview version 4.2 (Barrett et al., 2004). This analysis was done on the same genotype datasets used for the GWAS, generated with the Illumina MaizeSNP50 BeadChip (Cook et al., 2012) and with genotyping-by-sequencing (Elshire et al., 2011) as described in Lipka et al. (2013). Pairwise LD values were calculated using r2. All SNPs were filtered at a 5% minor allele frequency. Default Gabriel block parameters were used, resulting in blocks that contained at least 95% of the SNPs in strong LD. Haploview was used to identify the haploblocks and the corresponding haplotypes that spanned the GWAS peak SNPs and the haploblocks that spanned or included part of a candidate gene. We tested whether the relevant PBAA traits differed among the haplotypes using ANOVA and a post-hoc Duncan comparison analysis.
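As an illustration of the two SNP-level quantities used here, the Python sketch below computes the minor allele frequency and the pairwise LD statistic r2. It is not the Haploview implementation; it assumes fully homozygous inbred lines coded 0 or 1 at each biallelic SNP, so that each line contributes one haplotype, and the allele calls are hypothetical.

```python
def minor_allele_frequency(alleles):
    """Frequency of the less common allele at a biallelic SNP coded 0/1."""
    p = sum(alleles) / len(alleles)
    return min(p, 1 - p)


def ld_r2(snp_a, snp_b):
    """Pairwise LD: r2 = D**2 / (pA * (1 - pA) * pB * (1 - pB)).

    Both SNPs must be polymorphic, which the MAF filter guarantees.
    """
    n = len(snp_a)
    p_a = sum(snp_a) / n
    p_b = sum(snp_b) / n
    p_ab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical allele calls for four inbred lines at three SNPs.
snp1 = [1, 1, 1, 0]
snp2 = [1, 1, 0, 0]
snp3 = [1, 0, 1, 0]

maf1 = minor_allele_frequency(snp1)
r2_partial = ld_r2(snp1, snp2)
r2_complete = ld_r2(snp2, snp2)
r2_none = ld_r2(snp2, snp3)

print(maf1, round(r2_partial, 4), r2_complete, r2_none)
```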
PPI of the 80 HCCG using STRING
We constructed and visualized the PPI network associated with the 80 HCCG using the Search Tool for the Retrieval of Interacting Genes/Proteins database STRING version 11.0 (Szklarczyk et al., 2019). Active interaction sources, including high-throughput lab experiments, gene coexpression, and previous knowledge from curated databases specific to Z. mays, were used to construct the PPI network at medium confidence (>0.4) and high confidence (>0.7) levels (Szklarczyk et al., 2019). We used the MCL clustering algorithm within STRING (Szklarczyk et al., 2019) to further investigate strong interactions among nodes in the PPI network.
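The effect of the two confidence cutoffs can be illustrated with a small Python sketch that filters scored interactions and groups the remaining proteins. Note the substitution: connected components are used here as a simple stand-in for the MCL clustering applied in STRING, so the grouping is not equivalent. The protein names and scores are hypothetical, and scores are assumed to be on a 0 to 1 scale.

```python
def filter_edges(scored_edges, min_score):
    """Keep interactions whose combined score exceeds min_score."""
    return [(a, b) for a, b, score in scored_edges if score > min_score]


def connected_components(nodes, edges):
    """Group nodes into connected components with a depth-first search."""
    neighbors = {node: set() for node in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(neighbors[node] - seen)
        components.append(component)
    return components


# Hypothetical proteins and interaction scores (not taken from the study).
nodes = ["RPL1", "RPL2", "eIF3x", "ZEIN1", "CHAP1"]
scored_edges = [
    ("RPL1", "RPL2", 0.95),
    ("RPL2", "eIF3x", 0.75),
    ("eIF3x", "CHAP1", 0.55),
    ("ZEIN1", "CHAP1", 0.30),
]
medium = connected_components(nodes, filter_edges(scored_edges, 0.4))
high = connected_components(nodes, filter_edges(scored_edges, 0.7))

print(medium)
print(high)
```

Raising the cutoff from 0.4 to 0.7 removes the weaker links, so only the most strongly supported group of proteins stays connected.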
Heatmap and average expression trend of 19 genes found in Cluster 1 of the PPI analysis using two publicly available gene expression datasets
We used two publicly available gene expression datasets to visualize the expression pattern of the 19 genes found in Cluster 1 of the PPI analysis. First, we used the gene expression data from Chen et al. (2014), from which we extracted the data for whole developing B73 kernels from 0 to 38 DAP. Second, we used the gene expression data from Stelpflug et al. (2016), from which we extracted the data for various developmental stages of B73 tissues, including whole seed, endosperm, embryo, ear, tassel, internode, and leaf. The heatmap was created using the pheatmap package (Kolde and Kolde, 2015) in R version 3.4.3, and the average expression pattern of the 19 genes was plotted using the geom_line “mean” function in the ggplot2 package (Wickham et al., 2016) in R version 3.4.3. Both the heatmap and the average expression trend were created using the standardized scaled gene expression data.
Comparative analysis of the FAA QTLs with the 80 PBAA HCCG list
We mined the genomic positions of FAA QTLs from three studies that investigated the genetic architecture of seed FAAs in maize (Wang and Larkins, 2001; Wu et al., 2002; Pineda-Hidalgo et al., 2011). We also mined QTLs for three major EAAs (Lys, Met, and Trp) in maize kernels (Gutiérrez-Rojas et al., 2010). For the significant QTLs, we extracted the flanking and peak markers from the respective articles. We also extracted the bin positions of the respective peak QTL markers from the articles as well as from the MaizeGDB (Lawrence et al., 2004). We converted the bin position to the physical position in base pairs (RefGen_V2) from MaizeGDB (Lawrence et al., 2004) to identify the genes that overlapped with the 80 HCCG list and the previously identified FAA QTL intervals.
Accession numbers
Protein identification and corresponding data from this article can be found in the Supplemental Data Sets S6, S7, and S8. Taxa names can be found in the Supplemental Data Set S1 and gene names and annotation can be found in the Supplemental Data Set S4.
Supplemental data
The following materials are available in the online version of this article.
Supplemental Figure S1. Genomic distribution of the SNPs across the genome.
Supplemental Figure S2. Analysis of network topology for soft thresholding power for constructing the coexpression network.
Supplemental Figure S3. Gene expression of Cluster 1 members in the various maize tissues.
Supplemental Figure S4. Haploblock and haplotype summary of the genomic region that surrounds ranked gene 9 related to Met/Total.
Supplemental Figure S5. Haploblock and haplotype summary of the genomic region that surrounds ranked gene 13 related to Met.
Supplemental Table S1. Statistical and heritability summary of the PBAA absolute and relative composition traits.
Supplemental Table S2. Correlation between the absolute PBAA levels and their relative composition.
Supplemental Table S3. List of 76 seed PBAA traits used for GWAS.
Supplemental Table S4. Enrichment analysis of the five protein coexpression modules.
Supplemental Table S5. The 80 HCCG STRING and functional analysis.
Supplemental Table S6. eQTL-mQTL analysis.
Supplemental Data Set S1. The backtransformed BLUPs of the 76 PBAA traits.
Supplemental Data Set S2. Seed size (50 kernel weight in gram) of 282 maize diversity panel grown in two replicates each in the Years 2017 and 2018.
Supplemental Data Set S3. r and corresponding P-value (sorted lowest to highest) between 76 PBAA related traits and the seed size (50 kernels weight in gram).
Supplemental Data Set S4. GWAS summary of the 76 PBAA related traits.
Supplemental Data Set S5. Seed PBAA composition during development and maturation.
Supplemental Data Set S6. Summary of the proteomic analysis of seed development and maturation.
Supplemental Data Set S7. The summary of the protein ID, their corresponding kME values, and their description.
Supplemental Data Set S8. Summary of the 80 HCCG ranking.
Supplemental Data Set S9. Comprehensive summary of the haploblock and haplotypes analysis and pairwise LD of the top 15 ranked genes.
Supplemental Data Set S10. Comparative analysis of the FAA QTLs from the literature with the PBAA 80 HCCG list.
Supplemental Data Set S11. Genotype × Year interaction summary of 76 PBAA related traits using ANOVA and the mixed model analysis.
Acknowledgments
The authors wish to acknowledge Melody Kroll for assistance with editing the manuscript, Tim Beissinger for statistical advice, and James Elder, Clement Bagaza, and Lauren Jenkins for assistance in propagating, managing, and postharvest processing of the maize diversity panel.
Funding
This study was funded by the National Science Foundation 1355406 grant (EPSCoR; The Missouri Transect, Climate, Plants, and Community). This research was supported in part by the U.S. Department of Agriculture, Agricultural Research Service.
Conflict of interest statement. The authors have no conflict of interest to declare.
V.S. performed the experiments, wrote the manuscript, and processed and analyzed data, A.Y. wrote the manuscript and carried out metabolic analysis, M.L.S. analyzed data, Y.O.C. analyzed data, S.H. carried out field experimentation, A.G. analyzed data and revised manuscript, S.F.G. provided the germplasm, supervised the work, and critically revised the manuscript, A.E.L. supervised the work and assisted with statistical aid, and R.A. conceived the experimental design, supervised the work, and wrote the manuscript. All authors have reviewed the final version of the manuscript and approved it and therefore are equally responsible for the integrity and accuracy of its content.
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (https://academic.oup.com/plphys/pages/general-instructions) is Ruthie Angelovici (angelovicir@missouri.edu).
References
- Akula N, Baranova A, Seto D, Solka J, Nalls MA, Singleton A, Ferrucci L, Tanaka T, Bandinelli S, Cho YS, et al. (2011) A network-based approach to prioritize results from genome-wide association studies. PLoS One 6: e24220. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Altenbach SB, Tanaka CK, Allen PV (2014) Quantitative proteomic analysis of wheat grain proteins reveals differential effects of silencing of omega-5 gliadin genes in transgenic lines. J Cereal Sci 59: 118–125 [Google Scholar]
- Angelovici R, Batushansky A, Deason N, Gonzalez-Jorge S, Gore MA, Fait A, DellaPenna D (2016) Network-guided GWAS improves identification of genes affecting free amino acids. Plant Physiol 173: 872–886 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Angelovici R, Lipka AE, Deason N, Gonzalez-Jorge S, Lin H, Cepela J, Buell R, Gore MA, DellaPenna D (2013) Genome-wide analysis of branched-chain amino acid levels in Arabidopsis seeds. The Plant Cell 25: 4827–4843 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bailey-Serres J, Juntawong P (2012) Dynamic light regulation of translation status in Arabidopsis thaliana. Fron Plant Sci 3: 66. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Baranzini SE, Galwey NW, Wang J, Khankhanian P, Lindberg R, Pelletier D, Wu W, Uitdehaag BM, Kappos L, Consortium G (2009) Pathway and network-based analysis of genome-wide association studies in multiple sclerosis. Hum Mol Genet 18: 2078–2090 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Barrett JC, Fry B, Maller J, Daly MJ (2004) Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21: 263–265 [DOI] [PubMed] [Google Scholar]
- Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Royal Stat Soc Ser B (Methodol) 57: 289–300 [Google Scholar]
- Boston RS, Larkins BA (2009) The genetics and biochemistry of maize zein storage proteins. InHandbook of Maize. Springer, Berlin, Germany, pp 715–730 [Google Scholar]
- Branco-Price C, Kawaguchi R, Ferreira RB, Bailey-Serres J (2005) Genome-wide analysis of transcript abundance and translation in Arabidopsis seedlings subjected to oxygen deprivation. Ann Bot 96: 647–660 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Branco-Price C, Kaiser KA, Jang CJ, Larive CK, Bailey-Serres J (2008) Selective mRNA translation coordinates energetic and metabolic adjustments to cellular oxygen deprivation and reoxygenation in Arabidopsis thaliana. Plant J 56: 743–755 [DOI] [PubMed] [Google Scholar]
- Browning KS, Bailey-Serres J (2015) Mechanism of cytoplasmic mRNA translation. Arabidopsis Book 13: e0176. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chan EK, Rowe HC, Corwin JA, Joseph B, Kliebenstein DJ (2011) Combining genome-wide association mapping and transcriptional networks to identify novel genes controlling glucosinolates in Arabidopsis thaliana. PLoS Biol 9: e1001125. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen J, Zeng B, Zhang M, Xie S, Wang G, Hauck A, Lai J (2014) Dynamic transcriptome landscape of maize embryo and endosperm development. Plant Physiol 166: 252–264 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ching A, Caldwell KS, Jung M, Dolan M, Smith OSH, Tingey S, Morgante M, Rafalski AJ (2002) SNP frequency, haplotype structure and linkage disequilibrium in elite maize inbred lines. BMC Genetics 3: 19. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Cook JP, McMullen MD, Holland JB, Tian F, Bradbury P, Ross-Ibarra J, Buckler ES, Flint-Garcia SA (2012) Genetic architecture of maize kernel composition in the nested association mapping and inbred association panels. Plant Physiol 158: 824–834 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Deng M, Li D, Luo J, Xiao Y, Liu H, Pan Q, Zhang X, Jin M, Zhao M, Yan J (2017) The genetic architecture of amino acids dissection by association and linkage analysis in maize. Plant Biotechnol J 15: 1250–1263 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, Mitchell SE (2011) A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 6: e19379. [DOI] [PMC free article] [PubMed] [Google Scholar]
- FAOSTAT (2018) Food and Agriculture Organization of the United Nations. www.fao.org/faostat/en/#rankings/countires_by_commodity.
- Flint-Garcia SA, Bodnar AL, Scott MP (2009) Wide variability in kernel composition, seed characteristics, and zein profiles among diverse maize inbreds, landraces, and teosinte. Theor Appl Genet 119: 1129–1142 [DOI] [PubMed] [Google Scholar]
- Flint-Garcia SA, Thornsberry JM, Buckler IV ES (2003) Structure of linkage disequilibrium in plants. Ann Rev Plant Biol 54: 357–374 [DOI] [PubMed] [Google Scholar]
- Flint-Garcia SA, Thuillet AC, Yu J, Pressoir G, Romero SM, Mitchell SE, Doebley J, Kresovich S, Goodman MM, Buckler ES (2005) Maize association population: a high-resolution platform for quantitative trait locus dissection. Plant J 44: 1054–1064 [DOI] [PubMed] [Google Scholar]
- Gamm M, Peviani A, Honsel A, Snel B, Smeekens S, Hanson J (2014) Increased sucrose levels mediate selective mRNA translation in Arabidopsis. BMC Plant Biol 14: 306. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Garcia-Alonso L, Alonso R, Vidal E, Amadoz A, de Maria A, Minguez P, Medina I, Dopazo J (2012) Discovering the hidden sub-network component in a ranked list of genes or proteins derived from genomic experiments. Nucleic Acids Res 40: e158. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gibbon BC, Larkins BA (2005) Molecular genetic approaches to developing quality protein maize. Trends Genet 21: 227–233 [DOI] [PubMed] [Google Scholar]
- Gorissen SH, Crombag JJ, Senden JM, Waterval WH, Bierau J, Verdijk LB, van Loon LJ (2018) Protein content and amino acid composition of commercially available plant-based protein isolates. Amino Acids 50: 1685–1695 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Guo X, Yuan L, Chen H, Sato SJ, Clemente TE, Holding DR (2013) Nonredundant function of zeins and their correct stoichiometric ratio drive protein body formation in maize endosperm. Plant Physiol 162: 1359–1369 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Gutierrez-Rojas A, Betran J, Scott MP, Atta H, Menz M (2010) Quantitative trait loci for endosperm modification and amino acid contents in quality protein maize. Crop Sci 50: 870–879 [Google Scholar]
- Habben JE, Kirleis AW, Larkins BA (1993) The origin of lysine-containing proteins in opaque-2 maize endosperm. Plant Mol Biol 23: 825–838 [DOI] [PubMed] [Google Scholar]
- Habben JE, Moro GL, Hunter BG, Hamaker BR, Larkins BA (1995) Elongation factor 1 alpha concentration is highly correlated with the lysine content of maize endosperm. Proc Natl Acad Sci 92: 8640–8644 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hawkins RD, Hon GC, Ren B (2010) Next-generation genomics: an integrative approach. Nat Rev Genet 11: 476–486 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Holland JB, Nyquist WE, Cervantes-Martínez CT (2003) Estimating and interpreting heritability for plant breeding: an update. In Plant Breeding Reviews, Vol 22. John Wiley & Sons, Hoboken, NJ
- Hunter BG, Beatty MK, Singletary GW, Hamaker BR, Dilkes BP, Larkins BA, Jung R (2002) Maize opaque endosperm mutations create extensive changes in patterns of gene expression. Plant Cell 14: 2591–2612 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hurkman WJ, Tanaka CK (1986) Solubilization of plant membrane proteins for analysis by two-dimensional gel electrophoresis. Plant Physiol 81: 802–806 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jia M, Wu H, Clay KL, Jung R, Larkins BA, Gibbon BC (2013) Identification and characterization of lysine-rich proteins and starch biosynthesis genes in the opaque2mutant by transcriptional and proteomic analysis. BMC Plant Biol 13: 60. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jia P, Zheng S, Long J, Zheng W, Zhao Z (2011) dmGWAS: dense module searching for genome-wide association studies in protein-protein interaction networks. Bioinformatics 27: 95–102 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Jia S, Yobi A, Naldrett MJ, Alvarez S, Angelovici R, Zhang C, Holding DR (2020) Deletion of maize RDM4 suggests a role in endosperm maturation as well as vegetative and stress-responsive growth. J Exp Bot 71: 5880–5895 [DOI] [PubMed] [Google Scholar]
- Kawaguchi R, Bailey-Serres J (2002) Regulation of translational initiation in plants. Curr Opin Plant Biol 5: 460–465 [DOI] [PubMed] [Google Scholar]
- Kawaguchi R, Girke T, Bray EA, Bailey-Serres J (2004) Differential mRNA translation contributes to gene regulation under non-stress and dehydration stress conditions in Arabidopsis thaliana. Plant J 38: 823–839 [DOI] [PubMed] [Google Scholar]
- Kliebenstein DJ (2020) Using networks to identify and interpret natural variation. Curr Opin Plant Biol 54: 122–126 [DOI] [PubMed] [Google Scholar]
- Kolde R, Kolde MR (2015) Package ‘pheatmap’. R Package 1: 790 [Google Scholar]
- Kremling KA, Chen SY, Su MH, Lepak NK, Romay MC, Swarts KL, Lu F, Lorant A, Bradbury PJ, Buckler ES (2018) Dysregulation of expression correlates with rare-allele burden and fitness loss in maize. Nature 555: 520. [DOI] [PubMed] [Google Scholar]
- La T, Large E, Taliercio E, Song Q, Gillman JD, Xu D, Nguyen HT, Shannon G, Scaboo A (2019) Characterization of select wild soybean accessions in the USDA germplasm collection for seed composition and agronomic traits. Crop Sci 59: 233–251 [Google Scholar]
- Langfelder P, Horvath S (2007) Eigengene networks for studying the relationships between co-expression modules. BMC Syst Biol 1: 54. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9: 559. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Langfelder P, Zhang B, Horvath S (2007) Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics 24: 719–720 [DOI] [PubMed] [Google Scholar]
- Larkins BA (2017) Maize Kernel Development. CABI, Wallingford
- Larkins BA, Hurkman WJ (1978) Synthesis and deposition of zein in protein bodies of maize endosperm. Plant Physiol 62: 256–263
- Larkins BA, Pedersen K, Marks MD, Wilson DR (1984) The zein proteins of maize endosperm. Trends Biochem Sci 9: 306–308
- Lawrence CJ, Dong Q, Polacco ML, Seigfried TE, Brendel V (2004) MaizeGDB, the community database for maize genetics and genomics. Nucleic Acids Res 32: D393–D397
- Lending CR, Larkins BA (1989) Changes in the zein composition of protein bodies during maize endosperm development. Plant Cell 1: 1011–1023
- Lipka AE, Gore MA, Magallanes-Lundback M, Mesberg A, Lin H, Tiede T, Chen C, Buell CR, Buckler ES, Rocheford T (2013) Genome-wide association study and pathway level analysis of tocochromanol levels in maize grain. G3 3: 1287–1299
- Lipka AE, Tian F, Wang Q, Peiffer J, Li M, Bradbury PJ, Gore MA, Buckler ES, Zhang Z (2012) GAPIT: genome association and prediction integrated tool. Bioinformatics 28: 2397–2399
- Liu MJ, Wu SH, Wu JF, Lin WD, Wu YC, Tsai TY, Tsai HL, Wu SH (2013) Translational landscape of photomorphogenic Arabidopsis. Plant Cell 25: 3699–3710
- Liu X, Huang M, Fan B, Buckler ES, Zhang Z (2016) Iterative usage of fixed and random effect models for powerful and efficient genome-wide association studies. PLoS Genet 12: e1005767
- Lohse M, Nagel A, Herter T, May P, Schroda M, Zrenner R, Tohge T, Fernie AR, Stitt M, Usadel B (2014) Mercator: a fast and simple web server for genome scale functional annotation of plant sequence data. Plant Cell Environ 37: 1250–1258
- Ma Z, Dooner HK (2004) A mutation in the nuclear-encoded plastid ribosomal protein S9 leads to early embryo lethality in maize. Plant J 37: 92–103
- Martinez-Seidel F, Beine-Golovchuk O, Hsieh YC, Kopka J (2020) Systematic review of plant ribosome heterogeneity and specialization. Front Plant Sci 11: 948
- Merchante C, Stepanova AN, Alonso JM (2017) Translation regulation in plants: an interesting past, an exciting present and a promising future. Plant J 90: 628–653
- Messing J (1983) The manipulation of zein genes to improve the nutritional value of corn. Trends Biotechnol 1: 54–59
- Miclaus M, Wu Y, Xu J-H, Dooner HK, Messing J (2011) The maize high-lysine mutant opaque7 is defective in an acyl-CoA synthetase-like protein. Genetics 189: 1271–1280
- Missra A, Ernest B, Lohoff T, Jia Q, Satterlee J, Ke K, von Arnim AG (2015) The circadian clock modulates global daily cycles of mRNA ribosome loading. Plant Cell 27: 2582–2599
- Morton KJ, Jia S, Zhang C, Holding DR (2015) Proteomic profiling of maize opaque endosperm mutants reveals selective accumulation of lysine-enriched proteins. J Exp Bot 67: 1381–1396
- Pandurangan S, Pajak A, Molnar SJ, Cober ER, Dhaubhadel S, Hernández-Sebastià C, Kaiser WM, Nelson RL, Huber SC, Marsolais F (2012) Relationship between asparagine metabolism and protein concentration in soybean seed. J Exp Bot 63: 3173–3184
- Pineda-Hidalgo KV, Lavin-Aramburo M, Salazar-Salas NY, Chavez-Ontiveros J, Reyes-Moreno C, Muy-Rangel MD, Larkins BA, Lopez-Valenzuela JA (2011) Characterization of free amino acid QTLs in maize opaque2 recombinant inbred lines. J Cereal Sci 53: 250–258
- Prioul JL, Méchin V, Lessard P, Thévenot C, Grimmer M, Chateau-Joubert S, Coates S, Hartings H, Kloiber-Maitz M, Murigneux A (2008) A joint transcriptomic, proteomic and metabolic analysis of maize endosperm development and starch filling. Plant Biotechnol J 6: 855–869
- Qi Z, Zhang Z, Wang Z, Yu J, Qin H, Mao X, Jiang H, Xin D, Yin Z, Zhu R (2018) Meta-analysis and transcriptome profiling reveal hub genes for soybean seed storage composition during seed development. Plant Cell Environ 41: 2109–2127
- Rabideau GS (1954) Amino acids in the embryo and endosperm of the grain of different varieties of corn. Bot Gazette 115: 391–394
- Reynoso MA, Blanco FA, Bailey-Serres J, Crespi M, Zanetti ME (2013) Selective recruitment of mRNAs and miRNAs to polyribosomes in response to rhizobia infection in Medicago truncatula. Plant J 73: 289–301
- Sabelli PA, Larkins BA (2009) The development of endosperm in grasses. Plant Physiol 149: 14–26
- Sáez-Vásquez J, Delseny M (2019) Ribosome biogenesis in plants: from functional 45S ribosomal DNA organization to ribosome assembly factors. Plant Cell 31: 1945–1967
- Schaefer RJ, Michno J-M, Jeffers J, Hoekenga O, Dilkes B, Baxter I, Myers CL (2018) Integrating coexpression networks with GWAS to prioritize causal genes in maize. Plant Cell 30: 2922–2942
- Schmidt M, Barbazuk WB, Sandford M, May GD, Song Z, Zhou W, Nikolau BJ, Herman EM (2011) Silencing of soybean seed storage proteins results in a rebalanced protein composition preserving seed protein content without major collateral changes in the metabolome and transcriptome. Plant Physiol 156: 330–345
- Schmidt MA, Pendarvis K (2017) Proteome rebalancing in transgenic Camelina occurs within the enlarged proteome induced by β-carotene accumulation and storage protein suppression. Transgenic Res 26: 171–186
- Sekhon RS, Saski C, Kumar R, Flinn BS, Luo F, Beissinger TM, Ackerman AJ, Breitzman MW, Bridges WC, de Leon N, et al. (2019) Integrated genome-scale analysis identifies novel genes and networks underlying senescence in maize. Plant Cell 31: 1968–1989
- Shamimuzzaman M, Vodkin L (2014) Transcription factors and glyoxylate cycle genes prominent in the transition of soybean cotyledons to the first functional leaves of the seedling. Funct Integr Genomics 14: 683–696
- Shen B, Roesler K (2017) Maize kernel oil content. In Maize Kernel Development. CABI, Wallingford, pp 160–174
- Shen L (2014) Gene overlap: an R package to test and visualize gene overlaps. R Package
- Shewry PR (2007) Improving the protein content and composition of cereal grain. J Cereal Sci 46: 239–250
- Shewry PR, Halford NG (2002) Cereal seed storage proteins: structures, properties and role in grain utilization. J Exp Bot 53: 947–958
- Shi Z, Fujii K, Kovary KM, Genuth NR, Röst HL, Teruel MN, Barna M (2017) Heterogeneous ribosomes preferentially translate distinct subpools of mRNAs genome-wide. Mol Cell 67: 71–83.e77
- Shiferaw B, Prasanna BM, Hellin J, Bänziger M (2011) Crops that feed the world 6. Past successes and future challenges to the role played by maize in global food security. Food Security 3: 307
- Slaten ML, Chan YO, Shrestha V, Lipka AE, Angelovici R (2020a) HAPPI GWAS: holistic analysis with pre and post integration GWAS. Bioinformatics 36: 4655–4657
- Slaten ML, Yobi A, Bagaza C, Chan YO, Shrestha V, Holden S, Katz E, Kanstrup C, Lipka AE, Kliebenstein DJ (2020b) mGWAS uncovers Gln-Glucosinolate seed-specific interaction and its role in metabolic homeostasis. Plant Physiol 183: 483–500
- Sormani R, Masclaux-Daubresse C, Daniel-Vedele F, Chardon F (2011) Transcriptional regulation of ribosome components are determined by stress according to cellular compartments in Arabidopsis thaliana. PLoS One 6: e28070
- Stelpflug SC, Sekhon RS, Vaillancourt B, Hirsch CN, Buell CR, de Leon N, Kaeppler SM (2016) An expanded maize gene expression atlas based on RNA sequencing and its use to explore root development. Plant Genome 9: 1–16
- Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, Simonovic M, Doncheva NT, Morris JH, Bork P (2019) STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res 47: D607–D613
- Tian T, Liu Y, Yan H, You Q, Yi X, Du Z, Xu W, Su Z (2017) agriGO v2.0: a GO analysis toolkit for the agricultural community, 2017 update. Nucleic Acids Res 45: W122–W129
- Tzafrir I, Pena-Muralla R, Dickerman A, Berg M, Rogers R, Hutchens S, Sweeney TC, McElver J, Aux G, Patton D (2004) Identification of genes required for embryo development in Arabidopsis. Plant Physiol 135: 1206–1220
- Van K, McHale LK (2017) Meta-analyses of QTLs associated with protein and oil contents and compositions in soybean [Glycine max (L.) Merr.] seed. Int J Mol Sci 18: 1180
- Vasal S (2000) The quality protein maize story. Food Nutr Bull 21: 445–450
- Wang X, Larkins BA (2001) Genetic analysis of amino acid accumulation in opaque-2 maize endosperm. Plant Physiol 125: 1766–1777
- Watson S (2003) Description, development, structure and composition of the corn kernel. Corn Chem Technol 2: 69–106
- Wickham H, Chang W, Wickham MH (2016) Package ‘ggplot2’. Create elegant data visualisations using the grammar of graphics. Version 2: 1–189
- Withana-Gamage TS, Hegedus DD, Qiu X, Yu P, May T, Lydiate D, Wanasundara JP (2013) Characterization of Arabidopsis thaliana lines with altered seed storage protein profiles using synchrotron-powered FT-IR spectromicroscopy. J Agric Food Chem 61: 901–912
- Wu R, Lou XY, Ma CX, Wang X, Larkins BA, Casella G (2002) An improved genetic model generates high-resolution mapping of QTL for protein quality in maize endosperm. Proc Natl Acad Sci 99: 11281–11286
- Wu S, Alseekh S, Cuadros-Inostroza Á, Fusari CM, Mutwil M, Kooke R, Keurentjes JB, Fernie AR, Willmitzer L, Brotman Y (2016) Combined use of genome-wide association data and correlation networks unravels key regulators of primary metabolism in Arabidopsis thaliana. PLoS Genet 12: e1006363
- Wu Y, Messing J (2014) Proteome balancing of the maize seed for higher nutritional value. Front Plant Sci 5: 240
- Wu Y, Wang W, Messing J (2012) Balancing of sulfur storage in maize seed. BMC Plant Biol 12: 77
- Yan H, Chen D, Wang Y, Sun Y, Zhao J, Sun M, Peng X (2016) Ribosomal protein L18aB is required for both male gametophyte function and embryo development in Arabidopsis. Sci Rep 6: 31195
- Yan J, Shah T, Warburton ML, Buckler ES, McMullen MD, Crouch J (2009) Genetic characterization and linkage disequilibrium estimation of a global maize collection using SNP markers. PLoS One 4: e8451
- Yángüez E, Castro-Sanz AB, Fernandez-Bautista N, Oliveros JC, Castellano MM (2013) Analysis of genome-wide changes in the translatome of Arabidopsis seedlings subjected to heat stress. PLoS One 8: e71425
- Yobi A, Angelovici R (2018) A high-throughput absolute-level quantification of protein-bound amino acids in seeds. Curr Protoc Plant Biol 3: e20084
- Yobi A, Bagaza C, Batushansky A, Shrestha V, Emery ML, Holden S, Turner-Hissong S, Miller ND, Mawhinney TP, Angelovici R (2020) The complex response of free and bound amino acids to water stress during the seed setting stage in Arabidopsis. Plant J 102: 838–855
- Young TE, Gallie DR, DeMason DA (1997) Ethylene-mediated programmed cell death during maize endosperm development of wild-type and shrunken2 genotypes. Plant Physiol 115: 737–751
- Zhang J, Yuan H, Yang Y, Fish T, Lyi SM, Thannhauser TW, Zhang L, Li L (2016) Plastid ribosomal protein S5 is involved in photosynthesis, plant development, and cold stress tolerance in Arabidopsis. J Exp Bot 67: 2731–2744
- Zheng X, Li Q, Li C, An D, Xiao Q, Wang W, Wu Y (2019) Intra-Kernel reallocation of proteins in maize depends on VP1-mediated scutellum development and nutrient assimilation. Plant Cell 31: 2613–2635