International Journal of Molecular Sciences. 2022 May 4;23(9):5121. doi: 10.3390/ijms23095121

Deciphering Pleiotropic Signatures of Regulatory SNPs in Zea mays L. Using Multi-Omics Data and Machine Learning Algorithms

Ataul Haleem 1,2, Selina Klees 1,3, Armin Otto Schmitt 1,3, Mehmet Gültas 2,3,*
Editor: Shaozhen He
PMCID: PMC9100765  PMID: 35563516

Abstract

Maize is one of the most widely grown cereals in the world. However, to address the challenges in maize breeding arising from climatic anomalies, there is a need for developing novel strategies to harness the power of multi-omics technologies. In this regard, pleiotropy is an important genetic phenomenon that can be utilized to simultaneously enhance multiple agronomic phenotypes in maize. In addition to pleiotropy, another aspect is the consideration of the regulatory SNPs (rSNPs) that are likely to have causal effects in phenotypic development. By incorporating both aspects in our study, we performed a systematic analysis based on multi-omics data to reveal the novel pleiotropic signatures of rSNPs in a global maize population. For this purpose, we first applied Random Forests and then Markov clustering algorithms to decipher the pleiotropic signatures of rSNPs, based on which hierarchical network models are constructed to elucidate the complex interplay among transcription factors, rSNPs, and phenotypes. The results obtained in our study could help to understand the genetic programs orchestrating multiple phenotypes and thus could provide novel breeding targets for the simultaneous improvement of several agronomic traits.

Keywords: multi-omics, regulatory SNPs, incremental feature selection, random forest, Markov clustering, hierarchical network model, gene expression profiles

1. Introduction

Maize is an exceptional source of food, feed and fuel. It has become one of the most important cereal crops that feed the world by contributing 30% of the food calories for 4.5 billion people [1]. Over the past decade, maize production has increased remarkably by more than 1.16 million tons (FAOSTAT), but there is still a need for further yield increases to offset the food insecurity caused by the exponential increase in the world’s population. However, the dramatic fluctuations in global mean temperatures observed over the past decades pose a serious threat to sustainable crop production and demand better strategies of crop improvement.

To deal with the above-mentioned issues, several biochemical, physiological and morphological traits of maize (such as grain yield and biomass) have been of individual importance in breeding research [2,3,4,5,6,7,8,9,10,11,12,13]. For this purpose, several association studies have been conducted for marker-assisted selection of superior genotypes by applying conventional genome-wide association studies (GWAS), which could provide essential information about the genetic architecture of genotype × phenotype interactions [14]. Despite the rich literature on GWAS and their application in plant breeding, they are still criticized for high false positive rates [15,16,17], the requirement of large sample sizes for the detection of rare alleles [18] and missing heritability [19]. To overcome these limitations to a certain extent, machine learning (ML) approaches such as Random Forests (RFs) or convolutional neural networks (CNNs), which employ non-parametric methods to decipher genotype × phenotype interactions, have been successfully applied to large genomic data sets [17,20,21,22,23,24,25,26]. In particular, recent studies [16,17,27,28] have demonstrated the utility of RF-based models for the analysis of a large number of loci and the identification of promising SNP candidates having strong associations with phenotypes. For example, Klees et al. [20] recently applied the RF approach to identify associations between the rapeseed oil content and regulatory SNPs (rSNPs), which are located in the promoter regions of genes and have a strong impact on the binding sites of transcription factors (TFs), thus affecting the development of phenotypes.

Another fundamental aspect of single-SNP-based studies (including rSNPs) is their utility in detecting associations between a SNP and multiple phenotypes. Such associations of a single SNP or gene are referred to as pleiotropy [29,30,31], which, by definition, is the phenomenon of a single genetic variant being responsible for multiple phenotypes. Given the importance of pleiotropy, investigating rSNPs could be essential for understanding the influence of variation in quantitative traits and their improvement against biotic and abiotic stresses. However, only a limited number of studies have reported pleiotropic effects in maize [32,33,34,35,36,37,38,39,40]. The lack of such studies in maize can be compensated for using modern multi-omics technologies, which have enabled researchers to rapidly sequence large breeding populations and measure several phenotypes [41], facilitating pleiotropic studies. In particular, transcriptomics, proteomics, genomics, and phenomics are increasingly being used in plant sciences to gain a comprehensive understanding of complex genetic traits [20,42].

Recently, Liu et al. [43] generated a comprehensive multi-omics dataset of global maize germplasm, comprising genomic, transcriptomic and multiple phenotypic data of 368 maize inbred lines representing stiff-stalk, non-stiff-stalk, tropical, semi-tropical and mixed backgrounds [44], to identify genome-wide associations between individual SNPs and phenotypes [45,46,47,48,49].

Leveraging these multi-omics data, the main objectives of our study are to identify pleiotropic signatures of rSNPs in a systematic analysis and to construct hierarchical network models that could lead to new hypotheses on the crucial role of TFs controlling the development of different phenotypes in maize breeding research. Our analysis pipeline consists of four distinct phases. In the first phase, rSNPs are identified, while in the second phase, the RF algorithm is used to determine the relative importance of individual rSNPs in phenotype associations [17,20]. Based on these association results, we identify pleiotropic rSNPs in the third phase using the Markov clustering algorithm [50], which is followed in the fourth phase by the construction of hierarchical network models that elucidate the complex interplay among TFs, rSNPs, and multiple phenotypes. Our findings demonstrate that a systematic analysis of multi-omics data of global maize populations: (i) enables the identification of pleiotropic signatures of rSNPs along with their consequences on TF binding sites; and (ii) provides new insights into the genetic architecture and new breeding targets for the corresponding multiple phenotypes in maize.

2. Materials and Methods

In this section, we describe the multi-omics dataset analyzed and the methods applied in this study. Our analysis framework is structured as shown in Figure 1. In particular, we start with the preprocessing of multi-omics data of 368 maize inbred lines and their systematic analysis towards the identification of pleiotropic signatures of rSNPs. For this purpose, we first identified the rSNPs from the genotype dataset by applying the MATCH™ algorithm [51] together with a non-redundant plant position weight matrix (PWM) library obtained from the TRANSFAC database [52]. Second, the Random Forest algorithm [53] was applied together with its specific feature selection wrapper, the Boruta algorithm [54], to assess the relative importance of each rSNP in terms of its involvement in the characterization of the 20 agronomic phenotypes under study. This step was followed by the incremental feature selection (IFS) procedure [55,56] to find the optimal list of associated rSNPs for each phenotype. Next, using the Markov clustering (MCL) algorithm [50], the pleiotropic relationship signatures of rSNPs were uncovered, as suggested in [57]. Finally, we constructed hierarchical network models to elucidate the complex interplay among TFs, rSNPs, and multiple phenotypes by incorporating the corresponding transcriptome dataset to evaluate the importance of pleiotropic rSNPs. Detailed information on these analysis steps is given in Section 2.2 from Phase 1 to 4.

Figure 1.

Overview of the analysis pipeline highlighting key machine learning algorithms for the identification of pleiotropic signatures of regulatory SNPs (rSNPs) to establish the complex interplay of transcription factors (TFs), rSNPs and multiple phenotypes. The genotypic data (A1), consisting of 1.03 million SNP markers, were filtered for MAF (<0.05), and 31,000 SNPs found within the promoter regions of 37,407 maize genes were considered for association analysis with 20 quantitative agronomic traits (A2). The RNA-seq dataset (A3) was utilized for the validation of the effect of pleiotropic rSNPs on the underlying gene expression. As the first step in the data analysis, rSNPs were identified (B) by their impact on the gain or loss of TFBSs, after which their association with multiple phenotypes was determined using random forest (RF) together with the Boruta algorithm and the incremental feature selection (IFS) technique (C). Pleiotropic signatures of rSNPs were then established by pruning weaker connections in the overall network into smaller non-overlapping fully connected clusters using the Markov clustering (MCL) algorithm (D), which provided the basis for the construction of hierarchical network models with three distinct layers modelling the complex interplay of TFs, rSNPs and multiple phenotypes (B). Further, the boxplots show the impact of pleiotropic rSNPs at the gene expression level as a function of gain or loss of TFBSs (E).

2.1. Multi-Omics Data

2.1.1. Genotype Dataset

The genotypic data of 368 inbred lines were obtained from: (i) CIMMYT; (ii) the Germplasm Enhancement of Maize (GEM) project in the USA; and (iii) temperate and tropical/subtropical breeding programs in China. The inbred lines represent non-stiff stalk, stiff stalk, mixed, tropical and semi-tropical backgrounds. Genotyping was performed using the MaizeSNP50 BeadChip with 56,110 SNP markers [58], and the marker set was further expanded to 1.03 million SNP markers using deep RNA-Seq data [43]. Similar to previous studies [47,59], SNPs with an MAF < 0.05 were discarded, and the genomic coordinates of the SNP markers were lifted over from reference genome V2 to V5 using CrossMap [60]. Consequently, after filtering, the genotype dataset comprises 31,934 SNPs for 368 maize lines, which are located on chromosomes 1 to 10 including 37,407 genes.
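The MAF filter described above can be illustrated with a minimal sketch. It assumes one allele call per inbred line (lines treated as homozygous) and uses hypothetical SNP identifiers and toy calls; it is not the tooling used in the study.

```python
from collections import Counter


def minor_allele_frequency(calls):
    """MAF of one SNP from per-line allele calls (None = missing call).

    Assumes one allele per inbred line, i.e. lines are treated as homozygous.
    """
    counts = Counter(c for c in calls if c is not None)
    if len(counts) < 2:
        return 0.0  # monomorphic or entirely missing
    # Second most frequent allele; equals the minor allele for biallelic SNPs.
    second = sorted(counts.values(), reverse=True)[1]
    return second / sum(counts.values())


def filter_by_maf(snp_calls, threshold=0.05):
    """Keep SNPs whose MAF is at least `threshold` (drop MAF < threshold)."""
    return {
        snp: calls
        for snp, calls in snp_calls.items()
        if minor_allele_frequency(calls) >= threshold
    }


# Toy data with hypothetical SNP identifiers.
toy_calls = {
    "snp_rare": ["A"] * 39 + ["G"],          # MAF = 0.025, discarded
    "snp_common": ["C"] * 10 + ["T"] * 10,   # MAF = 0.5, kept
    "snp_fixed": ["A"] * 20,                 # monomorphic, discarded
}
kept = filter_by_maf(toy_calls)
```

In this toy example only `snp_common` survives the filter.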

2.1.2. Phenotype Dataset

The corresponding phenotypic data of the maize lines was collected in 2009 and 2010 in five different environments in China. The experimental units were completely randomized with a row length of 3 m, 11 plants per row, 25 cm spacing between plants, and 60 cm spacing between rows. The mean observed value of five randomly selected plants for 20 agronomic (quantitative) traits was taken, which additionally were converted into the best linear unbiased predictions (BLUPs) [40,48,49].

2.1.3. Transcriptome Dataset

Paired-end deep RNA-seq data for the 368 inbred lines, generated by Liu et al. [43], was retrieved from the European Nucleotide Archive (ENA) browser (study accession: PRJNA208608). Raw sequencing data were adapter and quality trimmed using Trim Galore [61]. High-quality reads were then mapped to the Zea mays L. reference genome, V5 (available at https://download.maizegdb.org/Zm-B73-REFERENCE-NAM-5.0/Zm-B73-REFERENCE-NAM-5.0.fa.gz) (accessed on 16 April 2021), using STAR (v2.7.3a) [62]. Raw read counts for each transcript were obtained using HTSeq [63] and normalized using the median-of-ratios normalization method implemented in the R package DESeq2 [64].
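To make the normalization step concrete, the following is a simplified sketch of the median-of-ratios idea: each sample's size factor is the median, over genes, of the ratio between its count and the gene's geometric mean across samples. The function names and toy counts are illustrative assumptions; the study itself relied on the DESeq2 implementation, which this sketch does not reproduce in full.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    `counts` maps a sample name to a list of raw counts (same gene order
    in every sample). Genes with a zero count in any sample are skipped,
    because their geometric mean is zero.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - log_geo_means[g]
            for g in range(n_genes)
            if log_geo_means[g] is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors


def normalize(counts):
    """Divide every raw count by the size factor of its sample."""
    sf = size_factors(counts)
    return {s: [c / sf[s] for c in counts[s]] for s in counts}


# Toy counts: line2 was sequenced twice as deeply as line1.
toy_counts = {"line1": [10, 20, 30, 0], "line2": [20, 40, 60, 5]}
toy_factors = size_factors(toy_counts)
toy_normalized = normalize(toy_counts)
```

After normalization, the first three genes have equal values in both toy samples, which is the intended effect of removing the sequencing-depth difference.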

2.2. Data Analysis

Our analysis framework consists of four phases to decipher the complex interplay among transcription factors (TFs), regulatory SNPs (rSNPs) and multiple phenotypes using the multi-omics dataset under study.

Phase 1: We identified rSNPs from the genotype dataset by applying our analysis pipeline introduced in [14], which consists of the following steps. First, considering the promoter region of genes (−500 bp to +100 bp relative to the transcription start site), all SNPs in these regions were selected. Second, we extracted the flanking sequence of each selected SNP, which covers ±25 bp relative to the SNP position, so that each sequence is 51 bp long with the SNP located in the central position. Then, two copies of each extracted sequence were constructed: the first copy contains the reference allele at the SNP position, and the second contains the alternate allele. Next, by employing the MATCH™ program [51], we scanned each sequence to predict binding sites of transcription factors together with their affinity scores, which lie in the range [0,1]. For the application of the MATCH™ program, we used a non-redundant plant position weight matrix (PWM) library obtained from the TRANSFAC database [52]. Finally, focusing on the alterations of transcription factor binding sites (TFBSs) between the two sequences of each SNP, we recorded its potential consequence as: (i) “Loss of TFBS”: only the sequence with the reference allele contains the TFBS of a specific transcription factor (TF), and the same TFBS is not found in the sequence with the alternate allele; (ii) “Gain of TFBS”: the TFBS is found only in the sequence with the alternate allele; or (iii) “No Strong Effect”: the SNP either has no effect or entails only a slight change in the binding affinity of TFs. As suggested in previous studies [14,20,65], we consider a SNP to be an rSNP in the following analysis if it leads to a “Gain of TFBS” or a “Loss of TFBS” for at least one TF.
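The bookkeeping of Phase 1 can be sketched as follows. The binding-site prediction itself (MATCH™ with TRANSFAC matrices) is not reproduced here; the sketch assumes that some scanner has already returned the set of TFs predicted to bind each allele-specific sequence. The function names, the strand-aware handling of the promoter window and the toy inputs are illustrative assumptions.

```python
FLANK = 25  # bp on each side of the SNP, giving a 51-bp sequence


def promoter_window(tss, strand, upstream=500, downstream=100):
    """Genomic interval from -500 bp to +100 bp around the TSS.

    Assumption: on the minus strand, "upstream" lies at higher coordinates.
    """
    if strand == "+":
        return tss - upstream, tss + downstream
    return tss - downstream, tss + upstream


def allele_sequences(chrom_seq, pos, ref, alt, flank=FLANK):
    """Return the reference and alternate sequences centred on a 1-based SNP."""
    i = pos - 1
    if i - flank < 0 or i + flank >= len(chrom_seq):
        raise ValueError("SNP is too close to the end of the sequence")
    if chrom_seq[i] != ref:
        raise ValueError("reference allele does not match the sequence")
    left = chrom_seq[i - flank:i]
    right = chrom_seq[i + 1:i + 1 + flank]
    return left + ref + right, left + alt + right


def classify_consequences(ref_hits, alt_hits):
    """Compare the TF sets predicted for the two allele-specific sequences."""
    result = {}
    for tf in ref_hits | alt_hits:
        if tf in ref_hits and tf not in alt_hits:
            result[tf] = "Loss of TFBS"
        elif tf in alt_hits and tf not in ref_hits:
            result[tf] = "Gain of TFBS"
        else:
            result[tf] = "No Strong Effect"
    return result


def is_rsnp(consequences):
    """A SNP counts as an rSNP if at least one TF gains or loses its site."""
    return any(c != "No Strong Effect" for c in consequences.values())


# Toy example; the sequence and the TF hit sets are made up for illustration.
toy_seq = "A" * 25 + "C" + "G" * 25
ref_seq, alt_seq = allele_sequences(toy_seq, 26, "C", "T")
toy_consequences = classify_consequences({"GAMYB", "SPL4"}, {"GT1", "SPL4"})
```

Here the SNP would be called an rSNP because GAMYB loses and GT1 gains a predicted site, while SPL4 is unaffected.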

Phase 2: Following the association analysis strategy described in [16,17,20], we used a Random Forest (RF)-based feature selection approach to assess the relative importance of each rSNP in predicting the response variable (phenotype) of interest. To this end, we applied the Boruta algorithm [54], a powerful wrapper designed specifically for the RF-based feature selection technique, to rank the importance of variables (in this case rSNPs). Consequently, by constructing multiple decision trees based on random subsets of features, the Boruta algorithm calculates an importance score for each rSNP and thus provides a ranking.

Using the ranked rSNPs determined by the Boruta algorithm, we further performed the incremental feature selection (IFS) procedure to retrieve the optimal list of features, as suggested in [55,56]. During the IFS application, the rSNPs were incrementally added from higher to lower ranks in the ordered feature set, and an RF model was constructed for each resulting feature subset. The predictive performance of each RF model was examined based on its R² value. This enabled us to determine the optimal number of associated rSNPs for a given phenotype of interest (see Figure 2).
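The IFS loop itself is simple and can be sketched independently of the learner. In the sketch below, `score_fn` is a hypothetical callable standing in for "train an RF model on this feature subset and return its R²"; the toy scoring function and the rSNP names are made up purely to illustrate how the peak of the curve is selected.

```python
def incremental_feature_selection(ranked_features, score_fn):
    """Add features in rank order and keep the prefix with the best score.

    `ranked_features` is ordered from most to least important.
    `score_fn(subset)` returns a performance value such as R^2.
    Returns the selected prefix, its score and the full (k, score) curve.
    """
    best_k, best_score = 0, float("-inf")
    curve = []
    for k in range(1, len(ranked_features) + 1):
        score = score_fn(ranked_features[:k])
        curve.append((k, score))
        if score > best_score:  # ties keep the smaller feature set
            best_k, best_score = k, score
    return ranked_features[:best_k], best_score, curve


# Toy stand-in for the RF performance: informative rSNPs raise the score,
# uninformative ones lower it slightly. Names are hypothetical.
INFORMATIVE = {"rs1", "rs2", "rs3"}


def toy_score(subset):
    hits = sum(1 for f in subset if f in INFORMATIVE)
    return 0.2 * hits - 0.05 * (len(subset) - hits)


ranked = ["rs1", "rs2", "rs3", "rs4", "rs5"]
selected, best, curve = incremental_feature_selection(ranked, toy_score)
```

With this toy score the curve peaks at the first three rSNPs, which mirrors how the optimal number of rSNPs is read off an IFS curve.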

Figure 2.

Change in R² values versus the number of rSNPs in association with the phenotype pollen shed. The incremental feature selection (IFS) curve was drawn using the ranking of the rSNPs. The R² value reached its peak when considering the first 90 rSNPs. These rSNPs were used for the further analysis of this phenotype.

These analyses were repeated for each of the 20 phenotypes to identify the optimal numbers of associated rSNPs, which are given in Table 1.

Table 1.

Phenotypes and the optimal numbers of their associated rSNPs determined by incremental feature selection (IFS) procedure.

| Phenotype | Max R² | Number of rSNPs |
|---|---|---|
| Leaf number above ear | 0.490740 | 89 |
| Ear leaf width | 0.484029 | 70 |
| Cob diameter | 0.445720 | 64 |
| Ear height | 0.509523 | 109 |
| Kernel width | 0.418115 | 172 |
| Ear leaf length | 0.553292 | 112 |
| Tassel main axis length | 0.498562 | 96 |
| Pollen shed | 0.581765 | 90 |
| Heading date | 0.537987 | 49 |
| Ear length | 0.434011 | 82 |
| Silking time | 0.506520 | 122 |
| Ear diameter | 0.481445 | 110 |
| Cob weight | 0.460850 | 37 |
| 100-grain weight | 0.389332 | 51 |
| Tassel branch number | 0.507112 | 142 |
| Ear row number | 0.491663 | 46 |
| Kernel number per row | 0.350717 | 27 |
| Plant height | 0.532837 | 72 |
| Kernel length | 0.580691 | 168 |
| Kernel thickness | 0.437589 | 64 |

Phase 3: To reveal the unique pleiotropic relationship signatures of rSNPs and thus decipher their complex interplay with TFs and with multiple phenotypes, we applied the Markov clustering algorithm (MCL). MCL is a very effective network-based clustering algorithm that detects distinct groups in a network by eliminating negligible connections (edges) based on their weights [50].

For the detection of pleiotropic relationships, Weighill et al. [57] have successfully applied the MCL algorithm by constructing a profile matrix which represents the SNP × phenotype associations. Following this idea and thus the main concept of MCL, we first created such a profile matrix $M$, where rows correspond to rSNPs determined by the IFS procedure and columns refer to the names of both the phenotypes and TFs. The entry of $M$ at position $(i,j)$, $M_{ij}$, is defined as:

$$
M_{ij} =
\begin{cases}
1 & \text{if rSNP}_i \text{ is associated with phenotype}_j \\
1 & \text{if the consequence of rSNP}_i \text{ is Gain or Loss for TF}_j \\
0 & \text{otherwise}
\end{cases}
$$

$M$ was then converted into an rSNP association matrix, $A_{n \times n}$ ($n$ is the number of rSNPs (rows) in $M$), using the Proportional Similarity Index [66]. The entry of $A$ at position $(k,l)$ is calculated between the rSNPs (rows) $k$ and $l$ in $M$ as:

$$
A_{kl} = \frac{2 \cdot \sum_{j} \min(M_{kj}, M_{lj})}{\sum_{j} (M_{kj} + M_{lj})}
$$
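The two definitions above translate directly into code. The following sketch builds a small binary profile matrix and the corresponding association matrix; the rSNP names and their phenotype and TF assignments are toy values chosen for illustration, and the function names are not taken from the study.

```python
def build_profile_matrix(rsnp_phenotypes, rsnp_tfs):
    """Binary profile matrix M: rows are rSNPs, columns are phenotypes and TFs.

    `rsnp_phenotypes` maps an rSNP to its associated phenotypes.
    `rsnp_tfs` maps an rSNP to the TFs whose binding site is gained or lost.
    """
    rsnps = sorted(rsnp_phenotypes)
    columns = sorted(
        {("phenotype", p) for ps in rsnp_phenotypes.values() for p in ps}
        | {("TF", t) for r in rsnps for t in rsnp_tfs.get(r, ())}
    )
    matrix = []
    for r in rsnps:
        row = []
        for kind, name in columns:
            source = rsnp_phenotypes[r] if kind == "phenotype" else rsnp_tfs.get(r, ())
            row.append(1 if name in source else 0)
        matrix.append(row)
    return rsnps, columns, matrix


def proportional_similarity(row_k, row_l):
    """A_kl = 2 * sum_j min(M_kj, M_lj) / sum_j (M_kj + M_lj)."""
    denominator = sum(a + b for a, b in zip(row_k, row_l))
    if denominator == 0:
        return 0.0  # two empty profiles share nothing
    return 2 * sum(min(a, b) for a, b in zip(row_k, row_l)) / denominator


def association_matrix(matrix):
    """Pairwise proportional similarity between all rows of M."""
    n = len(matrix)
    return [
        [proportional_similarity(matrix[k], matrix[l]) for l in range(n)]
        for k in range(n)
    ]


# Toy associations with hypothetical rSNP names.
toy_phenotypes = {
    "rs1": {"Ear height", "Plant height"},
    "rs2": {"Plant height"},
    "rs3": {"Kernel width"},
}
toy_tfs = {"rs1": {"GAMYB"}, "rs2": {"GAMYB"}, "rs3": {"C1"}}
rsnps, columns, M = build_profile_matrix(toy_phenotypes, toy_tfs)
A = association_matrix(M)
```

In this toy case rs1 and rs2 share one phenotype and one TF, giving a similarity of 2·2/(3+2) = 0.8, whereas rs3 shares nothing with either of them.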

Next, we employed MCL [50] using the matrix A to cluster the rSNPs into subgroups based on their similar relationship signatures. Consequently, each of the resulting clusters reflects a collection of rSNPs and their complex interplay with TFs and phenotypes, based on which we designed a hierarchical network model with three layers. These layers are: (i) TFs whose binding site is lost or gained due to the rSNPs; (ii) rSNPs located in the promoters of the genes; and (iii) phenotypes whose development is strongly connected to the expression level of the corresponding genes. In a final step, to ensure pleiotropy within the clusters, we removed from the hierarchical network models all rSNPs that were associated with only one phenotype [57].
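For readers unfamiliar with MCL, the core iteration (expansion by matrix squaring, inflation by element-wise powering, column re-normalization) can be sketched in a few lines. This is a bare-bones illustration and not the MCL implementation of [50]; the inflation value of 2.0, the convergence threshold and the toy similarity matrix are assumptions made for this example. The final helper mirrors the removal of rSNPs associated with only one phenotype.

```python
def normalize_columns(m):
    """Scale every column so that it sums to one (column-stochastic matrix)."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [
        [m[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def markov_cluster(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster node indices of a symmetric similarity matrix with basic MCL."""
    n = len(similarity)
    # Put self-loops on the diagonal, then make the matrix column-stochastic.
    m = normalize_columns(
        [[1.0 if i == j else float(similarity[i][j]) for j in range(n)]
         for i in range(n)]
    )
    for _ in range(max_iter):
        # Expansion: one step of random-walk flow (matrix squaring).
        expanded = [
            [sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        # Inflation: strengthen strong flows and weaken weak ones.
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded]
        )
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that retains flow lists the members of one cluster.
    clusters = []
    for row in m:
        members = [j for j, v in enumerate(row) if v > 1e-6]
        if members and members not in clusters:
            clusters.append(members)
    return clusters


def keep_pleiotropic(cluster, rsnp_phenotypes):
    """Drop rSNPs that are associated with fewer than two phenotypes."""
    return [r for r in cluster if len(rsnp_phenotypes[r]) >= 2]


# Toy similarity matrix: nodes 0-2 and nodes 3-4 form two separate groups.
toy_similarity = [
    [1.0, 0.8, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.8, 0.0, 0.0],
    [0.8, 0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.9],
    [0.0, 0.0, 0.0, 0.9, 1.0],
]
toy_clusters = markov_cluster(toy_similarity)
toy_kept = keep_pleiotropic(
    ["rsA", "rsB"], {"rsA": {"Ear height", "Plant height"}, "rsB": {"Ear height"}}
)
```

The toy matrix contains two groups without any connection between them, so the sketch simply recovers those two groups; on real data, the inflation parameter controls how finely weakly connected groups are split.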

Phase 4: To assess the consequences of pleiotropic rSNPs on TF-binding activities, which in turn affect the regulation of gene expression and thus the development of the phenotype, we evaluated their potential effects using the corresponding RNA-seq data. For this purpose, we focused only on genes with pleiotropic rSNPs in their promoters. For each pleiotropic rSNP in these genes, we divided the 368 maize lines into two groups: the plants in the first group carry the reference allele at the corresponding genomic position, whereas the plants in the second group carry the alternate allele of the rSNP under study. We then compared the gene expression values between these two groups using the Wilcoxon test in order to determine whether the consequence of a pleiotropic rSNP on a certain TF-binding activity (“Gain” or “Loss”) leads to a significant alteration in the expression of the corresponding gene.
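The group comparison of Phase 4 can be sketched as below. The sketch assumes that the Wilcoxon test refers to the two-sample rank-sum (Mann-Whitney) variant, since the two allele groups are independent, and it uses a normal approximation with tie correction and without continuity correction. The line names, alleles and expression values are toy data.

```python
import math
from collections import Counter


def split_by_allele(genotypes, expression, ref, alt):
    """Group expression values of one gene by the allele each line carries."""
    ref_group = [expression[line] for line, a in genotypes.items() if a == ref]
    alt_group = [expression[line] for line, a in genotypes.items() if a == alt]
    return ref_group, alt_group


def average_ranks(values):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_rank_sum(group1, group2):
    """Return the U statistic of group1 and a two-sided approximate p-value."""
    n1, n2 = len(group1), len(group2)
    combined = list(group1) + list(group2)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0  # all values identical, no evidence of a shift
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Toy data: five lines carry the reference allele, five the alternate allele.
toy_genotypes = {f"line{i}": ("A" if i <= 5 else "G") for i in range(1, 11)}
toy_expression = {f"line{i}": float(i) for i in range(1, 11)}
ref_group, alt_group = split_by_allele(toy_genotypes, toy_expression, "A", "G")
u_stat, p_value = wilcoxon_rank_sum(ref_group, alt_group)
```

For small groups or heavily tied data an exact test would be preferable to the normal approximation used in this sketch.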

In our following analysis, we further pruned the hierarchical network models by removing the rSNPs and corresponding TFs whose binding activities do not result in a significant change in the related gene expression values.

3. Results and Discussion

In this study, we systematically analyzed a multi-omics dataset of 368 maize inbred lines to decipher the complex interplay among TFs, rSNPs and multiple phenotypes. For this purpose, we first identified the rSNPs from genotype dataset and then applied the Boruta algorithm followed by IFS procedure to determine the rSNPs having a strong association with the phenotypes of interest. The number of associated rSNPs along with corresponding phenotypes is given in Table 1 and Figure 3. Next, considering the multi-phenotypic associations of the rSNPs as well as their consequences on TF binding, we employed the MCL algorithm to cluster the rSNPs, which is additionally used to construct hierarchical network models to elucidate the relationship signatures between TFs and rSNPs together with those between rSNPs and multiple phenotypes, indicating their pleiotropic functions.

Figure 3.

Number of associated rSNPs determined by the incremental feature selection (IFS) procedure for each phenotype and their overlap represented in matrix layouts using the UpSet technique [67]. Black circles in the matrix layout are related to the phenotypes that are part of the intersection. For the sake of clarity, not all intersections are displayed.

3.1. Pleiotropic Association Signatures of rSNPs

The application of the MCL algorithm to the rSNPs determined by the IFS procedure (Table 1) results in eleven clusters that reveal the unique pleiotropic relationship signatures of rSNPs arising from their complex interplay with TFs and multiple associated phenotypes. A brief description of the clusters is given in Table 2, and additional information about the rSNPs and genes of each cluster is provided in Supplementary File S1.

Table 2.

Result of Markov clustering algorithm (MCL) including the numbers of rSNPs together with their related genes and their associated multiple phenotypes.

| Cluster | Number of rSNPs | Number of genes | Pleiotropic phenotypes |
|---|---|---|---|
| Cluster-1 | 15 | 10 | Heading date, Pollen shed, Silking time and Ear height |
| Cluster-2 | 9 | 7 | Cob weight, Heading date, Pollen shed, Tassel main axis length, Ear leaf length, Plant height, Ear leaf width, Ear row number and Ear height |
| Cluster-3 | 7 | 6 | Kernel length, Kernel thickness, Kernel number per row, Ear diameter and 100-grain weight |
| Cluster-4 | 6 | 5 | Ear diameter, Cob diameter and Ear row number |
| Cluster-5 | 6 | 4 | Heading date, Pollen shed, Silking time, Ear height and Tassel branch number |
| Cluster-6 | 5 | 3 | Ear diameter, Cob diameter and Ear row number |
| Cluster-7 | 3 | 3 | Ear height, Plant height and Ear row number |
| Cluster-8 | 3 | 3 | Kernel width, Kernel length, Kernel thickness and 100-grain weight |
| Cluster-9 | 2 | 2 | Ear leaf length, Leaf number above ear and Kernel length |
| Cluster-10 | 2 | 2 | Ear length and Kernel number per row |
| Cluster-11 | 2 | 2 | Ear diameter, Tassel main axis length and Cob weight |

As shown in Table 2, the individual clusters differ considerably in the numbers of pleiotropic rSNPs as well as in their related phenotypes. The largest cluster contains ten genes (Zm00001eb354560, Zm00001eb403000, Zm00001eb328980, Zm00001eb137870, Zm00001eb366710, Zm00001eb190550, Zm00001eb364380, Zm00001eb156700, Zm00001eb337030, Zm00001eb397560), each associated with four phenotypes. Among these genes, Zm00001eb137870, which encodes the VIN3-like protein 1, is associated with heading date, silking time and pollen shed. Its homologue in Arabidopsis, AT3G24440, is known as the Vernalization Insensitive 3-like 1 (VIL1) gene, which regulates the expression of the floral repressors Flowering Locus C (FLC) and Flowering Locus M (FLM) in response to vernalization. VIL1 and VIN3 (Vernalization Insensitive 3) are essential for the epigenetic modification of the FLC and FLM loci [68,69]. The VIL1 gene acts upstream of many biological processes including the regulation of histone methylation, the vernalization response and the regulation of flower development. Additionally, it is a negative regulator of gene expression, which is achieved by its role in the positive regulation of histone H3-K27 methylation [70]. VIL1 is also involved in photoperiodism, flowering, the vernalization response and the response to cold [71]. Other genes in this cluster are known for their involvement in protein metabolism (Zm00001eb354560, Zm00001eb403000, Zm00001eb364380) and fat metabolism (Zm00001eb328980); however, the function of the remaining genes in this cluster is currently not known.

Another interesting instance of pleiotropy in action can be seen in Cluster-2, where most of the phenotypes are clustered together, representing a highly complex and coordinated developmental program of the disparate phenotypes in this cluster. The cluster contains genes belonging to protein metabolism (Zm00001eb131510, Zm00001eb288200) and carbohydrate and fat metabolism (Zm00001eb372200, Zm00001eb418690), along with flowering time genes (Zm00001eb188340, Zm00001eb418690). Among these genes, Zm00001eb131510 encodes an intracellular protein transporter and is associated with ear leaf length and cob weight. This protein transporter is known for its role in exocytosis, Golgi to plasma membrane transport and intracellular protein transport [70]. Its Arabidopsis homologue, AT4G02350, is vital for pollen tube growth, pollen germination and acceptance in Arabidopsis [72]. Such a gene may play an important role in the source-sink relationship between the developing maize kernels and the flag leaf for the translocation of proteins. The oil content-related gene Zm00001eb372200 is associated with ear row number and ear leaf height in this cluster, and its Arabidopsis homologue, AT1G66430, is known for its role in carbohydrate biosynthetic, fatty acid biosynthetic, and fructose metabolic processes [73]. Several genes in this cluster indicate the co-regulation of floral transition and seed development in maize.

A closer look at Cluster-4 reveals that it contains six pleiotropic rSNPs found within five genes (Zm00001eb068110, Zm00001eb140090, Zm00001eb424640, Zm00001eb049750, Zm00001eb057150), which are associated only with ear traits (ear diameter, cob diameter and ear row number). The genes in this cluster are involved in riboflavin metabolism (Zm00001eb068110) [74], fatty acid metabolism (Zm00001eb049750) [47] and protein relocalization to the mitochondrion (Zm00001eb140090) [75], as well as acyl carrier activity (Zm00001eb049750) [76]. The gene Zm00001eb049750, which encodes the acyl carrier protein, is known for its association with maize kernel oil content [47], while the fax1 (AT3G57280) mutants, a homologue of Zm00001eb424640, are characterized by a decrease in biomass, plant height, stem thickness, reduced male sterility and defective pollen cell wall biosynthesis [77].

Among the smallest clusters are Clusters 9–11, each containing two pleiotropic rSNPs and two corresponding genes. Cluster-10 contains two kernel and ear development genes (Zm00001eb213840, Zm00001eb349520). The Arabidopsis homologue (AT4G07960.1) of Zm00001eb213840 is known to encode XyG glucan synthase. Arabidopsis mutants of this gene have smaller rosettes, weak inflorescence stems and a reduced number of pollen tubes after pollination [78]. Moreover, the product of Zm00001eb349520 is known as an aspartic acid proteinase inhibitor. It is well known that high levels of aspartic acids are observed in maize cobs during early reproductive development [79]. Aspartic acids accumulate in kernels as N-assimilates, suggesting a role in kernel growth.

3.2. Construction of Hierarchical Network Models

To further elaborate the complex interplay between pleiotropic rSNPs and TFs by assessing their potential impact on the expression level of the respective genes, we constructed a hierarchical network model for each cluster found by the MCL algorithm. These network models help explain the potential biological functions of TFs in regulating gene expression and, hence, assess the importance of individual pleiotropic rSNPs. They also provide new hypotheses on why the consideration of TFs could play a crucial role in maize breeding research for understanding the genetic mechanisms underlying the development of different phenotypes. An example of our hierarchical network model is presented in Figure 4, showing the complex regulatory circuitry of the Cluster-7 phenotypes. The phenotype set in this cluster, represented by layer 3 (Figure 4D), comprises ear height, plant height and ear row number, whereas layer 2 lists three associated rSNPs, three corresponding genes and their pleiotropic associations. Finally, the first layer highlights the TFs and the changes in their binding activity, showing a loss of binding for ten TFs (in red) and a gain for six TFs (in green). In this cluster, a pleiotropic rSNP in Zm00001eb047590 (opf6 gene) results in the loss of a TFBS for GAMYB but in the gain of a GT1 binding site. The transcription factor GAMYB in Arabidopsis is known for its role in gibberellic acid (GA) signalling regarding floral transition [80,81], whereas the factor GT1 is vital to the regulation of stress tolerance in rice as well as the response to light [82,83]. GT1 binds upstream of the light-responsive rbcS-3A (RUBISCO) gene in peas and hence plays a crucial role in the regulation of photosynthesis [84]. Functional analysis of GT1 in Arabidopsis indicates that it acts on its target promoters, where it may have a repressive function in transcription activity [85].
Our findings indicate that the replacement of GAMYB with GT1 at the promoter of opf6 (a transcription repressor gene) results in significantly higher levels of its gene expression (Figure 4A). This variation in gene expression may have an impact on the regulation of the GA pathway. Given the role of GAs in flower development and flowering time [86], the GA dynamics of the cells may contribute to the development of the associated phenotypes (plant height, ear height and ear row number). Another pleiotropic rSNP in this cluster, located in the bzip41 (Zm00001eb060810) promoter, results in the loss of TFBSs for several SQUAMOSA-promoter binding protein-like TFs (SPL family TFs), whereas a TFBS is gained for the C1 TF, resulting in a significant change in gene expression (Figure 4B). C1 is known for regulating pigmentation in the aleurone layer of maize kernels [87], whereas ERF112 is a potential negative regulator of jasmonic acid (JA)-responsive gene expression [88]. SPL proteins are a plant-specific family of TFs and are known as potential candidates for the genetic improvement of agronomic traits due to their role in the physiological and reproductive development of plants [89]. For example, SPL4 is required for developmental transition and plays an important role in the determination of flowering time [90,91], whereas SPL12 is known to be expressed during plant development [92] and is responsive to abscisic acid biosynthesis in heat stress [93]. The association of this rSNP and the recruitment of the respective TFs are in line with the development of the associated phenotypes. The pleiotropic rSNP (chr5:76811467) results in the loss of TFBSs for several ethylene responsive transcription factors (ERFs), which significantly increases the gene expression (Figure 4C). The ERF TFs in this cluster mainly act as inhibitors of transcription [88,94]; hence, the loss of their TFBSs translates into higher promoter activity for this gene, which may further contribute to the development of the associated phenotypes.
Our results also show the importance of hierarchical network models in explaining the impact of pleiotropic rSNPs on the expression of the corresponding genes, which may directly influence all the associated phenotypes. TF−rSNPs(gene)−Phenotype network models for all other clusters are given in Supplementary File S2.

Figure 4.

Hierarchical network model constructed using Cluster-7 to elucidate the complex interplay among TFs−rSNPs(genes)−Phenotypes. (A–C) Significant changes in the gene expression values resulting from the consequences of pleiotropic rSNPs. (D) Hierarchical network model with three layers.

4. Conclusions

Pleiotropy and rSNPs are two important concepts in genetics that provide novel targets for accelerating plant breeding strategies. Therefore, we considered both concepts together in this study and established hierarchical network models for elucidating the complex interplay among TFs−rSNPs−Phenotypes using multi-omics data. Our results show that most of the identified TFs and genes play essential roles in the development of multiple phenotypes. Our findings further suggest common genetic mechanisms underlying several interrelated phenotypes, as found in Clusters-1 or -8, as well as disparate phenotypes, like plant height and kernel traits, as found in Clusters-2 or -5. To the best of our knowledge, by mainly focusing on the important role of rSNPs and their consequences on TFs, this is the first study that provides the pleiotropic relations of several agronomically important phenotypes of maize. The outcomes of our analysis could be highly relevant for the understanding of the genetic programs governing the development of multiple phenotypes. Further molecular biology research is therefore needed, not only to assess the potential role of these rSNP candidates, but also to gain a deeper insight into the genetic mechanisms underlying biological processes in maize.

5. Future Directions

Unraveling the genetic architecture of complex traits is a key component of improving plants against biotic and abiotic stresses. In this context, plant breeding strategies focus on developing QTL maps for different types of stresses, accounting for linkage disequilibrium and high false positive rates, standardizing genome-wide polygenic scores [95,96], and incorporating epistatic effects into genome-wide association studies to explain missing heritability and pleiotropy using traditional marker-assisted selection (MAS) and genomic selection (GS) strategies as well as ML approaches [97,98,99]. As desired and undesired traits could share a pleiotropic relationship, consideration of pleiotropy is essential to direct breeding programs. Additionally, considering the pleiotropic signatures of rSNPs in the analysis of large genomic datasets with regard to MAS and GS, polygenic scores, and polygenic biotic interactions can affect the outcome of the analysis. We suggest that incorporating the pleiotropic effects of rSNPs into the analysis of genomic data could improve the outcomes of MAS as well as GS studies.

Acknowledgments

We acknowledge support by the German Research Foundation and the Open Access Publication Funds of the Göttingen University. We would like to thank our colleagues Abirami Rajavel and Felix Heinrich for providing helpful advice and discussions.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms23095121/s1.

Author Contributions

M.G. designed and supervised the research. A.H. together with M.G. participated in the design of the study. Further, A.H. conducted computational analyses, prepared the data sets, implemented the framework and performed the literature survey. S.K. was involved in the rSNP analysis and interpreted the results with A.O.S., A.H. and M.G.; A.H. and M.G. wrote the final version of the manuscript. M.G. conceived of and managed the project. All authors read and approved the final manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Funding Statement

This research received no external funding.

Footnotes

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Shiferaw B., Prasanna B.M., Hellin J., Bänziger M. Crops that feed the world 6. Past successes and future challenges to the role played by maize in global food security. Food Secur. 2011;3:307–327. doi: 10.1007/s12571-011-0140-5.
  • 2. Prasanna B.M., Palacios-Rojas N., Hossain F., Muthusamy V., Menkir A., Dhliwayo T., Ndhlela T., San Vicente F., Nair S.K., Vivek B.S., et al. Molecular breeding for nutritionally enriched maize: Status and prospects. Front. Genet. 2020;10:1392. doi: 10.3389/fgene.2019.01392.
  • 3. Ortiz-Monasterio J.I., Palacios-Rojas N., Meng E., Pixley K., Trethowan R., Pena R. Enhancing the mineral and vitamin content of wheat and maize through plant breeding. J. Cereal Sci. 2007;46:293–307. doi: 10.1016/j.jcs.2007.06.005.
  • 4. Bänziger M., Betrán F., Lafitte H. Efficiency of high-nitrogen selection environments for improving maize for low-nitrogen target environments. Crop. Sci. 1997;37:1103–1109. doi: 10.2135/cropsci1997.0011183X003700040012x.
  • 5. Suwarno W.B., Pixley K.V., Palacios-Rojas N., Kaeppler S.M., Babu R. Genome-wide association analysis reveals new targets for carotenoid biofortification in maize. Theor. Appl. Genet. 2015;128:851–864. doi: 10.1007/s00122-015-2475-3.
  • 6. Wu J., Lawit S.J., Weers B., Sun J., Mongar N., Van Hemert J., Melo R., Meng X., Rupe M., Clapp J., et al. Overexpression of zmm28 increases maize grain yield in the field. Proc. Natl. Acad. Sci. USA. 2019;116:23850–23858. doi: 10.1073/pnas.1902593116.
  • 7. Boćanski J., Srećkov Z., Nastasić A. Genetic and phenotypic relationship between grain yield and components of grain yield of maize (Zea mays L.) Genetika. 2009;41:145–154. doi: 10.2298/GENSR0902145B.
  • 8. Veldboom L.R., Lee M. Genetic mapping of quantitative trait loci in maize in stress and nonstress environments: I. Grain yield and yield components. Crop. Sci. 1996;36:1310–1319. doi: 10.2135/cropsci1996.0011183X003600050040x.
  • 9. Betran F., Beck D., Bänziger M., Edmeades G. Genetic analysis of inbred and hybrid grain yield under stress and nonstress environments in tropical maize. Crop. Sci. 2003;43:807–817. doi: 10.2135/cropsci2003.8070.
  • 10. Dhugga K.S. Maize biomass yield and composition for biofuels. Crop. Sci. 2007;47:2211–2227. doi: 10.2135/cropsci2007.05.0299.
  • 11. Fernandez M.G.S., Becraft P.W., Yin Y., Lübberstedt T. From dwarves to giants? Plant height manipulation for biomass yield. Trends Plant Sci. 2009;14:454–461. doi: 10.1016/j.tplants.2009.06.005.
  • 12. Xue J., Gao S., Fan Y., Li L., Ming B., Wang K., Xie R., Hou P., Li S. Traits of plant morphology, stalk mechanical strength, and biomass accumulation in the selection of lodging-resistant maize cultivars. Eur. J. Agron. 2020;117:126073. doi: 10.1016/j.eja.2020.126073.
  • 13. Mazaheri M., Heckwolf M., Vaillancourt B., Gage J.L., Burdo B., Heckwolf S., Barry K., Lipzen A., Ribeiro C.B., Kono T.J., et al. Genome-wide association analysis of stalk biomass and anatomical traits in maize. BMC Plant Biol. 2019;19:45. doi: 10.1186/s12870-019-1653-x.
  • 14. Heinrich F., Wutke M., Das P.P., Kamp M., Gültas M., Link W., Schmitt A.O. Identification of regulatory SNPs associated with vicine and convicine content of Vicia faba based on genotyping by sequencing data using deep learning. Genes. 2020;11:614. doi: 10.3390/genes11060614.
  • 15. Pearson T.A., Manolio T.A. How to interpret a genome-wide association study. JAMA. 2008;299:1335–1344. doi: 10.1001/jama.299.11.1335.
  • 16. Ramzan F., Gültas M., Bertram H., Cavero D., Schmitt A.O. Combining Random Forests and a Signal Detection Method Leads to the Robust Detection of Genotype-Phenotype Associations. Genes. 2020;11:892. doi: 10.3390/genes11080892.
  • 17. Ramzan F., Klees S., Schmitt A.O., Cavero D., Gültas M. Identification of Age-Specific and Common Key Regulatory Mechanisms Governing Eggshell Strength in Chicken Using Random Forests. Genes. 2020;11:464. doi: 10.3390/genes11040464.
  • 18. Visscher P.M., Wray N.R., Zhang Q., Sklar P., McCarthy M.I., Brown M.A., Yang J. 10 years of GWAS discovery: Biology, function, and translation. Am. J. Hum. Genet. 2017;101:5–22. doi: 10.1016/j.ajhg.2017.06.005.
  • 19. Patron J., Serra-Cayuela A., Han B., Li C., Wishart D.S. Assessing the performance of genome-wide association studies for predicting disease risk. PLoS ONE. 2019;14:e0220215. doi: 10.1371/journal.pone.0220215.
  • 20. Klees S., Lange T.M., Bertram H., Rajavel A., Schlüter J.S., Lu K., Schmitt A.O., Gültas M. In Silico Identification of the Complex Interplay between Regulatory SNPs, Transcription Factors, and Their Related Genes in Brassica napus L. Using Multi-Omics Data. Int. J. Mol. Sci. 2021;22:789. doi: 10.3390/ijms22020789.
  • 21. Liu Y., Wang D., He F., Wang J., Joshi T., Xu D. Phenotype prediction and genome-wide association study using deep convolutional neural network of soybean. Front. Genet. 2019;10:1091. doi: 10.3389/fgene.2019.01091.
  • 22. Nguyen T.T., Huang J.Z., Wu Q., Nguyen T.T., Li M.J. Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests. BMC Genom. 2015;16:S5. doi: 10.1186/1471-2164-16-S2-S5.
  • 23. Zhao Y., Chen F., Zhai R., Lin X., Wang Z., Su L., Christiani D.C. Correction for population stratification in random forest analysis. Int. J. Epidemiol. 2012;41:1798–1806. doi: 10.1093/ije/dys183.
  • 24. Libbrecht M.W., Noble W.S. Machine learning applications in genetics and genomics. Nat. Rev. Genet. 2015;16:321–332. doi: 10.1038/nrg3920.
  • 25. Schrider D.R., Kern A.D. Supervised machine learning for population genetics: A new paradigm. Trends Genet. 2018;34:301–312. doi: 10.1016/j.tig.2017.12.005.
  • 26. Cortés A.J., López-Hernández F., Osorio-Rodriguez D. Predicting thermal adaptation by looking into populations’ genomic past. Front. Genet. 2020;11:1093. doi: 10.3389/fgene.2020.564515.
  • 27. Jansen S., Baulain U., Habig C., Ramzan F., Schauer J., Schmitt A.O., Scholz A.M., Sharifi A.R., Weigend A., Weigend S. Identification and Functional Annotation of Genes Related to Bone Stability in Laying Hens Using Random Forests. Genes. 2021;12:702. doi: 10.3390/genes12050702.
  • 28. Brieuc M.S., Waters C.D., Drinan D.P., Naish K.A. A practical introduction to Random Forest for genetic association studies in ecology and evolution. Mol. Ecol. Resour. 2018;18:755–766. doi: 10.1111/1755-0998.12773.
  • 29. Pendergrass S.A., Brown-Gentry K., Dudek S., Frase A., Torstenson E.S., Goodloe R., Ambite J.L., Avery C.L., Buyske S., Bžková P., et al. Phenome-wide association study (PheWAS) for detection of pleiotropy within the Population Architecture using Genomics and Epidemiology (PAGE) Network. PLoS Genet. 2013;9:e1003087. doi: 10.1371/journal.pgen.1003087.
  • 30. Pendergrass S., Brown-Gentry K., Dudek S., Torstenson E., Ambite J., Avery C., Buyske S., Cai C., Fesinmeyer M., Haiman C., et al. The use of phenome-wide association studies (PheWAS) for exploration of novel genotype-phenotype relationships and pleiotropy discovery. Genet. Epidemiol. 2011;35:410–422. doi: 10.1002/gepi.20589.
  • 31. Solovieff N., Cotsapas C., Lee P.H., Purcell S.M., Smoller J.W. Pleiotropy in complex traits: Challenges and strategies. Nat. Rev. Genet. 2013;14:483–495. doi: 10.1038/nrg3461.
  • 32. Mayfield S., Nelson T., Taylor W., Malkin R. Carotenoid synthesis and pleiotropic effects in carotenoid-deficient seedlings of maize. Planta. 1986;169:23–32. doi: 10.1007/BF01369771.
  • 33. Pilu R., Landoni M., Cassani E., Doria E., Nielsen E. The maize lpa241 mutation causes a remarkable variability of expression and some pleiotropic effects. Crop. Sci. 2005;45:2096–2105. doi: 10.2135/cropsci2004.0651.
  • 34. Wen L., Chase C.D. Pleiotropic effects of a nuclear restorer-of-fertility locus on mitochondrial transcripts in male-fertile and S male-sterile maize. Curr. Genet. 1999;35:521–526. doi: 10.1007/s002940050448.
  • 35. Bomblies K., Doebley J.F. Pleiotropic effects of the duplicate maize FLORICAULA/LEAFY genes zfl1 and zfl2 on traits under selection during maize domestication. Genetics. 2006;172:519–531. doi: 10.1534/genetics.105.048595.
  • 36. Asakura Y., Hirohashi T., Kikuchi S., Belcher S., Osborne E., Yano S., Terashima I., Barkan A., Nakai M. Maize mutants lacking chloroplast FtsY exhibit pleiotropic defects in the biogenesis of thylakoid membranes. Plant Cell. 2004;16:201–214. doi: 10.1105/tpc.014787.
  • 37. Chourey P.S., Li Q.B., Cevallos-Cevallos J. Pleiotropy and its dissection through a metabolic gene Miniature1 (Mn1) that encodes a cell wall invertase in developing seeds of maize. Plant Sci. 2012;184:45–53. doi: 10.1016/j.plantsci.2011.12.011.
  • 38. Clark R.M., Wagler T.N., Quijada P., Doebley J. A distant upstream enhancer at the maize domestication gene tb1 has pleiotropic effects on plant and inflorescent architecture. Nat. Genet. 2006;38:594–597. doi: 10.1038/ng1784.
  • 39. Wisser R.J., Kolkman J.M., Patzoldt M.E., Holland J.B., Yu J., Krakowsky M., Nelson R.J., Balint-Kurti P.J. Multivariate analysis of maize disease resistances suggests a pleiotropic genetic basis and implicates a GST gene. Proc. Natl. Acad. Sci. USA. 2011;108:7339–7344. doi: 10.1073/pnas.1011739108.
  • 40. Brown P.J., Upadyayula N., Mahone G.S., Tian F., Bradbury P.J., Myles S., Holland J.B., Flint-Garcia S., McMullen M.D., Buckler E.S., et al. Distinct genetic architectures for male and female inflorescence traits of maize. PLoS Genet. 2011;7:e1002383. doi: 10.1371/journal.pgen.1002383.
  • 41. Houle D., Govindaraju D.R., Omholt S. Phenomics: The next challenge. Nat. Rev. Genet. 2010;11:855–866. doi: 10.1038/nrg2897.
  • 42. Rajavel A., Klees S., Schlüter J.S., Bertram H., Lu K., Schmitt A.O., Gültas M. Unravelling the Complex Interplay of Transcription Factors Orchestrating Seed Oil Content in Brassica napus L. Int. J. Mol. Sci. 2021;22:1033. doi: 10.3390/ijms22031033.
  • 43. Liu H., Wang F., Xiao Y., Tian Z., Wen W., Zhang X., Chen X., Liu N., Li W., Liu L., et al. MODEM: Multi-omics data envelopment and mining in maize. Database. 2016;2016:baw117. doi: 10.1093/database/baw117.
  • 44. Yang X., Gao S., Xu S., Zhang Z., Prasanna B.M., Li L., Li J., Yan J. Characterization of a global germplasm collection and its potential utilization for analysis of complex quantitative traits in maize. Mol. Breed. 2011;28:511–526. doi: 10.1007/s11032-010-9500-7.
  • 45. Wen W., Araus J.L., Shah T., Cairns J., Mahuku G., Bänziger M., Torres J.L., Sánchez C., Yan J. Molecular characterization of a diverse maize inbred line collection and its potential utilization for stress tolerance improvement. Crop. Sci. 2011;51:2569–2581. doi: 10.2135/cropsci2010.08.0465.
  • 46. Fu J., Cheng Y., Linghu J., Yang X., Kang L., Zhang Z., Zhang J., He C., Du X., Peng Z., et al. RNA sequencing reveals the complex regulatory network in the maize kernel. Nat. Commun. 2013;4:1–12. doi: 10.1038/ncomms3832.
  • 47. Li H., Peng Z., Yang X., Wang W., Fu J., Wang J., Han Y., Chai Y., Guo T., Yang N., et al. Genome-wide association study dissects the genetic architecture of oil biosynthesis in maize kernels. Nat. Genet. 2013;45:43–50. doi: 10.1038/ng.2484.
  • 48. Wen W., Li D., Li X., Gao Y., Li W., Li H., Liu J., Liu H., Chen W., Luo J., et al. Metabolome-based genome-wide association study of maize kernel leads to novel biochemical insights. Nat. Commun. 2014;5:1–10. doi: 10.1038/ncomms4438.
  • 49. Yang N., Lu Y., Yang X., Huang J., Zhou Y., Ali F., Wen W., Liu J., Li J., Yan J. Genome Wide Association Studies Using a New Nonparametric Model Reveal the Genetic Architecture of 17 Agronomic Traits in an Enlarged Maize Association Panel. PLoS Genet. 2014;10:e1004573. doi: 10.1371/journal.pgen.1004573.
  • 50. Van Dongen S. Ph.D. Thesis. University of Utrecht; Utrecht, The Netherlands: 2000. Graph Clustering by Flow Simulation.
  • 51. Kel A.E., Gössling E., Reuter I., Cheremushkin E., Kel-Margoulis O.V., Wingender E. MATCH: A tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res. 2003;31:3576–3579. doi: 10.1093/nar/gkg585.
  • 52. Wingender E. The TRANSFAC project as an example of framework technology that supports the analysis of genomic regulation. Brief. Bioinform. 2008;9:326–332. doi: 10.1093/bib/bbn016.
  • 53. Breiman L. Random forests. Mach. Learn. 2001;45:5–32. doi: 10.1023/A:1010933404324.
  • 54. Kursa M.B., Rudnicki W.R. Feature selection with the Boruta package. J. Stat. Softw. 2010;36:1–13. doi: 10.18637/jss.v036.i11.
  • 55. Li B.Q., Hu L.L., Chen L., Feng K.Y., Cai Y.D., Chou K.C. Prediction of Protein Domain with mRMR Feature Selection and Analysis. PLoS ONE. 2012;7:e39308. doi: 10.1371/journal.pone.0039308.
  • 56. Li B.Q., Feng K.Y., Chen L., Huang T., Cai Y.D. Prediction of Protein-Protein Interaction Sites by Random Forest Algorithm with mRMR and IFS. PLoS ONE. 2012;7:e43927. doi: 10.1371/journal.pone.0043927.
  • 57. Weighill D., Jones P., Bleker C., Ranjan P., Shah M., Zhao N., Martin M., DiFazio S., Macaya-Sanz D., Schmutz J., et al. Multi-phenotype association decomposition: Unraveling complex gene-phenotype relationships. Front. Genet. 2019;10:417. doi: 10.3389/fgene.2019.00417.
  • 58. Ganal M.W., Durstewitz G., Polley A., Bérard A., Buckler E.S., Charcosset A., Clarke J.D., Graner E.M., Hansen M., Joets J., et al. A large maize (Zea mays L.) SNP genotyping array: Development and germplasm genotyping, and genetic mapping to compare with the B73 reference genome. PLoS ONE. 2011;6:e28334. doi: 10.1371/journal.pone.0028334.
  • 59. Xu J., Chen G., Hermanson P.J., Xu Q., Sun C., Chen W., Kan Q., Li M., Crisp P.A., Yan J., et al. Population-level analysis reveals the widespread occurrence and phenotypic consequence of DNA methylation variation not tagged by genetic variation in maize. Genome Biol. 2019;20:1–16. doi: 10.1186/s13059-019-1859-0.
  • 60. Zhao H., Sun Z., Wang J., Huang H., Kocher J.P., Wang L. CrossMap: A versatile tool for coordinate conversion between genome assemblies. Bioinformatics. 2014;30:1006–1007. doi: 10.1093/bioinformatics/btt730.
  • 61. Sun K. Ktrim: An extra-fast and accurate adapter-and quality-trimmer for sequencing data. Bioinformatics. 2020;36:3561–3562. doi: 10.1093/bioinformatics/btaa171.
  • 62. Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T.R. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
  • 63. Putri G.H., Anders S., Pyl P.T., Pimanda J.E., Zanini F. Analysing high-throughput sequencing data in Python with HTSeq 2.0. arXiv. 2021:2112.00939. doi: 10.1093/bioinformatics/btac166.
  • 64. Love M.I., Huber W., Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:1–21. doi: 10.1186/s13059-014-0550-8.
  • 65. Klees S., Heinrich F., Schmitt A.O., Gültas M. agReg-SNPdb: A Database of Regulatory SNPs for Agricultural Animal Species. Biology. 2021;10:790. doi: 10.3390/biology10080790.
  • 66. Bloom S.A. Similarity indices in community studies: Potential pitfalls. Mar. Ecol. Prog. Ser. 1981;5:125–128. doi: 10.3354/meps005125.
  • 67. Conway J.R., Lex A., Gehlenborg N. UpSetR: An R package for the visualization of intersecting sets and their properties. Bioinformatics. 2017;33:2938–2940. doi: 10.1093/bioinformatics/btx364.
  • 68. De Lucia F., Crevillen P., Jones A.M., Greb T., Dean C. A PHD-polycomb repressive complex 2 triggers the epigenetic silencing of FLC during vernalization. Proc. Natl. Acad. Sci. USA. 2008;105:16831–16836. doi: 10.1073/pnas.0808687105.
  • 69. Mylne J., Greb T., Lister C., Dean C. Proceedings of the Cold Spring Harbor Symposia on Quantitative Biology. Volume 69. Cold Spring Harbor Laboratory Press; Cold Spring Harbor, NY, USA: 2004. Epigenetic regulation in the control of flowering; pp. 457–464.
  • 70. Berardini T.Z., Reiser L., Li D., Mezheritsky Y., Muller R., Strait E., Huala E. The Arabidopsis information resource: Making and mining the “gold standard” annotated reference plant genome. Genesis. 2015;53:474–485. doi: 10.1002/dvg.22877.
  • 71. Kim D.H., Sung S. Role of VIN3-LIKE 2 in facultative photoperiodic flowering response in Arabidopsis. Plant Signal. Behav. 2010;5:1672–1673. doi: 10.4161/psb.5.12.14035.
  • 72. Qi H., Jiang Z., Zhang K., Yang S., He F., Zhang Z. PlaD: A transcriptomics database for plant defense responses to pathogens, providing new insights into plant immune system. Genom. Proteom. Bioinform. 2018;16:283–293. doi: 10.1016/j.gpb.2018.08.002.
  • 73. Stein O., Avin-Wittenberg T., Krahnert I., Zemach H., Bogol V., Daron O., Aloni R., Fernie A.R., Granot D. Corrigendum: Arabidopsis fructokinases are important for seed oil accumulation and vascular development. Front. Plant Sci. 2017;8:303. doi: 10.3389/fpls.2017.00303.
  • 74. Jiao Y., Peluso P., Shi J., Liang T., Stitzer M.C., Wang B., Campbell M.S., Stein J.C., Wei X., Chin C.S., et al. Improved maize reference genome with single-molecule technologies. Nature. 2017;546:524–527. doi: 10.1038/nature22971.
  • 75. Baudisch B., Klösgen R.B. Dual targeting of a processing peptidase into both endosymbiotic organelles mediated by a transport signal of unusual architecture. Mol. Plant. 2012;5:494–503. doi: 10.1093/mp/ssr092.
  • 76. Fu X., Guan X., Garlock R., Nikolau B.J. Mitochondrial Fatty Acid Synthase Utilizes Multiple Acyl Carrier Protein Isoforms. Plant Physiol. 2020;183:547–557. doi: 10.1104/pp.19.01468.
  • 77. Li N., Gügel I.L., Giavalisco P., Zeisler V., Schreiber L., Soll J., Philippar K. FAX1, a novel membrane protein mediating plastid fatty acid export. PLoS Biol. 2015;13:e1002053. doi: 10.1371/journal.pbio.1002053.
  • 78. Kim S.J., Chandrasekar B., Rea A.C., Danhof L., Zemelis-Durfee S., Thrower N., Shepard Z.S., Pauly M., Brandizzi F., Keegstra K. The synthesis of xyloglucan, an abundant plant cell wall polysaccharide, requires CSLC function. Proc. Natl. Acad. Sci. USA. 2020;117:20316–20324. doi: 10.1073/pnas.2007245117.
  • 79. Seebauer J.R., Moose S.P., Fabbri B.J., Crossland L.D., Below F.E. Amino acid metabolism in maize earshoots. Implications for assimilate preconditioning and nitrogen signaling. Plant Physiol. 2004;136:4326–4334. doi: 10.1104/pp.104.043778.
  • 80. Gocal G.F., Sheldon C.C., Gubler F., Moritz T., Bagnall D.J., MacMillan C.P., Li S.F., Parish R.W., Dennis E.S., Weigel D., et al. GAMYB-like genes, flowering, and gibberellin signaling in Arabidopsis. Plant Physiol. 2001;127:1682–1693. doi: 10.1104/pp.010442.
  • 81. Woodger F.J., Millar A., Murray F., Jacobsen J.V., Gubler F. The role of GAMYB transcription factors in GA-regulated gene expression. J. Plant Growth Regul. 2003;22:176–184. doi: 10.1007/s00344-003-0025-8.
  • 82. Fang Y., Xie K., Hou X., Hu H., Xiong L. Systematic analysis of GT factor family of rice reveals a novel subfamily involved in stress responses. Mol. Genet. Genom. 2010;283:157–169. doi: 10.1007/s00438-009-0507-x.
  • 83. Hiratsuka K., Wu X., Fukuzawa H., Chua N.H. Molecular dissection of GT-1 from Arabidopsis. Plant Cell. 1994;6:1805–1813. doi: 10.1105/tpc.6.12.1805.
  • 84. Green P.J., Yong M.H., Cuozzo M., Kano-Murakami Y., Silverstein P., Chua N. Binding site requirements for pea nuclear protein factor GT-1 correlate with sequences required for light-dependent transcriptional activation of the rbcS-3A gene. EMBO J. 1988;7:4035–4044. doi: 10.1002/j.1460-2075.1988.tb03297.x.
  • 85. Le Gourrierec J., Delaporte V., Ayadi M., Li Y.F., Zhou D.X. Functional analysis of Arabidopsis transcription factor GT-1 in the expression of light-regulated genes. Genome Lett. 2002;1:77–82. doi: 10.1166/gl.2002.009.
  • 86. Cheng H., Qin L., Lee S., Fu X., Richards D.E., Cao D., Luo D., Harberd N.P., Peng J. Gibberellin regulates Arabidopsis floral development via suppression of DELLA protein function. Development. 2004;131:1055–1064. doi: 10.1242/dev.00992.
  • 87. Cone K.C., Cocciolone S.M., Burr F.A., Burr B. Maize anthocyanin regulatory gene pl is a duplicate of c1 that functions in the plant. Plant Cell. 1993;5:1795–1805. doi: 10.1105/tpc.5.12.1795.
  • 88. Caarls L., Van der Does D., Hickman R., Jansen W., Verk M.C.V., Proietti S., Lorenzo O., Solano R., Pieterse C.M., Van Wees S. Assessing the role of ETHYLENE RESPONSE FACTOR transcriptional repressors in salicylic acid-mediated suppression of jasmonic acid-responsive genes. Plant Cell Physiol. 2017;58:266–278. doi: 10.1093/pcp/pcw187.
  • 89. Yu N., Yang J.C., Yin G.T., Li R.S., Zou W.T. Genome-wide characterization of the SPL gene family involved in the age development of Jatropha curcas. BMC Genom. 2020;21:68. doi: 10.1186/s12864-020-06776-8.
  • 90. Jung J.H., Seo P.J., Kang S.K., Park C.M. miR172 signals are incorporated into the miR156 signaling pathway at the SPL3/4/5 genes in Arabidopsis developmental transitions. Plant Mol. Biol. 2011;76:35–45. doi: 10.1007/s11103-011-9759-z.
  • 91. Jung J.H., Lee H.J., Ryu J.Y., Park C.M. SPL3/4/5 integrate developmental aging and photoperiodic signals into the FT-FD module in Arabidopsis flowering. Mol. Plant. 2016;9:1647–1659. doi: 10.1016/j.molp.2016.10.014.
  • 92. Cardon G., Höhmann S., Klein J., Nettesheim K., Saedler H., Huijser P. Molecular characterisation of the Arabidopsis SBP-box genes. Gene. 1999;237:91–104. doi: 10.1016/S0378-1119(99)00308-X.
  • 93. Chao L.M., Liu Y.Q., Chen D.Y., Xue X.Y., Mao Y.B., Chen X.Y. Arabidopsis transcription factors SPL1 and SPL12 confer plant thermotolerance at reproductive stage. Mol. Plant. 2017;10:735–748. doi: 10.1016/j.molp.2017.03.010.
  • 94. Ohta M., Matsui K., Hiratsu K., Shinshi H., Ohme-Takagi M. Repression domains of class II ERF transcriptional repressors share an essential motif for active repression. Plant Cell. 2001;13:1959–1968. doi: 10.1105/TPC.010127.
  • 95. Cortés A.J., López-Hernández F. Harnessing crop wild diversity for climate change adaptation. Genes. 2021;12:783. doi: 10.3390/genes12050783.
  • 96. Guevara-Escudero M., Osorio A.N., Cortés A.J. Integrative pre-breeding for biotic resistance in forest trees. Plants. 2021;10:2022. doi: 10.3390/plants10102022.
  • 97. Ma C., Zhang H.H., Wang X. Machine learning for big data analytics in plants. Trends Plant Sci. 2014;19:798–808. doi: 10.1016/j.tplants.2014.08.004.
  • 98. Cortés A.J., Restrepo-Montoya M., Bedoya-Canas L.E. Modern strategies to assess and breed forest tree adaptation to changing climate. Front. Plant Sci. 2020;11:1606. doi: 10.3389/fpls.2020.583323.
  • 99. Tong H., Nikoloski Z. Machine learning approaches for crop improvement: Leveraging phenotypic and genotypic big data. J. Plant Physiol. 2021;257:153354. doi: 10.1016/j.jplph.2020.153354.

Articles from International Journal of Molecular Sciences are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)
