Abstract
Background
Classical genetic concepts to explain heterosis attribute the superiority of F1-hybrids over their homozygous parents to the complementation of unfavorable by beneficial alleles (dominance) or to heterozygote advantage (overdominance). Here we analyze 112 intermated B73xMo17 recombinant inbred lines of maize and their backcrosses to their original parents B73 and Mo17 to obtain hybrids with an average heterozygosity of ~ 50%. This genetic architecture allows studying the influence of homozygous and heterozygous genomic regions on gene expression in hybrids.
Results
We demonstrate that single parent expression (SPE) complementation explains between − 8% and 29% of the mid-parent heterotic variance in these hybrids. In this expression pattern, consistent with dominance, genes are active in only one parent and in the hybrid, thus increasing the number of expressed genes in hybrids. Furthermore, we establish that eQTL regulating SPE genes are predominantly located in heterozygous regions of the genome. Finally, we identify an SPE gene that regulates lateral root density in hybrids. Remarkably, the activity of this gene depends on the presence of a Mo17 allele in an eQTL that regulates this gene.
Conclusions
Here we show that dominance of SPE genes influences the number of active genes in hybrids, while heterozygosity is instrumental for the regulation of these genes. This finding supports the notion that the genetic constitution of distant regulatory elements is instrumental for the activity of heterosis-associated genes. In summary, our results connect genetic variation at regulatory loci and the degree of heterozygosity with phenotypic variation of heterosis via SPE complementation.
Supplementary Information
The online version contains supplementary material available at 10.1186/s13059-025-03768-3.
Background
Heterozygous F1 hybrids are more vigorous and show higher fitness and biomass production compared to their homozygous parental inbred lines [1], a phenomenon called heterosis [2]. The introduction of hybrids in plant breeding was one of the landmark innovations of agriculture. Among crops, outcrossing species such as maize display the highest degree of heterosis [3].
Heterosis is typically measured for above-ground phenotypic traits related to yield and biomass [4], but seedling root traits also display significant heterosis [5] important for early seedling vigor [6]. The degree of phenotypic heterosis can vary substantially between different traits of a plant [7].
Two classical concepts to explain heterosis were introduced in the early twentieth century [1]. The overdominance model suggests that the interaction of heterozygous alleles in hybrids results in better performing plants than the interaction of homozygous alleles in the parents [8, 9]. Hence, in this model, heterozygosity is required for heterosis. In the dominance model, it is assumed that many slightly deleterious alleles are complemented in the hybrid by dominant or beneficial alleles [10]. Although the dominance model was formulated before the discovery of DNA, this model refers to allelic differences on the level of genomic DNA. Dominance is also observed on the level of gene activity in a pattern designated single parent expression (SPE) complementation [11]. This expression pattern is observed for hundreds of genes that are present in both parental inbred lines but are active in only one parent and in the hybrid, thus resulting in expression complementation of the silent allele in the hybrid. SPE complementation introduces a tissue-specific component to the classic concept of dominance consistent with the tissue-specific phenotypic variation of heterosis for different traits of a single plant [12]. It was further established that SPE genes are significantly enriched among evolutionarily younger, non-syntenic genes and might function in the adaptation of hybrids to different environments [13–15]. Additionally, variation in transcriptional regulation of cis- and trans-acting factors has been highlighted in relation to hybrid performance [16]. Trans-regulated gene expression in hybrids was associated with paternal alleles in maize [17]. Moreover, an association between SPE and cis-regulation was suggested [15].
Intermated recombinant inbred lines (I-RILs) are the result of crossing two distinct inbred lines, followed by several generations of intercrossing and subsequent self-pollination of their progeny [18]. In maize, the intermated B73 × Mo17 recombinant inbred line (IBM-RIL) syn. 4 population, a set of ~ 300 highly homozygous intercrossed RILs (4 generations of intercrossing), shows a high diversity of phenotypes and of genomic regions contributed by the two parental genotypes [19]. By backcrossing RILs to their original parents, backcross populations can be generated which show varying degrees of heterozygosity and heterosis. Genetically and phenotypically diverse RILs and backcross populations are an important resource for QTL mapping, as well as for candidate gene identification and heterosis studies [20–23].
We used the IBM-RIL population and two backcross populations to study how varying heterozygosity and the regulation of SPE complementation influence the manifestation of heterosis in seedling root development. We demonstrated that SPE genes are predominantly regulated from heterozygous regions of the genome and that, depending on the genetic constitution of the active parent, cis- or trans-regulation of SPE is prevalent. Based on our findings, we hypothesize that differences in regulation between the parental lines of a hybrid contribute to heterosis.
Results
Transcriptome profiling and sample evaluation via SNP calling reveals cross-over locations and IBM-RIL specific regions
After seed propagation, we selected 112 IBM-RILs and their B73 × IBM-RIL and Mo17 × IBM-RIL hybrids (Fig. 1A, Additional file 1: Fig. S1) based on seed availability together with the reciprocal hybrids Mo17 × B73 and B73 × Mo17 as well as the parents B73 and Mo17 (Fig. 1B) for subsequent phenotyping and RNA-sequencing of young primary roots. After SNP calling and quality control (see methods), we classified the IBM-RIL genomes into B73 and Mo17 regions. During this process, we masked IBM-RIL specific regions which contained SNPs not present in Mo17 or B73 and are thus likely the result of contamination from other genotypes. In total, 834 out of 1152 sequenced samples passed all quality thresholds and were subjected to further analyses. They consist of 2–3 biological replicates of 85 B73 × IBM-RIL and 82 Mo17 × IBM-RIL backcross hybrids and their IBM-RIL parents together with 23 (B73 × Mo17) and 24 (Mo17 × B73) biological replicates of the full reference hybrids and 47 biological replicates of B73 and 42 of Mo17 (Additional file 2: Table S1). The categorization of the IBM-RIL genomes revealed the putative recombination breakpoints between Mo17 and B73 regions (Additional file 1: Fig. S1). In summary, we used the SNP data to identify B73 and Mo17 regions in the genomes of the IBM-RILs, compared those regions with the homozygous and heterozygous regions of the corresponding B73 × IBM-RIL and Mo17 × IBM-RIL backcross hybrids, and removed samples in which the regions did not match (Additional file 3: Supplement Material).
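The consistency check between RIL regions and backcross zygosity can be illustrated with a minimal Python sketch. It assumes a hypothetical segment representation (chromosome, start, end, parental origin) and an idealized, fully homozygous RIL; the function names are illustrative and do not reproduce the pipeline described in the methods.

```python
# Hypothetical representation: each RIL segment is
# (chromosome, start_bp, end_bp, parental_origin), with origin "B73" or "Mo17".

def expected_backcross_zygosity(ril_segments, recurrent_parent):
    """Expected zygosity per segment when a homozygous RIL is crossed to a parent.

    A segment is homozygous in the hybrid if the RIL carries the allele of the
    recurrent parent there, and heterozygous otherwise.
    """
    expected = []
    for chrom, start, end, origin in ril_segments:
        state = "homozygous" if origin == recurrent_parent else "heterozygous"
        expected.append((chrom, start, end, state))
    return expected


def heterozygous_fraction(zygosity_segments):
    """Fraction of the covered genome length that is heterozygous."""
    total = sum(end - start for _, start, end, _ in zygosity_segments)
    het = sum(end - start for _, start, end, state in zygosity_segments
              if state == "heterozygous")
    return het / total if total else 0.0


def sample_matches(expected, observed):
    """Simplified filter: keep a hybrid sample only if observed zygosity matches."""
    return expected == observed


# Toy RIL with equal B73 and Mo17 content
ril = [("chr1", 0, 100, "B73"), ("chr1", 100, 200, "Mo17")]
b73_backcross = expected_backcross_zygosity(ril, "B73")
mo17_backcross = expected_backcross_zygosity(ril, "Mo17")
print(heterozygous_fraction(b73_backcross), heterozygous_fraction(mo17_backcross))
```

In this toy case both backcrosses are 50% heterozygous, but in complementary regions, which mirrors the design of the two backcross populations.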
Fig. 1.
Plant genetic material and sample information. A Schematic depiction of the 112 IBM-RILs and their B73- and Mo17-backcross hybrids. B Schematic depiction of the reference genotypes. C Multidimensional Scaling Plot of 834 high-quality RNA-seq samples. Each genotype group is highlighted by a different color. The reference genotypes B73, Mo17, B73xMo17, and Mo17xB73 are depicted as triangles, IBM-RILs and their backcrosses as circles. The leading Log2FC between two samples can be interpreted as expression difference between those two samples
We explored the sample quality and the relationship among the 834 RNA-seq samples in a multidimensional scaling (MDS) plot (Fig. 1C). The different genotypes (B73, Mo17, B73 × Mo17, Mo17 × B73) and populations (IBM-RILs, B73 × IBM-RILs, Mo17 × IBM-RILs) form distinct clusters, which are clearly separated from each other. As expected, the clusters of the two reciprocal reference hybrids B73 × Mo17 and Mo17 × B73 overlap because their nuclear genomes are identical. These hybrid samples are located in between their parental inbred lines on dimension 1 of the MDS plot (Fig. 1C). Similarly, the IBM-RILs are located between their original parental inbred lines B73 and Mo17. Finally, as expected, the two backcross populations B73 × IBM-RIL and Mo17 × IBM-RIL are located between their parental lines (Fig. 1C). The clear separation and distance between groups of samples indicate a high quality of samples after filtering.
Root traits display heterosis in backcross populations
For each parent-hybrid combination, we determined lateral root density (Additional file 1: Fig. S2A); total number of root tips (Additional file 1: Fig. S2B); total root length (Additional file 1: Fig. S2C); and total root volume (Additional file 1: Fig. S2D). The inbred line B73 outperformed the inbred line Mo17 in all measured root traits. The B73 × Mo17 hybrid displayed higher values than the Mo17 × B73 hybrid for the total number of root tips, total root length, and total root volume. In contrast, Mo17 × B73 displayed a higher lateral root density than B73 × Mo17 (Additional file 1: Fig S2).
In addition, we estimated mid-parent heterosis (MPH; Additional file 1: Fig. S2) and better-parent heterosis (BPH; Additional file 1: Fig. S3) for all measured root traits. The level of MPH of the fully heterozygous B73 × Mo17 and Mo17 × B73 hybrids ranged from 49% for lateral root density to 127% for the number of root tips and thus exceeded the average of the partially heterozygous B73 × IBM-RIL and Mo17 × IBM-RIL hybrids (from 25% for lateral root density to 62% for total root volume) for all traits. Nevertheless, most of the IBM-RIL backcross hybrids displayed substantial MPH (Additional file 1: Fig. S2, Additional file 2: Table S2). BPH (measured as % of the better-parent value) of the fully heterozygous B73 × Mo17 and Mo17 × B73 hybrids ranged from 24% for the number of root tips to 87% for the total root volume. It also exceeded the average of the partially heterozygous B73 × IBM-RIL and Mo17 × IBM-RIL hybrids, which ranged from 15% for lateral root density to 44% for total root volume, for all traits except the number of root tips (Additional file 1: Fig. S3, Additional file 2: Table S2). As expected, most backcross hybrids showed heterosis in early seedling root traits, but the fully heterozygous hybrids exceeded the heterosis values of most backcross hybrids.
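For orientation, the two heterosis measures can be written as a short Python sketch. It uses the conventional percentage definitions and assumes that a larger trait value identifies the better parent; the estimation actually used here relies on the statistical models described in the methods, and the trait values below are invented for illustration.

```python
def mid_parent_heterosis(hybrid, parent_a, parent_b):
    """MPH in percent: deviation of the hybrid from the parental mean."""
    mid_parent = (parent_a + parent_b) / 2
    return 100 * (hybrid - mid_parent) / mid_parent


def better_parent_heterosis(hybrid, parent_a, parent_b):
    """BPH in percent of the better-parent value.

    Assumption: the parent with the larger trait value is the better parent.
    """
    better_parent = max(parent_a, parent_b)
    return 100 * (hybrid - better_parent) / better_parent


# Invented trait values (arbitrary units), not data from this study
print(mid_parent_heterosis(15, 8, 12))     # 50.0
print(better_parent_heterosis(15, 8, 12))  # 25.0
```

Because the better parent is at least as large as the mid-parent value, BPH can never exceed MPH for the same hybrid.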
Single parent expression complementation is mainly observed in heterozygous genomic regions
Genes which are active in one parental inbred line and in the F1-hybrid offspring, but inactive in the second parent, display single-parent expression (SPE). In the highly heterozygous reference hybrid B73 × Mo17 we identified 1297 SPE B73/Mo17 genes, active in the hybrid and the maternal inbred line B73 (pattern 1, Fig. 2A; allele contributed by active parent in bold). In the reciprocal hybrid Mo17 × B73 we identified 1241 SPE Mo17/B73 genes active in the hybrid and only in the maternal inbred line Mo17 (pattern 5, Fig. 2B). In addition, we identified 1228 SPE B73/Mo17 (pattern 2) and 1253 SPE Mo17/B73 (pattern 6) genes active in the paternal inbred line and the hybrid, but not the corresponding maternal parent. Between both reference hybrids, 85% of the SPE genes were conserved (Additional file 1: Table S3).
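The SPE definition above reduces to a simple rule on activity calls, sketched here in Python. The boolean on/off calls are assumed to be given; the criteria used to declare a gene active are described in the methods and are not modeled, and the function name is hypothetical.

```python
def call_spe(active_maternal, active_paternal, active_hybrid):
    """Return the active parent ("maternal" or "paternal") for an SPE gene.

    Returns None if the gene does not show single parent expression, i.e. if
    it is inactive in the hybrid or not active in exactly one parent.
    """
    if not active_hybrid:
        return None
    if active_maternal and not active_paternal:
        return "maternal"
    if active_paternal and not active_maternal:
        return "paternal"
    return None


# B73 x Mo17 example: gene on in B73 (maternal) and in the hybrid, off in Mo17
print(call_spe(True, False, True))   # maternal
print(call_spe(True, True, True))    # None, active in both parents
```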
Fig. 2.
Single-parent expression (SPE) complementation. A SPE patterns present in B73xIBM-RIL backcross hybrids relative to the genomic composition of the paternal IBM-RIL and the active parent (patterns 1–4). B SPE patterns in Mo17xIBM-RIL backcross hybrids relative to the genomic composition of the paternal IBM-RIL and the active parent (patterns 5–8). A simplified depiction of the activity (on/off) of SPE genes is shown for each SPE pattern above the genotypic composition of the respective pattern for each parent and hybrid. Patterns 1, 2 and 5, 6 are also present in B73xMo17 and Mo17xB73, respectively. C Boxplots displaying the total number of SPE genes for each of the possible SPE patterns (1–4) in the B73xIBM-RIL hybrids (N = 85). D Boxplots displaying the total number of SPE genes for each of the possible SPE patterns (5–8) in the Mo17xIBM-RIL hybrids (N = 82). Asterisks indicate significant differences (all p-values < 0.0001) identified by a Gaussian mixed model with the hybrid as random effect, the SPE pattern and non-SPE pattern as a fixed factor and a diagonal variance component for the SPE pattern. E For each IBM-RIL line (red) and the corresponding B73xIBM-RIL (blue) and Mo17xIBM-RIL (yellow) backcross hybrids the number of active genes is displayed. The dashed lines represent the number of active genes in the inbred lines B73 (blue) and Mo17 (yellow). The dotted lines represent the number of active genes in the reciprocal hybrids B73xMo17 (blue) and Mo17xB73 (yellow)
Within the two IBM-RIL backcross populations we identified eight different SPE patterns, four in each backcross population, depending on the genomic composition of the gene in the paternal IBM-RIL. SPE genes of all patterns are expressed in the hybrid, but they can be distinguished by the active parent (Fig. 2A, B; allele contributed by active parent in bold). Pattern 1 (B73 × IBM-RILs): Genes are located in heterozygous regions of the hybrid and the expressed parent is B73; pattern 2 (B73 × IBM-RILs): Genes are located in heterozygous regions of the hybrid and the expressed parent is the IBM-RIL, which contributes the Mo17 allele for these genes; pattern 3 (B73 × IBM-RILs): Genes are located in homozygous regions of the hybrid and the expressed parent is B73; and pattern 4 (B73 × IBM-RILs): Genes are located in homozygous regions of the hybrid and the expressed parent is the IBM-RIL, which contributes the B73 allele for these genes.
Pattern 5 (Mo17 × IBM-RILs): Genes are located in heterozygous regions of the hybrid and the expressed parent is Mo17; pattern 6 (Mo17 × IBM-RILs): Genes are located in heterozygous regions of the hybrid and the expressed parent is the IBM-RIL, which contributes the B73 allele for these genes; pattern 7 (Mo17 × IBM-RILs): Genes are located in homozygous regions of the hybrid and the expressed parent is Mo17; and pattern 8 (Mo17 × IBM-RILs): Genes are located in homozygous regions of the hybrid and the expressed parent is the IBM-RIL, which contributes the Mo17 allele for these genes (Fig. 2A, B). We chose four exemplary genes, illustrating the expression patterns 3, 4, 7, and 8 by showing their raw read counts in the parent-hybrid combinations (Additional file 1: Fig. S4).
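The eight patterns listed above can be summarized as a lookup keyed by backcross population, zygosity of the gene in the hybrid, and active parent. The following Python sketch is only a compact restatement of that list; the key labels and the function name are illustrative choices.

```python
# Keys: (maternal line of the backcross, zygosity of the gene in the hybrid,
#        active parent). The paternal parent is always the IBM-RIL.
SPE_PATTERNS = {
    ("B73", "heterozygous", "maternal"): 1,   # active parent B73
    ("B73", "heterozygous", "paternal"): 2,   # IBM-RIL contributes Mo17 allele
    ("B73", "homozygous", "maternal"): 3,     # active parent B73
    ("B73", "homozygous", "paternal"): 4,     # IBM-RIL contributes B73 allele
    ("Mo17", "heterozygous", "maternal"): 5,  # active parent Mo17
    ("Mo17", "heterozygous", "paternal"): 6,  # IBM-RIL contributes B73 allele
    ("Mo17", "homozygous", "maternal"): 7,    # active parent Mo17
    ("Mo17", "homozygous", "paternal"): 8,    # IBM-RIL contributes Mo17 allele
}


def spe_pattern(maternal_line, zygosity, active_parent):
    """Return the SPE pattern number (1-8) for one SPE gene in one hybrid."""
    return SPE_PATTERNS[(maternal_line, zygosity, active_parent)]


def active_allele(maternal_line, zygosity, active_parent):
    """Parental origin (B73 or Mo17) of the allele carried by the active parent."""
    other = "Mo17" if maternal_line == "B73" else "B73"
    if active_parent == "maternal" or zygosity == "homozygous":
        return maternal_line
    return other


print(spe_pattern("B73", "homozygous", "paternal"))      # 4
print(active_allele("Mo17", "heterozygous", "paternal"))  # B73
```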
We observed in the B73 × IBM-RIL and Mo17 × IBM-RIL backcrosses on average 1229 (Fig. 2C) and 1247 (Fig. 2D) SPE genes per hybrid, which is ~ 50% of the number of SPE genes in B73 × Mo17 and Mo17 × B73. Across B73 × IBM-RIL and Mo17 × IBM-RIL backcross hybrids, SPE patterns in heterozygous regions (Fig. 2C, D: patterns 1, 2, 5, 6) occurred more often than in homozygous regions (Fig. 2C, D: patterns 3, 4, 7, 8).
We identified more SPE genes in heterozygous regions with B73 as the active parent (Fig. 2C, D: patterns 1 and 6) than with Mo17 as the active parent (Fig. 2C, D: patterns 2 and 5). In regions homozygous for B73, more genes with paternal activity were observed (Fig. 2C: patterns 4 vs 3), whereas in regions homozygous for Mo17 more genes with maternal activity were determined (Fig. 2D: pattern 7 vs 8).
As a consequence of expression complementation of genes expressed in only one parent, which are then also active in the hybrid (SPE), we observed that hybrids express more genes than their parental lines (Fig. 2E). While the inbred lines B73 and Mo17 express 22 658 and 22 635 genes, respectively, their reciprocal F1 hybrid offspring B73 × Mo17 and Mo17 × B73 display 736 and 692 more active genes than their parental average. The corresponding B73 × IBM-RIL and Mo17 × IBM-RIL backcrosses expressed on average 278 and 253 more genes than their parents. Furthermore, the number of active genes in the backcross hybrids is positively correlated with the fraction of heterozygous genomic regions in the hybrid (Additional file 1: Fig. S5). These results indicate an association between heterozygosity and expression complementation in hybrids.
Contribution of SPE to heterosis
To estimate the proportion of heterosis variance explained by the number of SPE genes underlying MPH and BPH of early seedling root traits, we determined in each backcross population the coefficient of determination pHet. Across the four examined root traits, 12% (total root volume) to 29% (total number of root tips) of the heterotic variance underlying MPH was explained by the number of SPE genes across the four different patterns in the B73 × IBM-RIL backcross hybrids. In contrast, in the Mo17 × IBM-RILs only between 4% (lateral root density) and 9% (total root volume) of the heterotic variance was explained by the total number of SPE genes. For the total number of root tips and total root length in the Mo17 × IBM-RILs, we observed a negative proportion (− 8% and − 0.5%) of the heterotic variance of MPH explained by the number of SPE genes (Table 1). We obtained similar results for the contribution of SPE genes to the variance of BPH (Additional file 4: Table S4). While the proportion of BPH variance explained by SPE in B73xIBM-RILs ranged from 14 to 29%, it ranged from − 4% to 15% in Mo17xIBM-RILs (Additional file 4: Table S4). Negative proportions in these analyses occur when the null model (see methods model 3), which predicts heterosis without information on the number of SPE genes, outperforms the full model (see methods model 2), which includes the numbers of SPE genes. Therefore, negative values for pHet do not indicate a negative influence of SPE on heterosis. Instead, they indicate that other expression patterns such as non-additive expression [24] might contribute more to heterosis in the Mo17xIBM-RILs than SPE. However, our results confirm a substantial contribution of SPE genes to seedling root heterosis in B73xIBM-RILs.
Table 1.
Proportion of heterotic variance explained by the number of SPE genes on mid-parent heterosis for different root phenotypes
Trait | B73xIBM-RILs σ2Het | B73xIBM-RILs σ2G | B73xIBM-RILs pHet | Mo17xIBM-RILs σ2Het | Mo17xIBM-RILs σ2G | Mo17xIBM-RILs pHet |
---|---|---|---|---|---|---|
No. of root tips | 0.232 | 0.325 | 0.29 (29%) | 0.205 | 0.190 | − 0.08 (− 8%) |
Total root volume | 2.366 | 2.701 | 0.12 (12%) | 1.022 | 1.121 | 0.09 (9%) |
Total root length | 2.715 | 3.561 | 0.24 (24%) | 2.202 | 2.191 | − 0.01 (− 1%) |
Lateral root density | 5.461 | 6.805 | 0.20 (20%) | 6.675 | 6.925 | 0.04 (4%) |
σ2Het, unexplained genetic variance of heterosis effect, not associated with SPE genes; σ2G, total genetic variance among the hybrid genotypes; pHet, coefficient of determination: proportion of the heterotic variance explained by the number of SPE genes
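The tabulated pHet values are consistent with the relation pHet = 1 − σ2Het/σ2G. This relation is not stated explicitly in this section; it is inferred here from the footnote definitions, and the underlying models are given in the methods. The following Python sketch recomputes pHet from the variance components in Table 1 under that assumption and shows how a negative value arises when σ2Het exceeds σ2G.

```python
# Variance components copied from Table 1: (sigma2_het, sigma2_g)
TABLE_1 = {
    "B73xIBM-RILs": {
        "No. of root tips": (0.232, 0.325),
        "Total root volume": (2.366, 2.701),
        "Total root length": (2.715, 3.561),
        "Lateral root density": (5.461, 6.805),
    },
    "Mo17xIBM-RILs": {
        "No. of root tips": (0.205, 0.190),
        "Total root volume": (1.022, 1.121),
        "Total root length": (2.202, 2.191),
        "Lateral root density": (6.675, 6.925),
    },
}


def p_het(sigma2_het, sigma2_g):
    """Assumed relation: share of genetic variance not left unexplained by SPE."""
    return 1 - sigma2_het / sigma2_g


for population, traits in TABLE_1.items():
    for trait, (s2_het, s2_g) in traits.items():
        print(population, trait, round(p_het(s2_het, s2_g), 2))
```

For the number of root tips in Mo17xIBM-RILs, σ2Het (0.205) is larger than σ2G (0.190), so the ratio exceeds 1 and pHet becomes negative.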
Trans-regulatory elements are more frequent in Mo17 than B73
An expression quantitative trait locus (eQTL) is a position in the genome that is significantly associated with expression variation of a gene and thus likely regulates this gene. We identified 13,778 eQTL for 13,434 protein-coding genes. While most genes were regulated by a single eQTL, 334 genes were regulated by two eQTL and five by three eQTL (Additional file 2: Table S5). We categorized all eQTL as either cis- or trans-regulating by their location relative to the starting position of the corresponding gene. We defined cis-regulating eQTL as located at a distance of < 2.5 Mbp from the start codon of their target gene on the same chromosome (see methods for details). In contrast, most trans-regulating eQTL were located on chromosomes other than those of their target gene (83%). The remaining trans eQTL were located on the same chromosome as their target at a distance of ≥ 2.5 Mbp (and the target gene was outside of the eQTL confidence interval). The logarithm of odds (LOD) is the significance measure of an eQTL being present using the Haley-Knott regression. The median LOD was 15.5 for trans- and 16.3 for cis-eQTL (Additional file 2: Table S5). Effect sizes estimate how large the effect of each identified locus is on the phenotype, or in this case on gene expression. As LOD values and effect size estimates of the eQTL on gene expression are correlated [25], this indicates similar effect sizes of cis- and trans-acting eQTL. Nevertheless, we cannot exclude the possibility that small-effect eQTL, especially those acting in trans, are present but were not detected due to the rather small population size or were not significant in our eQTL analysis. At the same time, having mostly large effects in both cis- and trans-eQTL allows us to directly compare their distributions among the SPE genes. In total, 88% (12,057/13,778) of all identified eQTL acted in cis, while 12% (1701/13,778) acted in trans on their target genes.
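The distance-based part of the cis/trans definition can be sketched in Python as follows. This is a simplification: it applies only the chromosome and 2.5 Mbp criteria and ignores the additional condition on the eQTL confidence interval; the function name and coordinates are illustrative.

```python
CIS_WINDOW_BP = 2_500_000  # 2.5 Mbp threshold stated in the text


def classify_eqtl(eqtl_chrom, eqtl_pos, gene_chrom, gene_start):
    """Classify an eQTL as "cis" or "trans" relative to its target gene.

    cis: same chromosome and less than 2.5 Mbp from the gene start.
    trans: different chromosome, or same chromosome at >= 2.5 Mbp.
    The confidence-interval condition from the text is not modeled here.
    """
    same_chrom = eqtl_chrom == gene_chrom
    if same_chrom and abs(eqtl_pos - gene_start) < CIS_WINDOW_BP:
        return "cis"
    return "trans"


print(classify_eqtl("chr1", 1_000_000, "chr1", 2_000_000))   # cis
print(classify_eqtl("chr1", 1_000_000, "chr1", 10_000_000))  # trans
print(classify_eqtl("chr1", 1_000_000, "chr5", 1_000_000))   # trans
```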
For genes active in the B73 reference genotype, 7% (771) of their eQTL were acting in trans, while for genes active in the Mo17 reference genotype, almost twice as many eQTL acting in trans were identified (13%, 1439). The values for all other genotypes show trans-acting eQTL ratios between the B73 and Mo17 values (Table 2). This suggests that Mo17 contains more trans-acting eQTL than B73.
Table 2.
Trans eQTL for active genes
Genotype | Trans-regulating eQTL [in % of eQTL for active genes] | Number of trans-regulated genes |
---|---|---|
B73 | 7 | 771 |
B73_Mo17 | 11 | 1365 |
Mo17 | 13 | 1439 |
Mo17_B73 | 11 | 1363 |
IBM-RILs* | 10 | 1107 |
B73xIBM-RILs* | 9 | 1066 |
Mo17xIBM-RILs* | 12 | 1385 |
*Values are the mean of the respective category
Heterozygous SPE genes with Mo17 activity are trans-regulated disproportionately more often
In both fully heterozygous reference hybrids B73 × Mo17 and Mo17 × B73, we detected eQTL for 85% of the SPE genes (Additional file 2: Table S3). Overall, 95% of eQTL for SPE genes with B73 contributing the active parental allele were cis-regulating (Additional file 1: Fig. S6). By contrast, SPE genes with Mo17 as the active allele were predominantly (58–59%) trans-regulated (Additional file 1: Fig. S6).
Similarly, in B73 × IBM-RIL and Mo17 × IBM-RIL backcross hybrids (Additional file 1: Fig. S7), heterozygous SPE genes where B73 is the active allele are regulated almost exclusively by cis eQTL (on average 95% and 96%; Additional file 1: Fig. S7A: pattern 1; Additional file 1: Fig. S7B: pattern 6). By contrast, on average, only 60% (Additional file 1: Fig. S7A pattern 2) and 62% (Additional file 1: Fig. S7B pattern 5) of heterozygous SPE genes with an active Mo17 allele are cis-regulated. Hence, in general, heterozygous SPE genes with an active Mo17 allele are significantly more frequently trans-regulated than those with B73 as the active parental allele (Additional file 1: Fig. S7).
Homozygous SPE genes are partially trans-regulated
SPE genes located in homozygous regions showed different ratios of cis- to trans-eQTL regulation, based on the active parent. While in homozygous B73 regions (occurring in B73 × IBM-RILs), SPE with maternally active alleles (B73/B73, Additional file 1: Fig. S7A, pattern 3) were primarily cis-regulated (87%), those with paternally active alleles (B73/B73, Additional file 1: Fig. S7A pattern 4) were primarily trans-regulated (67%). In homozygous Mo17 regions (Mo17 × IBM-RILs), we saw the opposite. SPE genes with maternally active alleles (Mo17/Mo17, Additional file 1: Fig. S7B pattern 7) showed more trans-regulating eQTL (58%) and for SPE with paternal activity (Mo17/Mo17, Additional file 1: Fig. S7B pattern 8) the majority of eQTL (81%) were cis-regulating. Interestingly, the SPE patterns 4 and 7, which are primarily regulated by trans-eQTL, also had higher absolute numbers of SPE genes (Fig. 2A, B). The observed proportions of trans-regulation for SPE patterns 1 to 8 were significantly different from the ratios in non-SPE genes, of which 91% were cis-regulated (Additional file 1: Fig. S7). Likely, there is an association between the regulatory regime (cis/trans) and the genetic constitution of the SPE gene.
SPE genes are predominantly regulated in heterozygous regions
Overall, we observed that for heterozygous SPE genes, both cis- and trans-regulating eQTL are predominantly located in heterozygous genomic regions (Fig. 3). Cis-regulating eQTL in heterozygous regions regulate heterozygous SPE genes, and cis-regulating eQTL in homozygous regions regulate homozygous SPE genes. This is expected, as cis-eQTL are located in close proximity to their gene (Fig. 3A, B). In contrast, target genes of trans-regulating eQTL are randomly distributed across the genome. Accordingly, trans-acting eQTL regulate SPE genes in both homozygous and heterozygous regions with equal frequency, independent of whether the eQTL is homo- or heterozygous (Fig. 3A, B). Together with the association of Mo17 activity with trans-regulation, this leads to the observed ratios of cis- to trans-regulation in homozygous SPE patterns. For instance, eQTL regulating homozygous genes in trans (Fig. 3A and B, patterns 4 and 7) were located almost exclusively in heterozygous regions. In both cases, the active parent carries the Mo17 allele at the eQTL position which is responsible for the gene activity of those SPE genes (Additional file 1: Fig. S8 A, B). This further explains the higher proportion of trans-regulation of patterns 4 and 7 as well as the generally higher number of SPE genes for these patterns compared to patterns 3 and 8. In summary, SPE patterns with parental B73 activity showed a slightly higher proportion of regulation by cis-acting eQTL compared with non-SPE genes, while the SPE patterns with Mo17 activity showed substantially lower regulation by cis-acting eQTL and thus higher trans-regulation (Fig. 3, Additional file 1: Fig. S7).
Fig. 3.
The average number of eQTL per genomic region for the different SPE patterns in B73xIBM-RILs (A) and Mo17xIBM-RILs (B) is shown. The width of the connecting bands corresponds to the average number of eQTL. The shade indicates the genomic region of the eQTL (dark = heterozygous, light = homozygous), and the blue color corresponds to cis- and orange to trans-regulation. The genotypes at the eQTL position and regulated SPE genes are indicated as bars (yellow = Mo17 allele, turquoise = B73 allele), and the gene activity for SPE is shown as (on/off)
Trans-regulated genes and SPE genes are often non-syntenic
The maize genome contains genes with orthologs of positional synteny in sorghum, suggesting these genes are highly conserved across evolution, and a set of genes without syntenic partners (non-syntenic) in sorghum and other grass species, which are therefore most likely evolutionarily younger or changed their position during evolution [26]. In the B73v5 maize genome, 40% (15 612) of genes can be classified as non-syntenic. Among the active genes in this study, 27% (7622) were non-syntenic. By contrast, on average 58% of SPE genes in the hybrids are non-syntenic.
Interestingly, genes with trans-eQTL were more likely to be non-syntenic (70%) than genes with cis-eQTL (22%) across all expressed genes (Additional file 4: Table S6). Both cis- and trans-regulated SPE genes were more frequently non-syntenic than non-SPE genes (Fig. 4). Additionally, non-syntenic genes were particularly prominent among the trans-regulated SPE genes with Mo17 activity. For example, 96% of trans-regulated pattern 2 genes (B73/Mo17) were non-syntenic (Fig. 4). It should be noted that these are relative values: the absolute numbers of genes in patterns 3 and 8 are low in general, and patterns 1 and 6 are not often trans-regulated (Fig. 3). Since the genes with trans-eQTL are mostly non-syntenic, it is consistent that they are less conserved in their activity between different maize lines and thus often show SPE patterns.
Fig. 4.
Synteny of SPE pattern genes. The average proportion of non-syntenic genes (vs. syntenic genes) among cis- (blue) and trans (yellow)-regulated SPE genes. Percentages are an average of genes in the respective category across all A B73xIBM-RILs (N = 85) or B Mo17xIBM-RILs (N = 82)
SPE gene expression influences lateral root density in different ways
We performed a transcriptome wide association analysis (TWAS) to identify genes whose expression values associate with phenotypic traits. We analyzed four root traits (lateral root density, number of root tips, total root length, total root volume) and the two IBM-RIL backcross hybrid populations B73 × IBM-RIL and Mo17 × IBM-RIL separately, as different mechanisms of expression regulation and expression levels might be observed between the populations. We found 18 positively and 17 negatively correlated genes across the two populations. Only one gene was identified in both populations (Additional file 2: Table S7). Therefore, different genes might control hybrid vigor of roots in the different populations.
Among the 35 TWAS genes whose expression correlated with phenotypic traits, 7 showed SPE complementation in more than 10 hybrids. We designated these TWAS and SPE (TSG) genes (Additional file 2: Table S8). All of the TSG were identified in the B73 × IBM-RIL hybrid population. For lateral root density, 10 TWAS genes were identified by the BLINK and MLMM models of GAPIT in total, and 4 of these genes also showed an SPE pattern (TSG; Additional file 2: Table S8, Fig. 5A). We followed up on the two TSG which showed an SPE pattern in most hybrids (TSG 1 & TSG 2; Fig. 5B, C). TSG 1 (Zm00001eb349930) showed an SPE pattern in 30 of the 85 B73 × IBM-RIL hybrids (Fig. 5B, Additional file 2: Table S8), predominantly the SPE B73/Mo17. The gene was cis-regulated and non-syntenic. B73 × IBM-RIL hybrids with an active TSG 1 showing an SPE pattern had significantly lower lateral root density (α ≤ 0.05, Fig. 5B).
Fig. 5.
Transcriptome wide association analysis (TWAS). A Manhattan plot of TWAS analysis for lateral root density in B73xIBM-RIL hybrids. Each point shows a gene with the p-value of the association between the gene’s expression with lateral root density. Results for the analysis with the BLINK model are shown as circles; results of the MLMM model are shown as triangles. Filled points indicate significant TWAS genes that also show SPE pattern (TSG), with the number of hybrids where the gene shows an SPE in brackets. The horizontal line indicates the FDR = 0.05 by Bonferroni correction. B, C p-values correspond to Student’s t-test independent from the TWAS analysis based on gene activity. Points show lateral root density of each B73xIBM-RIL with either inactive (off) or active (on) TSG 1 Zm00001eb349930 (B) or TSG 2 Zm00001eb339600 (C) in the hybrids. Shape corresponds to the SPE pattern and color to the genotype at the eQTL position (yellow = cis regulating, blue = trans regulating, dark = heterozygous regions, light = homozygous regions). The color shows the genotype of the hybrid at the eQTL position. D A model of gene expression of the non-syntenic gene TSG 2 and the syntenic paralog of TSG 2, TSG 2 like, which are regulated from the same eQTL. Depiction shows activity in Mo17 and B73 inbred lines. In heterozygous genotypes, TSG 2 is active, when at least one Mo17 allele is present at the eQTL position. TSG 2 like is active in all genotypes
A different effect was visible for TSG 2 (Zm00001eb339600), which showed a B73/IBM-RIL pattern in 33 B73 × IBM-RIL hybrids (Fig. 5C, Additional file 2: Table S8). As there were no SNPs near the gene to classify the paternal allele of the gene itself, the SPE pattern was called B73/IBM-RIL, instead of B73/Mo17 or B73/B73. Interestingly, TSG 2 was trans-regulated and the genotype at the eQTL position could be determined. In the B73 × IBM-RILs, the hybrids with an active TSG 2 had a significantly (α ≤ 0.05) higher lateral root density and showed the heterozygous B73/Mo17 genotype at the eQTL position (Fig. 5C). When searching for sequences similar to TSG 2 (Zm00001eb339600) in the B73 reference genome, we identified a paralog (Zm00001eb039610) and called it TSG 2-like. TSG 2-like is located close to the eQTL position of TSG 2 (< 2.5 Mbp; Fig. 5D). While the TWAS gene TSG 2 is non-syntenic and trans-regulated, its paralog TSG 2-like at the eQTL position is cis-regulated from the same eQTL position and syntenic (Fig. 5D). The syntenic TSG 2-like gene is expressed in both Mo17 and B73 and in all other genotypes. However, the non-syntenic TSG 2 is only active in Mo17, but not in B73 (Fig. 5D) and consequently only active in those hybrids where at least one Mo17 allele is present at the eQTL position (Fig. 5C). The investigation of both SPE candidate genes for lateral root density will be subject to further research. TSG 2 regulation already indicates a connection between syntenic cis-regulated and non-syntenic trans-regulated paralogous genes in the inbred line Mo17.
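The activity rule described for TSG 2 can be stated as a one-line predicate. The Python sketch below encodes only the qualitative model of Fig. 5D (TSG 2 active when at least one Mo17 allele is present at the eQTL position, TSG 2-like active in all genotypes); the function names and the tuple representation of genotypes are illustrative assumptions.

```python
def tsg2_expected_active(eqtl_genotype):
    """TSG 2 is expected to be active if a Mo17 allele is present at its eQTL.

    eqtl_genotype: tuple with the two alleles at the eQTL position,
    e.g. ("B73", "Mo17") for a heterozygous hybrid.
    """
    return "Mo17" in eqtl_genotype


def tsg2_like_expected_active(eqtl_genotype):
    """TSG 2-like is described as active in all genotypes."""
    return True


for genotype in [("B73", "B73"), ("B73", "Mo17"), ("Mo17", "Mo17")]:
    print(genotype, tsg2_expected_active(genotype), tsg2_like_expected_active(genotype))
```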
Discussion
SPE complementation, where the hybrid expresses a gene that is only active in one of its parents, was suggested to contribute to the translation of parental diversity into phenotypic heterosis [14]. This concept extends the classical dominance model of heterosis which explains heterosis by the complementation of many slightly deleterious alleles of the parents in the hybrid by dominant alleles on the genomic level [10] to the level of gene expression. This adds a tissue-specific component to this concept that can explain the tissue-specific differences in heterosis within a plant [7]. The number of SPE genes identified in this study is similar to previous studies for B73xMo17 and Mo17xB73 seedling roots [13, 15, 27]. In the IBM-RIL backcross hybrids, we identified on average a substantially lower phenotypic mid-parent heterosis, fewer SPE genes compared to the fully heterozygous hybrids and a lower average heterozygosity. We also observed that SPE genes are mainly located in heterozygous regions of the genome, thus explaining the lower numbers of SPE in the backcross hybrids (Fig. 2). Hence, the association of the number of SPE genes with heterosis is not only conditioned by genetic diversity [14] but also by the degree of heterozygosity. As a result of SPE complementation, the number of active genes increased with heterozygosity (Additional file 1: Fig. S5).
We demonstrated that the number of genes showing SPE complementation in homozygous and heterozygous regions of the genome (Fig. 2A–D) explains up to 29% of the heterotic variance in the phenotype of the hybrids (Table 1). As different genes are likely responsible for heterosis in different traits across plant development [28], our finding suggests that SPE genes are involved in the manifestation of heterosis in young seedling roots, but to a different extent for different traits and different genetic backgrounds. It is, for instance, likely that SPE genes influence heterosis in B73xIBM-RILs but to a smaller degree in Mo17xIBM-RILs, which in general generate less vigorous hybrids. This interpretation is further supported by the TWAS analysis, where gene expression was associated with phenotypic values. Here we identified single SPE genes with a significant influence on the hybrid phenotype in the B73xIBM-RIL population but not in the Mo17xIBM-RIL population.
In our study, we detected different ratios of cis and trans regulation among active genes in inbred lines (Mo17: 13% trans-regulation; B73: 7% trans-regulation; Table 2). In the reciprocal hybrids, this difference was significantly amplified for SPE genes (Mo17 active: ~ 60% trans; B73 active: ~ 5% trans; Additional file 1: Fig. S6), similar to the IBM-RIL backcross populations (Additional file 1: Fig. S7).
Previously, contrasting cis- and trans-regulation was associated with parental alleles in another eQTL study, using IBM-RIL backcrosses from a smaller subset of the IBM-RIL population [17]. In that study, 86% of trans-regulation was associated with paternal dominance, where the expression level of the paternal allele was adopted [17]. In our study, investigation of the eQTL positions revealed that most SPE genes are regulated by eQTL located in heterozygous regions. Substantial proportions of SPE genes located in homozygous regions are regulated in trans from eQTL in heterozygous regions, leading to higher numbers of these specific SPE patterns (Fig. 3, patterns 4 and 7). We showed that for many trans-regulated SPE genes, the presence of the Mo17 allele at the eQTL is required for the activity of the gene in the hybrid (Fig. 3, Additional file 1: Fig. S8). Thus, we did not observe paternal dominance of trans-regulated SPE genes, but rather a Mo17 dominance. Nevertheless, we cannot exclude a role of maternal effects or imprinting in both populations.
Our present data confirm findings that non-syntenic genes are enriched among SPE genes [13] and are correlated with heterosis [29] (Fig. 4). We further expanded this concept by demonstrating that non-syntenic genes are enriched among trans-regulated genes. Non-syntenic genes have been associated with disease resistance genes [30] and were suggested to function in environmental adaptation of plants and help hybrids to cope with abiotic stress [27]. We identified a trans-regulated non-syntenic SPE candidate gene, TSG 2 (Zm00001eb339600), which controls lateral root density in hybrids and whose expression was induced by the Mo17 allele at the eQTL position (Fig. 5C). This gene has a syntenic paralog, Zm00001eb039610 (TSG 2-like), which is located close to the eQTL position (< 2.5 Mbp; Fig. 5D). Interestingly, this paralog is cis-regulated from the same eQTL position as TSG 2 (Fig. 5D) and expressed in B73, Mo17 as well as heterozygous genotypes. We hypothesize that there might be a regulatory connection between the trans-regulated non-syntenic gene and its syntenic paralog in the Mo17 genotype. Regulatory interactions of paralogous genes have been previously reported. For instance, the paralogous genes rtcs and rtcl are regulated by the same transcription factor [31] and the syntenic gene rtcs recruited younger non-syntenic genes during seminal root evolution [32]. A regulatory connection of trans-regulated non-syntenic genes with their syntenic cis-regulated paralogs in the Mo17 genotype could explain the high number of trans-regulating eQTL among non-syntenic SPE patterns associated with the Mo17 allele. The regulatory differences (B73: cis, Mo17: trans) between the parents of a hybrid and their different contributions to SPE patterns might be an aspect of how phylogenetic distance contributes to heterosis, as shown in this study for early seedling root heterosis.
Among the TWAS genes whose expression correlated with phenotypic traits, we identified genes in the B73xIBM-RIL population for lateral root density, which displayed SPE in a substantial number of parent-hybrid combinations (Fig. 5, Additional file 2: Table S8). We surveyed the two candidate genes which displayed SPE in the highest number of parent-hybrid combinations in more detail. They showed significantly different lateral root density, based on the activity or inactivity of the gene in the hybrid, and thus the presence or absence of SPE. For the cis-regulated gene TSG 1 (Zm00001eb349930) (Fig. 5B), a lower lateral root density was observed upon gene activity. In contrast, gene TSG 2 (Zm00001eb339600) (Fig. 5C) is trans-regulated and displayed a higher lateral root density, upon gene activity, which is induced by the Mo17 allele at the eQTL position.
Thus, we observe regulatory effects of single genes displaying SPE on lateral root density, which might help maize to adapt to changing local environmental conditions such as water availability, where disparate lateral root densities are beneficial [33]. In summary, we demonstrated that the association of the number of SPE genes with heterosis is not only conditioned by genetic diversity but also by the degree of heterozygosity. Additionally, SPE-mediated phenotypic heterosis, as well as the regulation of SPE genes in IBM-RIL backcross populations, depends on the genetic background of the population.
Methods
Plant genetic resources
The IBM-RIL syn. 4 population [19] represents a collection of highly diversified genotypes with respect to the genomic regions contributed by their two parental inbred lines B73 and Mo17 (Fig. 1A). To study the phenotypic and transcriptomic plasticity of maize F1 hybrids relative to their parents, a random subset of 112 IBM-RILs was backcrossed to their original parental inbred lines B73 and Mo17. In all crosses, B73 and Mo17 were used as the female parent to ensure a homogeneous phenotype of all plants on which the pollinated ears developed and thus similar seed quality. The two F1-backcross hybrids of a specific IBM-RIL (hereafter named B73 × IBM-RIL and Mo17 × IBM-RIL backcross hybrids) show contrasting homozygous and heterozygous genomic regions and genes (Fig. 1A). For example, if a region between two recombination breakpoints is homozygous in a B73 × IBM-RIL backcross, it is heterozygous in the corresponding Mo17 × IBM-RIL backcross hybrid and vice versa.
Experimental design
We studied the phenotypic and transcriptomic plasticity of the IBM-RIL backcross hybrids relative to their parental inbred lines. A selection of 112 IBM-RILs was used as paternal inbred lines, corresponding to 112 B73 × IBM-RIL and 112 Mo17 × IBM-RIL backcross hybrids (Fig. 1A). To optimally fit the experimental design and increase the precision of subsequent pairwise comparisons with the common parental inbred lines and reference hybrids, both common maternal inbred lines B73 and Mo17 and the two reference hybrids B73 × Mo17 and Mo17 × B73 were included (Fig. 1B). For each sample, 25 kernels of the same genotype (parent or hybrid) were surface sterilized in 10% H2O2 for 20 min, rinsed with distilled water and afterwards pre-germinated in distilled water in filter paper rolls with five kernels each in a climate chamber with a 16 h light (26 °C), 8 h dark (21 °C) cycle [13]. After 3 days, eight seedlings per genotype with approximately the same length of primary root and, if already present, shoot length were selected and transferred into a row of an aeroponic growth system for 4 additional days. Each aeroponic growth system (“Elite Klone Machine 96,” TurboKlone, USA) was composed of 12 rows each with eight planting sites. Thus, we could fit 12 different genotypes into one aeroponic growth system and eight systems at the same time into our climate chamber (16 h light, 26 °C; 8 h dark, 21 °C; Additional file 1: Fig. S9). We analyzed three independent biological replicates, each comprising all IBM-RIL inbred lines and hybrids. Due to the large number of samples and space limitations, the different genotypes of each biological replicate were grown in four batches distributed across four weeks, following an alpha design with incomplete blocks [34]. Within each batch, the eight aeroponic growth systems were randomly assigned to eight positions in the climate chamber. Three successive rows of an aeroponic growth system represent one triplet. 
To each triplet, we assigned either an IBM-RIL and its corresponding B73 and Mo17 backcross hybrids, or both common maternal inbred lines B73 and Mo17 and one of the two reference hybrids B73 × Mo17 or Mo17 × B73. The triplets were randomized at the replicate level, while ensuring that each batch contained two reference triplets each of B73, Mo17, B73xMo17 and B73, Mo17, Mo17xB73. Thus, in each batch, we surveyed 30 IBM-RIL triplets and both reference hybrids and the common inbred lines B73 and Mo17 in two additional triplets (Additional file 1: Fig. S9). In total, 3 samples of each IBM-RIL and each backcross hybrid, 48 samples (biological replicates) of the maternal inbred lines B73 and Mo17, and 24 samples of the reciprocal hybrids B73 × Mo17 and Mo17 × B73 as reference hybrids were analyzed. In other words, each independent biological replicate contained 384 samples (in total: 384 × 3 replicates = 1152 samples). The number of individual samples per replicate was also designed to fit one sequencing run on the NovaSeq 6000 S4 flow cell machine (Illumina, San Diego, USA), described later.
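The sample totals stated above follow directly from the layout of the growth systems. The following short Python sketch reproduces that arithmetic; the variable names are ours and serve only as illustration.

```python
# Layout constants taken from the experimental design described above.
ROWS_PER_SYSTEM = 12        # rows per aeroponic system, one genotype per row
SYSTEMS_PER_BATCH = 8       # systems that fit into the climate chamber at once
BATCHES_PER_REPLICATE = 4   # batches grown in successive weeks
REPLICATES = 3              # independent biological replicates
ROWS_PER_TRIPLET = 3        # three successive rows form one triplet

samples_per_batch = ROWS_PER_SYSTEM * SYSTEMS_PER_BATCH
triplets_per_batch = samples_per_batch // ROWS_PER_TRIPLET
samples_per_replicate = samples_per_batch * BATCHES_PER_REPLICATE
total_samples = samples_per_replicate * REPLICATES

print(samples_per_batch, triplets_per_batch, samples_per_replicate, total_samples)
```

One batch of 96 rows therefore corresponds to the 96 samples later sequenced on one lane, and one replicate of 384 samples to one flow cell.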
Root phenotyping and sampling for RNA-seq
Seven days after germination, all seedlings per sample (maximum eight seedlings) were removed from the aeroponic growth system, and the seedling root system was scanned using an Epson Expression 12000XL scanner (Epson, Meerbusch, Germany) with up to four plants per image. The resulting images were cropped to create single plant images that only showed the root system and the maize kernel. We used the RootPainter software client (version 0.1.0) and server component (version 0.2.7) to train a convolutional neural network to recognize and segment roots in images [35]. We then analyzed the segmented images in a batch using RhizoVision Explorer (version 2.0.3) [36]. After inspection and cleanup, we determined the total root length, the total root volume, and the number of root tips for each plant for subsequent analysis (details in Additional file 3: Supplementary Material SM1).
After imaging the seedlings, the primary root was separated from the kernel to collect (i) the proximal first centimeter with emerged lateral roots in 80% ethanol to count the number of lateral roots per cm as density and (ii) the distal region of the primary root, composed of the root tip and the meristematic zone followed by the elongation zone, in liquid nitrogen for subsequent RNA extraction.
Analysis of phenotypic data
To evaluate the phenotypic data of each genotype, a linear mixed model (baseline model 1) with a fixed effect for block (three replicates as levels) and genotype (263 levels) was fitted. According to the layout of our experimental design (Additional file 1: Fig. S9), we included random effects for row, triplet, system and batch effect in the model. The residual error assesses the within-row variance among plants.
$$y_{ijklmnp} = \mu + g_i + b_j + c_{jk} + s_{jkl} + t_{jklm} + r_{jklmn} + e_{ijklmnp} \qquad (1)$$
Here, $y_{ijklmnp}$ represents the mean phenotypic value of a specific trait of interest of the respective genotype $i$; $\mu$ represents the intercept; $g_i$ represents the fixed effect for genotype $i$; $b_j$ represents the fixed effect for block $j$; $c_{jk}$ represents the random effect for batch $k$ nested within block $j$; $s_{jkl}$ represents the random effect for system $l$ nested within batch $k$ and block $j$; $t_{jklm}$ represents the random effect for triplet $m$ nested within system $l$, batch $k$ and block $j$; $r_{jklmn}$ represents the random effect for row $n$ nested within triplet $m$, system $l$, batch $k$ and block $j$; and $e_{ijklmnp}$ represents the random error effect for plant $p$ of genotype $i$ in block $j$, batch $k$, system $l$, triplet $m$ and row $n$. To fulfil the assumptions of linear models, the phenotypic values for the traits “total root length,” “total root volume,” and “total number of root tips” had to be square root-transformed. An offset of 0.5 was added to each phenotypic value before transformation. The resulting modelled means were transformed back to their original scale for visual inspection (Additional file 1: Fig. S2). Modelled means on the transformed scale (and original scale in case of lateral root density) were used for TWAS analysis (described below).
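The transformation and its inverse can be sketched in a few lines of Python. This is a minimal illustration that assumes the back-transformation simply inverts the square root and removes the offset; the text does not state whether any bias correction was applied, and the function names and example value are hypothetical.

```python
import math

OFFSET = 0.5  # offset added to each phenotypic value before transformation


def to_model_scale(value):
    """Square-root transform a raw phenotypic value after adding the offset."""
    return math.sqrt(value + OFFSET)


def to_original_scale(modelled_mean):
    """Invert the transformation: square the value, then subtract the offset."""
    return modelled_mean ** 2 - OFFSET


raw_root_tips = 24  # hypothetical number of root tips, for illustration only
transformed = to_model_scale(raw_root_tips)
restored = to_original_scale(transformed)
print(round(transformed, 3), round(restored, 3))
```

The offset keeps the transformation well behaved for plants with a value of zero, for which the transformed value is the square root of 0.5.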
RNA-sequencing and preparation of alignments
For subsequent RNA extraction and sequencing, a maximum of eight primary roots of each genotype grown in the same row of an aeroponic system were pooled. These root samples were manually ground in liquid nitrogen and total RNA was isolated with the RNeasy Plant Mini Kit (QIAGEN, Venlo, the Netherlands). RNA quality was assessed with a Bioanalyzer (RNA ScreenTape + TapeStation Analysis Software 3.2, Agilent Technologies, Santa Clara, CA, USA) by the Next Generation Sequencing (NGS) Core Facility in Bonn, Germany (https://btc.uni-bonn.de/ngs/), which subsequently constructed cDNA libraries for RNA-seq according to the TruSeq stranded mRNA library preparation protocol (Illumina, San Diego, USA). Sequencing was performed on a NovaSeq 6000 S4 flow cell machine (Illumina, San Diego, USA), generating 100-bp paired-end reads. This allowed for processing all 384 samples of a single replicate on one flow cell and each batch of 96 samples on one lane. The obtained reads are reverse-stranded. The raw reads were trimmed and filtered using Trimmomatic (version 0.39) in paired-end mode with the following settings: ILLUMINACLIP:adapters/TruSeq3-PE-2.fa:2:30:10:8:True, LEADING:3, TRAILING:3, MAXINFO:30:0.8, and MINLEN:40. With this step, remaining adapter sequences were removed, low quality bases from the start and end of the reads were cropped, and adaptive quality trimming was performed. After this quality control step, reads with a minimum length of 40 bases were retained and resulting single-end reads were excluded [37]. The maize reference genome B73 version 5 (B73v5, ftp.ensemblgenomes.org/pub/plants/release-52/fasta/zea_mays/dna/Zea_mays.Zm-B73-REFERENCE-NAM-5.0.dna.toplevel.fa.gz) was indexed with exon information from the corresponding annotation file (http://ftp.ensemblgenomes.org/pub/plants/release-52/gff3/zea_mays/Zea_mays.Zm-B73-REFERENCE-NAM-5.0.52.gff3.gz). 
The trimmed reads were aligned to the indexed reference genome using Hisat2 (version 2.2.1) [38] with the appropriate input file settings and intron lengths: -q --phred33 --rna-strandness RF --min-intronlen 20 --max-intronlen 60000. The data was then saved in BAM format using the samtools view command from htslib (version 1.14) [39]. Picard tools (version 2.27.1; http://broadinstitute.github.io/picard/) were used to remove duplicates using MarkDuplicates.
Reads aligned to exons of genes were counted using htseq-count (version 2.0.1), with specifications to only count uniquely mapped reads [40]. Samples with fewer than 5 million counted reads were excluded.
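As an illustration of the alignment settings and the library-size filter, the following Python sketch assembles a HISAT2 call using HISAT2's own option spellings (`--phred33`, `--rna-strandness`) and applies the 5 million read cutoff. The index prefix, file names and helper functions are hypothetical placeholders and not part of the original pipeline.

```python
import shlex

MIN_COUNTED_READS = 5_000_000  # samples below this count were excluded


def hisat2_command(index_prefix, reads_1, reads_2, sam_out):
    """Assemble a HISAT2 call with the alignment settings listed in the text."""
    return [
        "hisat2",
        "-q",                       # input reads are in FASTQ format
        "--phred33",                # Phred+33 encoded base qualities
        "--rna-strandness", "RF",   # reverse-stranded paired-end library
        "--min-intronlen", "20",
        "--max-intronlen", "60000",
        "-x", index_prefix,
        "-1", reads_1,
        "-2", reads_2,
        "-S", sam_out,
    ]


def keep_sample(counted_reads):
    """Retain a sample only if it has at least 5 million counted reads."""
    return counted_reads >= MIN_COUNTED_READS


# Hypothetical file names, for illustration only.
cmd = hisat2_command("B73v5_index", "sample_1P.fq.gz", "sample_2P.fq.gz", "sample.sam")
print(shlex.join(cmd))
```

Building the call as a list keeps each option and its value separate, so it can be passed to `subprocess.run` without shell quoting issues.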
Preparation of alignments for SNP calling
For SNP calling between the genotypes of this study and the B73 reference genome, the read alignments were processed using the HaplotypeCaller of GATK (version 4.2.6.1) with respect to GATK’s best practices for RNA-seq data (https://gatk.broadinstitute.org/hc/en-us/articles/360035531192-RNAseq-short-variant-discovery-SNPs-Indels-, checked on 12/14/2023). First, Picard’s AddOrReplaceReadGroups (version 2.27.1) was used to add readgroup information to the alignments of all samples. The replicate number was set as RGLB, the RGPL field was set to “ILLUMINA,” the RGPU field was set to “unknown,” and the RGSM field was filled with the sample name. The samtools view (version 1.14) command was used to filter for uniquely mapped reads by only including reads with mapping quality of 60 or higher and to format and index the alignments. Second, GATK’s SplitNCigarReads was used to split alignments at positions with N in the CIGAR field, such as intron-spanning alignments [41].
SNP calling between B73 and Mo17 samples and B73v5 reference for sample evaluation
In brief, the GATK HaplotypeCaller was used to identify variants between the Mo17 samples and the B73v5 reference genome. The frequency of the B73 and Mo17 alleles at each SNP locus was previously identified in a similar manner [42, 43]. The ratio of homozygous loci was calculated and samples with less than 95% homozygosity across expectedly homozygous loci were excluded (details in Additional file 3: Supplementary Material SM2). A total of 175 RNA samples were excluded because they did not meet the criterion of being homozygous in ≥ 95% of the supposedly homozygous loci. The percentage of homozygous loci across the whole genome is indicated in Additional file 2: Table S1. Across the investigated samples from inbred lines, the median percentage of homozygous loci was 99.4%. This indicates a very low rate of residual heterozygosity (Additional file 2: Table S1). A small number of samples from the highly homozygous inbred lines B73 and Mo17 also showed homozygosity rates around 95%. Therefore, we attributed heterozygosity of less than 5% to technical reasons rather than to residual heterozygosity. Additionally, 10 samples were excluded beforehand due to their library size being < 5 million read counts. Moreover, 17 samples were excluded because only one of three replicates was left for the respective genotype. Since downstream analyses include the comparisons between both parents and their resulting hybrids, we had to further exclude 90 hybrid RNA-seq samples because all corresponding paternal IBM-RIL RNA-seq samples were excluded. In addition, 8 IBM-RIL RNA-seq samples were excluded because the corresponding hybrid samples were missing. Finally, 852 RNA samples remained for subsequent SNP calling as described below (Additional file 2: Table S1).
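The sample numbers reported above can be tallied as follows. This Python sketch assumes that all exclusions are counted from the 1152 samples of the experimental design and that the exclusion categories do not overlap; the labels are ours.

```python
TOTAL_SAMPLES = 1152  # 384 samples per replicate x 3 replicates

# (reason, number of excluded samples), as reported in the text
exclusions = [
    ("library size below 5 million read counts", 10),
    ("less than 95% homozygosity at expectedly homozygous loci", 175),
    ("only one of three replicates left for the genotype", 17),
    ("hybrid samples without any paternal IBM-RIL sample", 90),
    ("IBM-RIL samples without corresponding hybrid samples", 8),
]

remaining = TOTAL_SAMPLES
for reason, n_excluded in exclusions:
    remaining -= n_excluded
    print(f"{reason}: -{n_excluded} -> {remaining} remaining")
```

Under these assumptions the tally ends at the 852 samples that entered the second SNP calling.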
SNP calling between all high-confidence samples and the B73v5 reference
In the second SNP calling, SNPs between each sample and the B73v5 reference were called. We included the previously identified variants between our Mo17 samples and the B73v5 reference, as well as variants from our B73 samples vs. the B73v5 reference. They were filtered based on several criteria. The mapping quality (MQ), variant site quality (QUAL), Fisher strand (FS), and allele depth (AD) of SNP alleles were used with different thresholds for indels and SNPs with respect to GATK’s guide on hard-filtering short variants (https://gatk.broadinstitute.org/hc/en-us/articles/360035890471-Hard-filtering-germline-short-variants, checked on 12/18/2023). For the SNPs, the filters QD > 2, SOR < 3, MQ > 40, QUAL > 30, FS < 60, and FORMAT/AD[0:1] > 5 were applied. For the indels, the filters QD > 5, QUAL > 30, FS < 200 and FORMAT/AD[0:1] > 5 were applied. The base qualities of each sample were then recalibrated using the filtered SNPs and indels as known-sites with GATK (version 4.2.6.1). BaseRecalibrator was run to generate recalibration tables, which were then applied to the aligned reads with ApplyBQSR (https://gatk.broadinstitute.org/hc/en-us/articles/360035531192-RNAseq-short-variant-discovery-SNPs-Indels-, checked on 12/14/2023). The HaplotypeCaller was run with the recalibrated samples in BP_RESOLUTION mode and reported the SNP sites at each individual position. The resulting variant files were then filtered for positions with a coverage (DP) of ≥ 1, to eliminate loci without any information. The variant files from Mo17 and B73 samples were combined by GenomicsDBImport. The samples of a triplet (IBM-RIL samples plus corresponding B73 × IBM-RIL and Mo17 × IBM-RIL hybrids) were combined with the Mo17 and B73 samples, resulting in one database per triplet. Genotyping of all samples within each database was performed using the GenotypeGVCFs function. 
Since the Mo17 and B73 samples are present in each database, we ensured that genotyping was performed on the loci differentiating B73 and Mo17 in each database [41]. The genotyping data was then filtered for SNPs with QD > 2, SOR < 3, MQ > 40, QUAL > 30, and FS < 60 using bcftools (version 1.17). A list of high confidence SNPs was created in R using the results from the HaplotypeCaller of the Mo17 and B73 samples (Additional file 2: Table S9). For these loci, it was established that the genotyping results of the HaplotypeCaller (B73v5 reference allele vs. non-reference allele) correspond to the B73 and Mo17 alleles of the germplasm of this study (reference = B73, non-reference = Mo17). The HaplotypeCaller reports for each SNP locus of each sample the most likely genotype, which we term genotype-call in the following, and the corresponding genotype quality (GQ), a measure of the confidence of the genotype-calls. Only genotype-calls of bi-allelic loci with a GQ of ≥ 10 were considered. The loci were filtered to include only those where ≥ 90% but a minimum of three remaining genotype-calls in Mo17 samples are homozygous for the non-reference allele, and ≥ 90% but a minimum of three remaining genotype-calls in B73 samples are homozygous for the reference allele. In addition to the high confidence B73 vs. Mo17 SNPs, we identified SNPs which did not belong to B73 or Mo17 as IBM-RIL-specific (homozygous or heterozygous and regardless of GQ). Loci genotyped with a GQ of < 10 were filtered out. Only loci that were either in the high confidence SNP list of B73 vs. Mo17 alleles or which were IBM-RIL-specific (for masking putative IBM-RIL-specific regions) were considered further.
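The rule for high confidence SNPs can be written down compactly. The Python sketch below is only an illustration of that rule: genotype-calls with GQ < 10 are dropped, at least three calls must remain per parental line, and at least 90% of them must be homozygous for the expected allele. The VCF-style genotype strings ('0/0' for homozygous reference, '1/1' for homozygous non-reference) and the function names are our assumptions; the original list was created in R.

```python
MIN_GQ = 10          # minimum genotype quality of a genotype-call
MIN_CALLS = 3        # minimum number of remaining genotype-calls
MIN_PERCENT = 90     # minimum percentage of calls with the expected genotype


def supports_allele(calls, expected_genotype):
    """calls: list of (genotype, GQ) tuples of one locus in one parental line."""
    confident = [gt for gt, gq in calls if gq >= MIN_GQ]
    if len(confident) < MIN_CALLS:
        return False
    matching = sum(1 for gt in confident if gt == expected_genotype)
    # Integer arithmetic avoids floating point issues at exactly 90%.
    return matching * 100 >= MIN_PERCENT * len(confident)


def is_high_confidence(b73_calls, mo17_calls):
    """B73 must carry the reference allele, Mo17 the non-reference allele."""
    return supports_allele(b73_calls, "0/0") and supports_allele(mo17_calls, "1/1")


# Hypothetical genotype-calls for one locus.
b73 = [("0/0", 40)] * 10
mo17 = [("1/1", 30)] * 9 + [("0/1", 20)]
print(is_high_confidence(b73, mo17))
```

With only three remaining calls, all three must agree, because two out of three is below the 90% requirement.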
Classification of IBM-RIL genomic regions
The filtered SNP data were used to classify each IBM-RIL genome into B73 or Mo17 regions and to mask regions which were not B73 or Mo17. A distance-function was used to calculate the distance between the IBM-RIL specific loci. Loci with a distance of < 2.5 Mbp were grouped together as a block. Blocks containing a minimum of 10 IBM-RIL-specific loci, with ≥ 5 of those being homozygous, were identified and masked as IBM-RIL-specific third origin regions. The start and end positions of these regions were recorded and loci within those regions were dropped. Next, a sliding window approach was used to eliminate singular loci that did not match their surrounding loci. A window of 15 loci was used, and ≥ 11 had to be homozygous for the Mo17 allele for the window to be considered a Mo17 window. For a B73 window, ≥ 12 out of 15 loci must have homozygous B73 alleles. The values for the windows were obtained by computing the minimum number of matching loci in a 15-loci window across B73 and Mo17 samples. Otherwise, the window was considered ambiguous [44]. Loci within an ambiguous window were dropped, as well as loci which were classified differently from their window. The previously mentioned distance-function was utilized to calculate the distance between the remaining loci. Loci that carried the same allele and which were less than 0.5 Mbp apart were grouped together as a block, and all blocks were retained. The start and end positions of these blocks were recorded as the Mo17 and B73 regions within each IBM-RIL (Additional file 2: Table S10). Two IBM-RILs which had more than 50% of their genomes consisting of IBM-RIL specific regions from a third parental origin were excluded along with their hybrids (Additional file 2: Table S1), leaving 834 samples for final analyses. The data set of each triplet reported by the HaplotypeCaller was filtered to only include loci within the B73 or Mo17 regions of the IBM-RILs and within exons of protein-coding genes. 
We checked for all protein-coding genes whether they were located in a B73 or Mo17 region, masked as neither a B73 nor a Mo17 region, or located in a genomic region without SNP information. This verification was performed for each IBM-RIL separately. Centromere locations of the 10 chromosomes were taken from the genome assembly of MaizeGDB by selecting the “Knobs, centromeres and telomeres” information (https://jbrowse.maizegdb.org) [45]. The proportion of heterozygous regions was calculated for each backcross hybrid by dividing the total length of classified heterozygous regions (B73 regions of the IBM-RIL for the Mo17 × IBM-RIL and Mo17 regions of the IBM-RIL for B73 × IBM-RIL) by the total length of all classified regions (not considering IBM-RIL-specific masked regions and regions without SNP information).
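The window classification, the grouping of loci into blocks and the heterozygosity proportion described in this section can be sketched as follows. This Python example is a simplified illustration under our own assumptions: loci are given as sorted (position, allele) pairs of one chromosome, a block's length is the distance between its first and last locus, and how the 15-loci window is moved along the chromosome is not modelled because the text does not specify it.

```python
WINDOW_SIZE = 15
MIN_MO17 = 11   # Mo17 window: at least 11 of 15 loci homozygous Mo17
MIN_B73 = 12    # B73 window: at least 12 of 15 loci homozygous B73


def classify_window(alleles):
    """alleles: list of 15 entries, each 'B73', 'Mo17' or 'het'."""
    if alleles.count("Mo17") >= MIN_MO17:
        return "Mo17"
    if alleles.count("B73") >= MIN_B73:
        return "B73"
    return "ambiguous"


def group_into_blocks(loci, max_gap_bp=500_000):
    """Merge consecutive loci with the same allele that are < max_gap_bp apart.

    Returns a list of (start, end, allele) blocks.
    """
    blocks = []
    for pos, allele in loci:
        if blocks and blocks[-1][2] == allele and pos - blocks[-1][1] < max_gap_bp:
            blocks[-1] = (blocks[-1][0], pos, allele)
        else:
            blocks.append((pos, pos, allele))
    return blocks


def heterozygous_proportion(blocks, maternal):
    """Share of classified length that is heterozygous in the backcross hybrid."""
    het_allele = "Mo17" if maternal == "B73" else "B73"
    total = sum(end - start for start, end, _ in blocks)
    het = sum(end - start for start, end, allele in blocks if allele == het_allele)
    return het / total if total else 0.0


# Hypothetical loci of one IBM-RIL chromosome.
loci = [(100_000, "B73"), (300_000, "B73"), (700_000, "B73"),
        (900_000, "Mo17"), (1_200_000, "Mo17"), (2_000_000, "Mo17")]
blocks = group_into_blocks(loci)
print(blocks, heterozygous_proportion(blocks, "B73"))
```

Because 11 + 12 exceeds 15, a window can never satisfy both thresholds at once, so the order of the two checks does not matter.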
Multidimensional scaling (MDS)
To evaluate the quality and the structure of the RNA-seq samples in this study, a multidimensional scaling (MDS) plot was used. The active genes of the 834 filtered samples were compared with the plotMDS() function of the Bioconductor package limma (version 3.50.3) [46] in R.
Analysis of expression complementation
The activity/expression status of each gene was determined as previously described [14] based on thresholding normalized read counts. In short, fitting a generalized additive model (R package mgcv) [47] using guanine-cytosine (GC) content and log-transformed gene length as explanatory variables to account for artifactual read count differences across genes [48] resulted in a predictive count for each gene. The inverse of the predictive count was used as a multiplicative gene-specific normalization factor. In addition, sample-wise scale factors using the trimmed mean of M-values (TMM) method were estimated to adjust for differences between library sizes [49]. Each raw read count was multiplied by the product of the corresponding gene-specific normalization factor and the TMM scale factor to obtain a normalized count. The average expression level of each gene was represented by the mean normalized count across all replicates of each genotype in our data set. After estimation of the density distribution, the 0.25 quantile of the non-zero average expression levels was set as the threshold for calling the activity status of each gene in each sample. Thus, for each genotype, a gene was called active if its average expression level across all replicates was higher than the threshold and inactive otherwise. Genes active in only one parent but also in the hybrid are designated single parent expression (SPE) genes, as the expression of only one parent is complemented in the hybrid. We identified these by comparing the activity of each gene in the hybrid with that in the corresponding parents. From the classification of the IBM-RIL regions, we deduced the genotypes of the SPE genes in the hybrids. Based on these genotypes, we classified our SPE genes into those within heterozygous (B73/Mo17 in B73 × IBM-RILs, Mo17/B73 in Mo17 × IBM-RILs) or homozygous regions (B73/B73 in B73 × IBM-RILs, Mo17/Mo17 in Mo17 × IBM-RILs). 
We further distinguished SPE by the active parent and indicated the active parent in bold. For example, for a SPE gene in a heterozygous region of a B73 × IBM-RIL, where the IBM-RIL parent is active but B73 is not, the pattern is B73/Mo17 with the paternal Mo17 allele in bold (Fig. 2A). For a SPE gene in a homozygous region of a Mo17 × IBM-RIL, where the Mo17 parent is active but the IBM-RIL is not, the pattern is Mo17/Mo17 with the maternal Mo17 allele in bold (Fig. 2B).
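The activity call and the SPE classification can be illustrated with a small Python sketch. It assumes a linear-interpolation quantile (the quantile definition used in the original analysis is not stated), and all function names and example values are hypothetical.

```python
def activity_threshold(mean_counts, q=0.25):
    """q-quantile (linear interpolation) of the non-zero average expression levels."""
    values = sorted(v for v in mean_counts if v > 0)
    if not values:
        raise ValueError("no non-zero expression values")
    pos = q * (len(values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(values) - 1)
    return values[lower] + (pos - lower) * (values[upper] - values[lower])


def is_active(mean_count, threshold):
    """A gene is active if its average expression level exceeds the threshold."""
    return mean_count > threshold


def spe_active_parent(maternal_active, paternal_active, hybrid_active):
    """Return the active parent of an SPE gene, or None if the gene is not SPE.

    SPE requires activity in the hybrid and in exactly one of the two parents.
    """
    if not hybrid_active or maternal_active == paternal_active:
        return None
    return "maternal" if maternal_active else "paternal"


# Hypothetical average expression levels of six genes in one genotype.
threshold = activity_threshold([0, 1, 2, 3, 4, 5])
print(threshold, is_active(2.5, threshold), spe_active_parent(False, True, True))
```

In a B73 × IBM-RIL hybrid, a return value of "paternal" corresponds to the patterns in which the IBM-RIL parent is the active one.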
Proportion of heterotic variance explained by the number of SPE genes underlying mid-parent heterosis (MPH) of root phenotypes
To estimate the fraction of the heterotic variance explained by the number of genes displaying SPE patterns, we propose the parameter $R^2_{\mathrm{SPE}} = 1 - \sigma^2_{h,\mathrm{full}} / \sigma^2_{h,\mathrm{null}}$, where $\sigma^2_{h,\mathrm{null}}$ defines the total genetic variance across the hybrid genotypes and $\sigma^2_{h,\mathrm{full}}$ is the genetic variance of the heterosis effect not associated with the number of SPE genes [50, 51].
For each backcross population, the genetic variance of the mid-parent heterosis (MPH) effects was estimated separately in a “full” regression model (2) based on an extension of the baseline model (1). For this purpose, we defined for each parental genotype (i.e., each IBM-RIL and the two common parental inbred lines B73 and Mo17) covariates ($x_{iq}$ for parent $q$ and genotype $i$). These covariates were initially all set to 0 for each observation. For observations on the parental genotypes, the corresponding covariate for that specific parent was set to 1. For the observations on the hybrids, the two covariates corresponding to its two parents were set to 0.5. Thus, collectively these covariates model the effect of the per se performance of the parents and the mid-parent values of the hybrids.
MPH was modelled by a regression on the number of SPE genes. For this purpose, the number of SPE genes was set to 0 for all parental genotypes. This was done to be able to include them in the overall model. However, the parental genotypes have no impact on the regression, because their effect is fully absorbed by the covariates for the parental genotype effects. The covariate $z_i$ (defined below) was included as a fixed effect $\phi z_i$. This was done because the heterosis effects are unlikely to be distributed around a mean of zero. The random heterosis effects will be distributed around the non-zero mean $\phi$.
As the MPH effects of the hybrids were not expected to fall on the regression line, we allowed for deviations from the regression by adding a random effect for hybrids. This was implemented by fitting the random effect z*genotype, where z is a continuous dummy covariable with z = 0 for the parental genotypes and z = 1 for the hybrids. This dummy variable acts as a switch that turns the random effect off for parental genotypes and on for hybrids [52]. The problem of rank deficiency in the design matrix for fixed effects was solved by removing the intercept and including fixed effects bj (j = 2, 3) for replicates 2 and 3, effectively setting b1 = 0.
$$y_{ijklmnp} = \sum_{q} x_{iq}\,\pi_q + b_j + \phi z_i + \sum_{v=1}^{4} \beta_v w_{iv} + z_i h_i + c_{jk} + s_{jkl} + t_{jklm} + r_{jklmn} + e_{ijklmnp} \qquad (2)$$
Here, $\pi_q$ represents the parental effect value contributing to MPH of the corresponding hybrid genotype $i$ of a specific trait of interest. $x_{iq}$ represents the parental covariables of parent $q$ for genotype $i$; $b_j$ represents the fixed effect for block $j$; $\phi$ represents the fixed effect of the dummy variable $z_i$; $w_{i1}$, $w_{i2}$, $w_{i3}$, and $w_{i4}$ represent the covariables for the number of genes displaying patterns 1 ($w_{i1}$) to 4 ($w_{i4}$) (Fig. 2A) or 5 ($w_{i1}$) to 8 ($w_{i4}$) (Fig. 2B) in a hybrid, respectively, with regression coefficients $\beta_v$. $h_i$ represents the random effect for a hybrid (corresponding to genotype $i$); $z_i$ is a dummy variable with $z_i = 0$ for parents and $z_i = 1$ for hybrids [52]. Variable $c_{jk}$ represents the random effect for batch $k$ nested within block $j$; $s_{jkl}$ represents the random effect for system $l$ nested within batch $k$ and block $j$; $t_{jklm}$ represents the random effect for triplet $m$ nested within system $l$, batch $k$ and block $j$; $r_{jklmn}$ represents the random effect for row $n$ nested within triplet $m$, system $l$, batch $k$ and block $j$; and $e_{ijklmnp}$ represents the random error effect for plant $p$ of genotype $i$ in block $j$, batch $k$, system $l$, triplet $m$, and row $n$.
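How the parental covariates and the dummy variable are coded for parents and hybrids can be illustrated with a short Python sketch. The function, the genotype names and the SPE count are hypothetical; the original analysis was implemented in R.

```python
def design_row(genotype, parents, parent_order, n_spe=0):
    """Return (x, z, spe) for one observation under the MPH coding.

    genotype     -- name of the observed genotype
    parents      -- None for an inbred parent, (maternal, paternal) for a hybrid
    parent_order -- list fixing the column order of the parental covariates
    n_spe        -- number of SPE genes of the hybrid (ignored for parents)
    """
    x = {parent: 0.0 for parent in parent_order}  # all covariates start at 0
    if parents is None:
        x[genotype] = 1.0          # a parent loads fully on its own covariate
        return [x[p] for p in parent_order], 0, 0   # z = 0, SPE count set to 0
    for parent in parents:
        x[parent] = 0.5            # a hybrid loads half on each of its parents
    return [x[p] for p in parent_order], 1, n_spe   # z = 1 switches heterosis on


# Hypothetical genotypes of one triplet plus the common parent B73.
order = ["B73", "Mo17", "RIL_001"]
print(design_row("B73", None, order))
print(design_row("B73xRIL_001", ("B73", "RIL_001"), order, n_spe=120))
```

With this coding, the covariates of a hybrid reproduce its mid-parent value, so that the terms switched on by the dummy variable describe only the deviation from the mid-parent value.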
The analysis was implemented in R (version 4.0.1) using the lme4 package (version 1.1–29). In contrast to the baseline model, the fixed effect for genotype was replaced by individually defined covariates of the parental genotypes and fixed effects for the number of SPE genes.
To determine , a “null” model (3) excluding the fixed effects of the covariates accounting for the number of SPE genes was fitted.
3 |
where the notations are the same as in model (2).
To perform the same analysis on the basis of better-parent heterosis (BPH) instead of MPH, the parental covariates were adjusted, while Eqs. 2 and 3 were otherwise left unchanged. For observations on the parental genotypes, the covariate for that specific parent was kept at 1. For observations on the hybrids, the two covariates corresponding to the two parents were not set to 0.5 each; instead, the covariate corresponding to the better parent was set to 1 and the one corresponding to the other parent to 0. Thus, collectively, these covariates now model the effect of the per se performance of the parents and the better-parent values of the hybrids.
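The covariate coding described above can be sketched as follows. This is a minimal illustration and not the authors' R code: the function names, the tuple representation of genotypes, and the line names in the usage note are hypothetical. It shows the 0.5/0.5 coding used for MPH, the 1/0 coding used for BPH, and the dummy variable that switches the random hybrid effect on and off.

```python
from typing import Dict, Optional, Sequence, Tuple


def parental_covariates(parents: Sequence[str],
                        genotype: Tuple[str, ...],
                        better_parent: Optional[str] = None) -> Dict[str, float]:
    """Code the parental covariates for one observation.

    genotype is (p,) for an inbred parent or (p1, p2) for a hybrid.
    With better_parent=None a hybrid gets 0.5 for each parent (MPH coding);
    otherwise the better parent gets 1 and the other parent 0 (BPH coding).
    """
    x = {p: 0.0 for p in parents}
    if len(genotype) == 1:
        # Parental genotype: its own covariate is 1 under both codings.
        x[genotype[0]] = 1.0
    elif better_parent is None:
        # MPH: the hybrid is compared with the mean of its two parents.
        for p in genotype:
            x[p] = 0.5
    else:
        # BPH: the hybrid is compared with the better parent only.
        if better_parent not in genotype:
            raise ValueError("better_parent must be one of the hybrid's parents")
        x[better_parent] = 1.0
    return x


def hybrid_switch(genotype: Tuple[str, ...]) -> int:
    """Dummy covariate z: 0 for parental genotypes, 1 for hybrids."""
    return 0 if len(genotype) == 1 else 1
```

For a hypothetical backcross hybrid `("B73", "RIL_001")`, `parental_covariates(["B73", "Mo17", "RIL_001"], ("B73", "RIL_001"))` gives 0.5 for B73 and RIL_001 and 0 for Mo17, whereas passing `better_parent="B73"` gives 1 for B73 and 0 for the other two.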
Expression quantitative trait loci (eQTL) analysis
An eQTL analysis was performed with the R/qtl2 package (version 0.22) [53] to identify positions that were significantly associated with gene expression values based on the masked and filtered SNP data. For each of the three cross-types (IBM-RIL, B73 × RILs, Mo17 × RILs), the classified and filtered SNP loci within B73 or Mo17 regions in the IBM-RILs were taken as marker input data. The physical positions of these SNP loci were also used as preliminary genetic positions. The estimated expression means obtained from the model coefficients within the differential expression analysis of each genotype and gene were used as the phenotype data input in R/qtl2. As additional specifications in the control file, the cross type was set to “riself” (RILs by selfing), the alleles were set to “A” and “B” for B73 and Mo17, and the genotype codes were set to A/A = 1 and B/B = 2 to specify the transformation of homozygous genotypes into numeric values (https://github.com/agroot-ibed/r-qtl2-analysis, updated on 12/19/2023, checked on 12/19/2023). Samples with more than 19% missing genotypes were dropped, as were duplicated genotypes and markers with more than 60% missing genotype information. The genetic map was estimated from the physical positions and genotype information with the est_map() function with parameters maxit = 2000, error_prob = 0.001 and tol = 0.0001. The reduce_markers() function was used to retain only markers that were ≥ 1 centiMorgan (cM) apart to avoid retaining an excess of redundant markers. Pseudomarkers were inserted at a distance of 1 cM to the existing markers. A hidden Markov model was used to calculate the genotype probabilities at all positions, with error_prob = 0.001. This was followed by a genome scan using Haley-Knott regression [54], which establishes the association between genotype and expression phenotype with a linear model.
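The genome scan step can be illustrated for a single position. The following is a minimal sketch and not the R/qtl2 implementation: it assumes a two-genotype RIL without additional covariates, uses the probability of the B/B genotype at one position as the only regressor, and computes the LOD score as (n/2) · log10(RSS0/RSS1), where RSS0 and RSS1 are the residual sums of squares without and with the genotype term. The function name `hk_lod` and the toy values in the usage note are hypothetical.

```python
import math
from typing import Sequence


def hk_lod(expression: Sequence[float], prob_bb: Sequence[float]) -> float:
    """LOD score of a simple Haley-Knott style regression at one position.

    expression: expression value of one gene per line.
    prob_bb:    probability of the B/B genotype per line at this position.
    """
    n = len(expression)
    if n != len(prob_bb) or n < 3:
        raise ValueError("need two equally long vectors with at least 3 lines")
    mean_y = sum(expression) / n
    mean_x = sum(prob_bb) / n
    sxx = sum((x - mean_x) ** 2 for x in prob_bb)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(prob_bb, expression))
    rss0 = sum((y - mean_y) ** 2 for y in expression)  # null model: mean only
    if sxx == 0.0 or rss0 == 0.0:
        return 0.0  # uninformative position or constant expression
    rss1 = rss0 - sxy ** 2 / sxx  # residual sum of squares with genotype term
    if rss1 <= 0.0:
        return math.inf  # genotype explains the expression perfectly
    return (n / 2.0) * math.log10(rss0 / rss1)
```

With expression values `[1.0, 1.2, 0.9, 3.1, 2.9, 3.0]`, a position whose genotype probabilities are `[0, 0, 0, 1, 1, 1]` yields a much higher LOD than one with `[0, 1, 0, 1, 0, 1]`, because the former separates low and high expressors.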
Put simply, within each eQTL analysis, each marker is tested for association with the expression of a single gene, which results in a logarithm of odds (LOD) curve. To find out whether the highest LOD value is significant, a permutation test was carried out and all significant peaks were saved. In more detail, to calculate the adjusted p-values for the resulting LOD scores for a single gene, 10,000 permutations were done, reshuffling the expression data randomly and recording the maximum LOD score of each permutation. Selecting a significance level of α = 0.001, we used the 99.9th percentile of the ordered LOD maxima as the threshold to detect a significant eQTL for the gene [55]. The genetic map was converted to a physical map with the interp_map() function. Using the respective threshold, the physical position, confidence interval and LOD of significant peaks were obtained with the find_peaks() function [53]. The exact adjusted p-values were determined by calculating the proportion of permutation maxima higher than the respective LOD [55]. This process was repeated for all 37,782 active genes. To subsequently also correct for the testing of multiple genes, the false discovery rate was controlled by applying the p.adjust() function to the adjusted p-values of the LOD peaks of all genes, setting method to “fdr” and n to the total number of genes plus the number of second and third significant peaks. Peaks with an FDR ≤ 0.001 were considered significant. Start and end positions of genes were added from the annotation file. The same procedure was performed on all three cross-type data sets (IBM-RIL, B73 × IBM-RIL, Mo17 × IBM-RIL). The resulting eQTL peaks were combined and distinct eQTLs were selected: in cases where multiple eQTLs were identified for a gene, we assessed whether the different peak positions corresponded to different regulatory elements.
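The permutation and multiple-testing steps can be sketched as below. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the threshold uses one simple empirical-quantile convention (R's quantile() interpolates by default), ties are counted as exceedances, and `bh_adjust` mirrors the Benjamini-Hochberg adjustment with a total test count `n` that may exceed the number of supplied p-values.

```python
from typing import List, Optional, Sequence


def permutation_threshold(perm_max_lods: Sequence[float], alpha: float = 0.001) -> float:
    """LOD threshold: the value that only a fraction alpha of maxima exceed."""
    ordered = sorted(perm_max_lods)
    n = len(ordered)
    if n == 0:
        raise ValueError("no permutation maxima supplied")
    n_above = int(round(alpha * n))  # maxima allowed above the threshold
    k = max(n - n_above, 1)
    return ordered[k - 1]


def adjusted_p_value(lod: float, perm_max_lods: Sequence[float]) -> float:
    """Share of permutation maxima that reach or exceed the observed LOD."""
    return sum(1 for m in perm_max_lods if m >= lod) / len(perm_max_lods)


def bh_adjust(p_values: Sequence[float], n: Optional[int] = None) -> List[float]:
    """Benjamini-Hochberg adjustment; n is the total number of tests."""
    m = len(p_values)
    n = m if n is None else n
    if n < m:
        raise ValueError("n must be at least the number of p-values")
    # Walk from the largest to the smallest p-value, keeping a running minimum.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, idx in enumerate(order):
        rank = m - offset  # rank of this p-value in ascending order
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted
```

With 10,000 permutation maxima and `alpha=0.001`, `permutation_threshold` returns the 9,990th smallest maximum, so that 10 maxima lie above it.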
If eQTLs for the same gene were ≥ 25 Mbp apart or on different chromosomes and their positions did not lie within the confidence intervals of each other, they were considered to be different from each other and were retained. If multiple eQTLs for the same gene did not differ by these criteria but were in close proximity to each other, only the eQTL with the shortest confidence interval, or the highest LOD in case of equal confidence intervals, was retained in the merged list. The eQTLs were classified into cis and trans eQTLs based on their distance from the start of their respective gene. Trans-regulating eQTLs were defined as located at a distance of at least 2.5 Mbp from the start of the gene, with a confidence interval that did not include the start of the gene. Cis-regulating eQTLs were defined as located in proximity to the start of the gene (< 2.5 Mbp) or located such that their confidence interval included the start of the gene (Additional file 2: Table S5). The cutoff of 2.5 Mbp was chosen as an initial threshold in consideration of the average distance between cross-overs (0.3 Mbp across all IBM-RILs) and the length of support intervals around eQTL positions (median: 2.48 Mbp). The majority (83%) of trans-regulating eQTL are located on a different chromosome than the corresponding gene. Additionally, the majority of cis-regulating eQTL were located outside of the confidence interval (88%). This distribution of the distances of eQTL to their genes indicates that altering the threshold of 2.5 Mbp would not result in major changes in the classification of eQTL into cis and trans.
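The merging and classification rules above can be written down compactly. The following is a minimal sketch under the stated thresholds (25 Mbp for distinct eQTLs, 2.5 Mbp for cis versus trans); the `Eqtl` record, the function names and the coordinates in the usage note are hypothetical, and an eQTL on a different chromosome than its gene is treated as trans.

```python
from dataclasses import dataclass
from typing import Sequence

CIS_WINDOW_BP = 2_500_000      # cis/trans cutoff from the text
DISTINCT_GAP_BP = 25_000_000   # minimum distance between distinct eQTLs


@dataclass(frozen=True)
class Eqtl:
    chrom: str
    pos: int     # peak position (bp)
    ci_lo: int   # lower bound of the confidence interval (bp)
    ci_hi: int   # upper bound of the confidence interval (bp)
    lod: float


def classify(eqtl: Eqtl, gene_chrom: str, gene_start: int) -> str:
    """Return 'cis' or 'trans' for an eQTL relative to its gene."""
    if eqtl.chrom != gene_chrom:
        return "trans"
    near = abs(eqtl.pos - gene_start) < CIS_WINDOW_BP
    covers = eqtl.ci_lo <= gene_start <= eqtl.ci_hi
    return "cis" if near or covers else "trans"


def are_distinct(a: Eqtl, b: Eqtl) -> bool:
    """Two eQTLs of the same gene count as different regulatory positions."""
    if a.chrom != b.chrom:
        return True
    far = abs(a.pos - b.pos) >= DISTINCT_GAP_BP
    a_in_b = b.ci_lo <= a.pos <= b.ci_hi
    b_in_a = a.ci_lo <= b.pos <= a.ci_hi
    return far and not a_in_b and not b_in_a


def representative(group: Sequence[Eqtl]) -> Eqtl:
    """Among non-distinct eQTLs keep the shortest interval, then the highest LOD."""
    return min(group, key=lambda e: (e.ci_hi - e.ci_lo, -e.lod))
```

For a gene starting at 10 Mbp on chromosome 1, a peak at 14 Mbp is still classified as cis if its confidence interval spans 9.5 to 15 Mbp, because the interval includes the gene start.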
Transcriptome wide association analysis (TWAS)
A TWAS was conducted to associate gene expression levels of active genes with phenotypic traits. The active genes were filtered to select those genes which were active in at least 5% of all genotypes [14]. We used the MLMM [56], BLINK [57], and FarmCPU [58] models, implemented in the R package GAPIT (version 3) [59], including the first three principal components for the initial identification of genes and, in the case of MLMM, the variance–covariance matrix between individuals as kinship. Each population (IBM-RILs, B73 × IBM-RILs, Mo17 × IBM-RILs) was analyzed separately. For use in GAPIT, expression values were rescaled to values between 0 and 2 for each population. The presented TWAS + SPE candidate genes (TSG) were additionally investigated using Student’s t-test.
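The filtering and rescaling steps can be sketched as follows. The text does not state how the rescaling to the range 0 to 2 was done, so the linear min-max scaling per gene used here is an assumption, as are the function names and the handling of constant expression.

```python
from typing import List, Sequence


def active_in_enough_genotypes(active_flags: Sequence[bool],
                               min_fraction: float = 0.05) -> bool:
    """True if the gene is active in at least min_fraction of all genotypes."""
    return sum(active_flags) >= min_fraction * len(active_flags)


def rescale_0_to_2(values: Sequence[float]) -> List[float]:
    """Linearly map expression values of one gene onto the interval [0, 2].

    Assumption: min-max scaling within one population; a gene with constant
    expression carries no information and is mapped to 0 throughout.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [2.0 * (v - lo) / (hi - lo) for v in values]
```

For example, `rescale_0_to_2([1.0, 2.0, 3.0])` returns `[0.0, 1.0, 2.0]`, which places the values on the same numeric scale as marker genotypes coded 0, 1 and 2.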
Determination of syntenic and non-syntenic genes
The syntenic and non-syntenic genes were determined by comparison against a published list of syntenic grass genes [60]. This list is based on the B73v5 genome sequence annotation. Genes with cis-regulatory eQTL were compared against those with trans-regulatory eQTL using Fisher’s exact test in R with fisher.test().
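The comparison amounts to a test on a 2 × 2 table (cis versus trans eQTL genes by syntenic versus non-syntenic). The following is a minimal sketch of a two-sided Fisher's exact test for such a table; it sums the probabilities of all tables that are at most as likely as the observed one, which is the usual two-sided definition. The function name and the counts in the usage note are hypothetical and are not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so that ties with the observed table are included.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

As a toy example, `fisher_exact_two_sided(3, 1, 1, 3)` evaluates to 34/70, i.e. about 0.486.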
Supplementary Information
Additional file 1: Supplement figures, Figures S1–S9.
Additional file 2: Table S1 Sample filtering: Details of retained and excluded samples. Table S2 MPH and BPH: Phenotypic mid-parent and better-parent heterosis values of each hybrid. Table S3 SPE pattern ref: SPE pattern of all genes in both reference hybrids B73xMo17 and Mo17xB73 and their eQTL. Table S5 eQTL details: Details on all identified eQTL. Table S7 TWAS genes: All genes and details identified in the TWAS analysis. Table S8 TSG: TWAS candidate genes with substantial numbers of SPE pattern across hybrids of (TSG) and corresponding details. Table S9 SNPs B73 vs. Mo17: List and details of identified loci (SNPs) with different alleles in B73 and Mo17 samples. Table S10 IBM-RIL regions: Specifications of start and end position of each identified Mo17, B73, or IBM-RIL specific region in each IBM-RIL.
Additional file 3: Supplement Material Supplement Material 1 and 2 with more detailed description of image analysis and initial SNP calling.
Additional file 4: Table S4 Proportion of heterotic variance explained by the number of SPE genes on better-parent heterosis for different root phenotypes. Table S6 Synteny of all active genes with eQTL.
Acknowledgements
We would like to thank KWS for the propagation of the IBM-RIL syn. 4 seeds. We thank Helmut Rehkopf (University of Bonn) for his support in propagating the genetic material for this study and the experimental station “Auf dem Hügel” of the University of Bonn. This work was supported by the Deutsche Forschungsgemeinschaft (DFG) Next Generation Sequencing Competence Network (NGS-CN; project 423957469) grant HO 2249/18-1 to F.H. NGS analyses were carried out at the West German Genome Center (WGGC) site in Bonn. The authors acknowledge access to the bonna cluster hosted by the University of Bonn along with the support provided by its High-Performance Computing & Analytics Lab.
Peer review information
Wenjing She was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team. The peer-review history is available in the online version of this article.
Authors’ contributions
M.P. and J.A.B. carried out the experiments, conducted the statistical analyses, interpreted the data, and drafted the manuscript. A.S.M., G.L., and H.S. gave advice regarding population genetics, TWAS analysis and bioinformatic handling of the data. H.-P.P. provided help with the experimental design for the RNA-seq experiment and helped with the statistical analyses. F.H. conceived and coordinated the study and participated in data interpretation and drafting the manuscript. All authors read and approved the final manuscript.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Data availability
Sequence data used in this study have been deposited in NCBI Bioproject with the ID PRJNA923128 [61] and were previously described [24].
Declarations
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Marion Pitz and Jutta A. Baldauf contributed equally to this work.
References
1. Hochholdinger F, Yu P. Molecular concepts to explain heterosis in crops. Trends Plant Sci. 2025. 10.1016/j.tplants.2024.07.018.
2. Shull GH. Duplicate genes for capsule-form in Bursa bursa-pastoris. Z Vererbungslehre. 1914;12:97–149. 10.1007/bf01837282.
3. Hochholdinger F, Baldauf JA. Heterosis in plants. Curr Biol. 2018;28:R1089–92. 10.1016/j.cub.2018.06.041.
4. Paril J, Reif J, Fournier-Level A, Pourkheirandish M. Heterosis in crop improvement. Plant J. 2024;117:23–32. 10.1111/tpj.16488.
5. Hoecker N, Keller B, Piepho H-P, Hochholdinger F. Manifestation of heterosis during early maize (Zea mays L.) root development. Theor Appl Genet. 2006;112:421–9. 10.1007/s00122-005-0139-4.
6. Paschold A, Marcon C, Hoecker N, Hochholdinger F. Molecular dissection of heterosis manifestation during early maize root development. Theor Appl Genet. 2010;120:383–8. 10.1007/s00122-009-1082-6.
7. Flint-Garcia SA, Buckler ES, Tiffin P, Ersoz E, Springer NM. Heterosis is prevalent for multiple traits in diverse maize germplasm. PLoS ONE. 2009;4:e7433. 10.1371/journal.pone.0007433.
8. Shull GH. The composition of a field of maize. J Hered. 1908;os-4:296–301. 10.1093/jhered/os-4.1.296.
9. East EM. Inbreeding in corn. Reports of the Connecticut agricultural experiment station for years. 1908;1907–1908:419–28.
10. Jones DF. Dominance of linked factors as a means of accounting for heterosis. Proc Natl Acad Sci U S A. 1917;3:310–2. 10.1073/pnas.3.4.310.
11. Paschold A, et al. Complementation contributes to transcriptome complexity in maize (Zea mays L.) hybrids relative to their inbred parents. Genome Res. 2012;22:2445–54. 10.1101/gr.138461.112.
12. Paschold A, et al. Nonsyntenic genes drive highly dynamic complementation of gene expression in maize hybrids. Plant Cell. 2014;26:3939–48. 10.1105/tpc.114.130948.
13. Baldauf JA, et al. Single-parent expression is a general mechanism driving extensive complementation of non-syntenic genes in maize hybrids. Curr Biol. 2018;28:431–437.e4. 10.1016/j.cub.2017.12.027.
14. Baldauf JA, et al. Single-parent expression complementation contributes to phenotypic heterosis in maize hybrids. Plant Physiol. 2022;189:1625–38. 10.1093/plphys/kiac180.
15. Li Z, et al. Single-parent expression drives dynamic gene expression complementation in maize hybrids. Plant J. 2021;105:93–107. 10.1111/tpj.15042.
16. Botet R, Keurentjes JJB. The role of transcriptional regulation in hybrid vigor. Front Plant Sci. 2020;11:410. 10.3389/fpls.2020.00410.
17. Swanson-Wagner RA, et al. Paternal dominance of trans-eQTL influences gene expression patterns in maize hybrids. Science. 2009;326:1118–20. 10.1126/science.1178294.
18. Liu SC, Kowalski SP, Lan TH, Feldmann KA, Paterson AH. Genome-wide high-resolution mapping by recurrent intermating using Arabidopsis thaliana as a model. Genetics. 1996;142:247–58. 10.1093/genetics/142.1.247.
19. Lee M, et al. Expanding the genetic map of maize with the intermated B73 × Mo17 (IBM) population. Plant Mol Biol. 2002;48:453–61. 10.1023/a:1014893521186.
20. Rahman H, et al. Molecular mapping of quantitative trait loci for drought tolerance in maize plants. Genet Mol Res. 2011;10:889–901. 10.4238/vol10-2gmr1139.
21. Pan Q, et al. The genetic basis of plant architecture in 10 maize recombinant inbred line populations. Plant Physiol. 2017;175:858–73. 10.1104/pp.17.00709.
22. Huo X, Wang J, Zhang L. Combined QTL mapping on bi-parental immortalized heterozygous populations to detect the genetic architecture on heterosis. Front Plant Sci. 2023;14:1157778. 10.3389/fpls.2023.1157778.
23. Yang H, et al. QTL mapping for plant height and ear height using bi-parental immortalized heterozygous populations in maize. Front Plant Sci. 2024;15:1371394. 10.3389/fpls.2024.1371394.
24. Pitz M, Baldauf JA, Piepho H-P, Hochholdinger F. Nonadditive gene expression contributing to heterosis in partially heterozygous maize hybrids is predominantly regulated from heterozygous regions. New Phytol. 2025. 10.1111/nph.70128.
25. Göring HH, Terwilliger JD, Blangero J. Large upward bias in estimation of locus-specific effects from genomewide scans. Am J Hum Genet. 2001;69:1357–69. 10.1086/324471.
26. Schnable JC, Lyons E. Comparative genomics with maize and other grasses: from genes to genomes! Maydica. 2011;56:183–99.
27. Marcon C, et al. Stability of single-parent gene expression complementation in maize hybrids upon water deficit stress. Plant Physiol. 2017;173:1247–57. 10.1104/pp.16.01045.
28. Springer NM, Stupar RM. Allelic variation and heterosis in maize: how do two halves make more than a whole? Genome Res. 2007;17:264–75. 10.1101/gr.5347007.
29. Wang B, et al. De novo genome assembly and analyses of 12 founder inbred lines provide insights into maize heterosis. Nat Genet. 2023;55:312–23. 10.1038/s41588-022-01283-w.
30. Salvi S. An evo-devo perspective on root genetic variation in cereals. J Exp Bot. 2017;68:351–4. 10.1093/jxb/erw505.
31. Xu C, et al. Cooperative action of the paralogous maize lateral organ boundaries (LOB) domain proteins RTCS and RTCL in shoot-borne root formation. New Phytol. 2015;207:1123–33. 10.1111/nph.13420.
32. Tai H, et al. Non-syntenic genes drive RTCS-dependent regulation of the embryo transcriptome during formation of seminal root primordia in maize (Zea mays L.). J Exp Bot. 2017;68:403–14. 10.1093/jxb/erw422.
33. Yu P, et al. Seedling root system adaptation to water availability during maize domestication and global expansion. Nat Genet. 2024;56:1245–56. 10.1038/s41588-024-01761-3.
34. Piepho H-P, Büchse A, Truberg B. On the use of multiple lattice designs and α-designs in plant breeding trials. Plant Breed. 2006;125:523–8. 10.1111/j.1439-0523.2006.01267.x.
35. Smith AG, et al. RootPainter: deep learning segmentation of biological images with corrective annotation. New Phytol. 2022;236:774–91. 10.1111/nph.18387.
36. Seethepalli A, et al. RhizoVision Explorer: open-source software for root image analysis and measurement standardization. AoB Plants. 2021;13:plab056. 10.1093/aobpla/plab056.
37. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–20. 10.1093/bioinformatics/btu170.
38. Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015;12:357–60. 10.1038/nmeth.3317.
39. Danecek P, et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021. 10.1093/gigascience/giab008.
40. Anders S, Pyl PT, Huber W. HTSeq–a python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166–9. 10.1093/bioinformatics/btu638.
41. van der Auwera G, O'Connor BD. Genomics in the Cloud: using Docker, GATK, and WDL in Terra. 1st ed. Beijing: O'Reilly Media, Incorporated; 2020.
42. Vedder L. MaizeSNP. 2024. Available at 10.5281/ZENODO.10684044.
43. Baldauf JA, Vedder L, Schoof H, Hochholdinger F. Robust non-syntenic gene expression patterns in diverse maize hybrids during root development. J Exp Bot. 2020;71:865–76. 10.1093/jxb/erz452.
44. Huang X, et al. High-throughput genotyping by whole-genome resequencing. Genome Res. 2009;19:1068–76. 10.1101/gr.089516.108.
45. Hufford MB, et al. De novo assembly, annotation, and comparative analysis of 26 diverse maize genomes. Science. 2021;373(6555):655–62. 10.1126/science.abg5289.
46. Ritchie ME, et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43:e47. 10.1093/nar/gkv007.
47. Wood SN. Generalized additive models. An introduction with R. 2nd ed. Boca Raton: CRC Press/Taylor & Francis Group; 2017.
48. Lithio A, Nettleton D. Hierarchical modeling and differential expression analysis for RNA-seq experiments with inbred and hybrid genotypes. JABES. 2015;20:598–613. 10.1007/s13253-015-0232-3.
49. Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010;11:R25. 10.1186/gb-2010-11-3-r25.
50. Feldmann MJ, Piepho H-P, Bridges WC, Knapp SJ. Average semivariance yields accurate estimates of the fraction of marker-associated genetic variance and heritability in complex trait analyses. PLoS Genet. 2021;17:e1009762. 10.1371/journal.pgen.1009762.
51. Piepho H-P. A coefficient of determination (R2) for generalized linear mixed models. Biom J. 2019;61:860–72. 10.1002/bimj.201800270.
52. Piepho H-P, Williams ER, Fleck M. A note on the analysis of designed experiments with complex treatment structure. HortSci. 2006;41:446–52. 10.21273/HORTSCI.41.2.446.
53. Broman KW, et al. R/qtl2: software for mapping quantitative trait loci with high-dimensional data and multiparent populations. Genetics. 2019;211:495–502. 10.1534/genetics.118.301595.
54. Haley CS, Knott SA. A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity. 1992;69:315–24. 10.1038/hdy.1992.131.
55. Lystig TC. Adjusted p values for genome-wide scans. Genetics. 2003;164(4):1683–7. 10.1093/genetics/164.4.1683.
56. Segura V, et al. An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nat Genet. 2012;44:825–30. 10.1038/ng.2314.
57. Huang M, Liu X, Zhou Y, Summers RM, Zhang Z. BLINK: a package for the next level of genome-wide association studies with both individuals and markers in the millions. Gigascience. 2019. 10.1093/gigascience/giy154.
58. Liu X, Huang M, Fan B, Buckler ES, Zhang Z. Iterative usage of fixed and random effect models for powerful and efficient genome-wide association studies. PLoS Genet. 2016;12:e1005767. 10.1371/journal.pgen.1005767.
59. Wang J, Zhang Z. GAPIT version 3: boosting power and accuracy for genomic association and prediction. Genomics Proteomics Bioinformatics. 2021;19:629–40. 10.1016/j.gpb.2021.08.005.
60. Zhang Y, et al. Differentially regulated orthologs in sorghum and the subgenomes of maize. Plant Cell. 2017;29:1938–51. 10.1105/tpc.17.00354.
61. Baldauf JA, Pitz M, Hochholdinger F. RNA-Seq of IBM-RIL backcross hybrids. Bioproject PRJNA923128. Gene Expression Omnibus. 2023. Available at https://www.ncbi.nlm.nih.gov/bioproject/PRJNA923128/.