G3: Genes | Genomes | Genetics. 2021 May 5;11(7):jkab153. doi: 10.1093/g3journal/jkab153

Back to the future: implications of genetic complexity for the structure of hybrid breeding programs

Frank Technow, Dean Podlich, Mark Cooper
Editor: G P Morris
PMCID: PMC8495936  PMID: 33950172

Abstract

Commercial hybrid breeding operations can be described as decentralized networks of smaller, more or less isolated breeding programs. There is further a tendency for the disproportionate use of successful inbred lines for generating the next generation of recombinants, which has led to a series of significant bottlenecks, particularly in the history of the North American and European maize germplasm. Both the decentralization and the disproportionate contribution of inbred lines reduce effective population size and constrain the accessible genetic space. Under these conditions, long-term response to selection is not expected to be optimal under the classical infinitesimal model of quantitative genetics. In this study, we therefore aim to propose a rationale for the success of large breeding operations in the context of genetic complexity arising from the structure and properties of interactive genetic networks. For this, we use simulations based on the NK model of genetic architecture. We indeed found that constraining genetic space through program decentralization and disproportionate contribution of parental inbred lines is required to expose additive genetic variation and thus facilitate heritable genetic gains under high levels of genetic complexity. These results provide new insights into why the historically grown structure of hybrid breeding programs was successful in improving the yield potential of hybrid crops over the last century. We also hope that a renewed appreciation for “why things worked” in the past can guide the adoption of novel technologies and the design of future breeding strategies for navigating biological complexity.

Keywords: genetic complexity, hybrid breeding, breeding strategies, long-term genetic gain, historical properties

Introduction

Pioneered by Shull (1908), hybrid breeding is credited as one of the most significant factors for the tremendous productivity increases of major field (Duvick 1999) and horticultural (Silva Dias 2010) crops that enabled food production to keep pace with population growth. Hybrid breeding programs originally were centered around maximum exploitation of heterosis, a phenomenon that remains largely unexplained even after a century of research (East 1936; Lippman and Zamir 2007). This later evolved into the modern concept of hybrid breeding, characterized by its distinctive structuring of germplasm into heterotic groups and patterns (Melchinger and Gumber 1998). Beyond heterotic groups, the structure of commercial hybrid breeding, particularly in major crops like maize, is characterized by the largely isolated and unique sub-heterotic patterns of the major companies (Mikel 2006; Troyer 2009; White et al. 2020) as well as a high degree of decentralization into smaller, more or less disconnected sub-programs within those (Smith et al. 2006; Cooper et al. 2014). Plant breeders further have a tendency for relying on only a small set of elite inbred lines for producing the next generation of recombinants (Rasmusson and Phillips 1997), leading to a series of significant bottleneck events in the history of, for example, the North American maize germplasm (White et al. 2020). These characteristics drastically reduced the effective population size within breeding programs and are not predicted to be promising strategies under the additive, infinitesimal model of quantitative genetics (Gaynor et al. 2017). Nevertheless, the consistent long-term genetic gain has been demonstrated (Duvick et al. 2004).

To better describe and quantify the observed genetic variation among hybrids, the concept of general and specific combining ability was developed early on (Sprague and Tatum 1942). The former, commonly abbreviated as GCA, is a property of the additive effects of contributing genes and describes the average performance of all hybrids derived from an inbred. The latter, commonly abbreviated as SCA, is a nonadditive residual term that describes the deviation of the performance of a particular hybrid from the expectation based on the parental GCA values.

Running efficient hybrid breeding programs requires a preponderance of additive genetic variation to maximize response to selection in the next generation of inbred lines (Falconer and Mackay 1996) as well as the predictability of hybrid performance from the GCA of inbred lines (Reif et al. 2007). A preponderance of GCA variation also allows identification of inbreds that can serve as parents of several high-performing hybrids. This greatly simplifies production of commercial seed, which is a major challenge for many crops (Technow 2019). Therefore, hybrid breeding programs have traditionally relied on maximizing and exploiting GCA variation (Falconer and Mackay 1996; Melchinger 1999; Hallauer et al. 2010).

The historically grown paradigms around hybrid breeding designs and strategies are now being challenged by innovative concepts (e.g., Gaynor et al. 2017; Wallace et al. 2018; Hickey et al. 2019; Voss-Fels et al. 2019; Seye et al. 2020) devised in the wake of technological advances such as whole-genome prediction (Meuwissen et al. 2001), high-throughput phenotyping (Araus and Cairns 2014) and genotyping (Poland and Rife 2012), as well as gene editing (Jaganathan et al. 2018). While some of these concepts, in particular those aiming to “design” superior genotypes using gene editing or targeted recombination technology (Wallace et al. 2018; Brandariz and Bernardo 2019), as opposed to generating and evaluating natural variation in the genetic and environmental context in which it occurs, are highly speculative and might not live up to expectations (Bernardo 2016), it is clear that the next decades will change plant breeding. However, before implementing drastic changes to breeding programs we require a theoretical and simulation framework to explore and understand the structures and strategies that have contributed to the success of long-term genetic gain and germplasm improvement. From this historical basis, we can evaluate novel proposals and draw lessons for design of future breeding strategies.

Empirical reports show a preponderance of additive variation in many wild, domesticated, and laboratory species (Falconer and Mackay 1996; Lynch and Walsh 1998; Hill et al. 2008). This agrees well with published studies showing a preponderance of additive GCA over nonadditive SCA variation in hybrid breeding programs (Technow et al. 2014a; Larièpe et al. 2017). At the same time, however, advances in plant physiology and molecular and systems biology have stimulated a renewed appreciation of the intricate interactions at the molecular, metabolic, and physiological level that underlie complex traits (Carlborg and Haley 2004; Hammer et al. 2006; Phillips 2008; Saha et al. 2011; Wilkins et al. 2016; Jiang et al. 2017). Of particular relevance for hybrid breeding are recent studies indicating that heterosis is an emergent property of complex metabolic networks (Fiévet et al. 2010; Goff 2011; Fiévet et al. 2018; Vacher and Small 2019; Vasseur et al. 2019).

The paradox between the complexity of the underlying biology and the simplicity of the expressed variation can of course be resolved by distinguishing between biological and statistical effects and realizing that the former cannot be inferred from the latter (Wade 2002; Mackay 2014; Huang and Mackay 2016). Statistical effects of genes, as well as their aggregates such as GCA and SCA and their variances depend on the genetic background of the population in which they are evaluated, particularly on allele frequencies and linkage disequilibrium (LD) patterns (Falconer and Mackay 1996). For example, it was shown that, regardless of the underlying genetic architecture, genetic variances in random mating populations are expected to be predominantly additive when genes are at extreme frequencies and LD is high (Hill et al. 2008).

Thus, ratios of additive to nonadditive variation are not intrinsic properties of biological systems but at least partly a function of allele frequencies and LD patterns, and therefore depend on breeding strategies (for the sake of brevity, the reader is referred to the discussion section “Emergence of additive genetic variance” for a more in-depth treatment of this topic). Because of the importance of additive variation for the efficient operation of breeding programs, a framework to evaluate and study breeding strategies should allow for the possibility of additivity arising from high degrees of biological complexity at the genetic level.

In this study, we will use simulations based on the NK model of genetic complexity (Kauffman 1993) to explore two distinctive, historically grown characteristics of hybrid breeding: first its decentralization into smaller, more or less independent sub-programs, and second the disproportional use of superior inbred lines for producing the next generation of recombinants. Our goal thereby is not to make specific recommendations for optimal structuring of programs, but rather to gain an appreciation for the properties of these structures in the context of different degrees of genetic complexity.

Materials and methods

Model of genetic complexity

The NK model, introduced by Kauffman (1993) will form the basis of the simulations. The NK model allows generation of a tunable series of models of trait genetic architecture with increasing dimensionality and complexity by varying the number of genes N (dimensionality) and the degree of interaction among them (K, complexity). The NK framework is an application of graph theory to network interactions (Csaszar 2018). The motivation for considering the genetic architecture of traits as graphs through the NK framework is based on decades of research deciphering the networks related to plant growth, development, and adaptation and ultimately the generation of yield and other output traits in globally important crops such as maize (e.g., Studer et al. 2017), rice (e.g., Wilkins et al. 2016), sorghum (e.g., Li et al. 2018), soybean (e.g., Fang et al. 2017), and wheat (e.g., Harrington et al. 2020). The following section will illustrate this motivation using our first-hand experience with maize as an example.

Following the availability of the complete maize genome sequence and haplotype maps, we have studied the genome to phenome genetic architecture of many traits in historical and elite maize populations that are relevant to commercial maize breeding. One of the first examples of these was the genetic control of variation for flowering time which in maize relates to the transition of the shoot apical meristem from vegetative to reproductive floral (tassel) development (Dong et al. 2012).

This research generated many insights into the gene regulatory network underlying control of flowering time in maize. Dong et al. (2012) discuss examples of genes that were studied and tested for their combined influence on flowering time. An important investigation was also the detailed consideration of the gene ZMM28, a maize MADS-box transcription factor, found to have an important role in genetic variation for grain yield separate from its effect on flowering time (Wu et al. 2019). The discovery of the genes involved in the relationship of ZMM28 and grain yield was enabled by considering the genetic architecture of traits in terms of the gene networks underlying them.

Another experimental example of a gene network studied by a combination of mapping, transgenic, and editing approaches was based on the ARGOS gene family (Guo et al. 2014; Shi et al. 2015, 2017). The network of genes operating around the ARGOS genes are involved in regulating genetic variation for the ethylene responses of maize hybrids. These responses have important influences on genetic variation for yield that are independent of flowering time. The network of genes discovered in these studies further motivated our consideration of gene network models using the NK framework. Many other examples are described by Simmons et al. (2021).

The genetic landscape metaphor was introduced and developed by Wright (1932) to aid in conceptualizing genetic complexity in high dimensions (Svensson and Calsbeek 2013). As a metaphor it should not be taken literally, but it can help to build intuition for the complexity and ruggedness associated with increasing values of K (Kauffman 1993) and make the rather abstract concepts discussed henceforth more tangible. At K = 1 (genes acting strictly additively, without any inter- or intra-gene interactions), the genetic landscape can be imagined as that of Mount Fuji, i.e., a single, clearly distinguished peak with a steady, monotonic incline to the top (Figure 1). At intermediate K levels, the landscape is characterized by multiple peaks clustered together in a certain region of genetic space. This might be visualized as akin to the European Alps, i.e., a mountainous region within an otherwise flat landscape. Finally, at high values of K, the landscape resembles a sea of dunes, i.e., a range of peaks of similar height and shape distributed more or less evenly in space.

Figure 1. Schematic visualization of genetic landscapes corresponding to different values of complexity parameter K.

Implementation of the NK model

We implemented the NK model according to the generalized approach described by Altenberg (1994), but adapted the model to accommodate diploid genomes. Here, the complex trait is described as a normalized sum of a set of “fitness components.” The normalization was done by dividing by N out of convention (Csaszar 2018). Because N was held constant throughout this study, normalization was by a constant and thus did not affect results (e.g., the relative ranking of individuals). The value of each fitness component is computed as a function of K interacting genes drawn at random from all N genes. Following Altenberg (1994), the specific fitness values were calculated with random functions derived from the ran4 pseudo-random number generator (Press et al. 1992) and are distributed uniformly between 0 and 1. For this study, the number of fitness components and the number of genes were both set to N =500. This number was chosen as a compromise between research showing that e.g., upwards of 1500 genes and upwards of 1800 metabolites are involved in the biomass metabolism of a complex organism such as maize (Saha et al. 2011) and computational tractability. Genes were biallelic and the simulated organism diploid. The complexity parameter K was varied from 1 to 15 in steps of 1 (i.e., creating genetic landscapes ranging in complexity from Mount Fuji to The Dunes; Figure 1). Following Altenberg (1994), we allowed for some variation in the number of genes interacting in each component. This was achieved by using K as the rate parameter in a Poisson distribution from which the number of interacting genes was drawn independently for each fitness component. The sampled values were then truncated to fall within a range of 1 and 15. The exception to this was at K =1, where all components were controlled strictly by a single gene. 
For K > 1, genes were assigned at random to the fitness components, with the number of genes assigned to each component depending on the value drawn from the Poisson distribution, as described. Thus, genes typically influenced multiple fitness components (i.e., acted pleiotropically). For K = 1, each of the 500 genes was assigned to exactly one fitness component, and the values of heterozygous allele configurations were constrained to be midway between the homozygous configurations. Thus, K = 1 represents a genetic architecture in which genes act strictly additively and without any inter-gene interactions. Using order statistics, the expected value of the maximum of two samples from a Uniform distribution between 0 and 1 is 2/3. Thus, the expected maximum attainable fitness at K = 1 is 2/3.
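The construction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' R implementation: it uses NumPy's default generator instead of ran4, stores each component's values in a lookup table (feasible only for small numbers of interacting genes, hence the small `k_max` in the usage lines), and implements "truncation" by clamping. The names `build_nk_model` and `fitness` are hypothetical.

```python
import numpy as np

def build_nk_model(n_genes, k, rng, k_max=15):
    """Build one (gene indices, lookup table) pair per fitness component."""
    components = []
    for c in range(n_genes):
        if k == 1:
            genes = np.array([c])              # exactly one gene per component
            table = rng.uniform(size=3)        # values for genotypes 0, 1, 2
            table[1] = 0.5 * (table[0] + table[2])  # heterozygote at the midpoint
        else:
            # Assumption: "truncated" is implemented here by clamping to [1, k_max].
            size = int(np.clip(rng.poisson(k), 1, k_max))
            genes = rng.choice(n_genes, size=size, replace=False)
            table = rng.uniform(size=3 ** size)  # one value per joint genotype
        components.append((genes, table))
    return components

def fitness(genotype, components):
    """Normalized sum of component values; genotype entries are 0, 1 or 2."""
    total = 0.0
    for genes, table in components:
        index = 0
        for g in genotype[genes]:
            index = index * 3 + int(g)         # base-3 code of the joint genotype
        total += table[index]
    return total / len(components)

# Small toy example (20 genes instead of 500, at most 6 genes per component).
rng = np.random.default_rng(1)
model = build_nk_model(n_genes=20, k=3, rng=rng, k_max=6)
genotype = rng.integers(0, 3, size=20)
print(fitness(genotype, model))
```

With N = 500 and up to 15 interacting genes, explicit tables would be far too large, which is presumably why the original approach computes component values on demand from a pseudo-random function.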

Quantification of complexity

The complexity of the generated NK models was quantified following the “one-mutant neighbor” hill-climbing algorithm described by Kauffman (1993), but adapted to diploid organisms. A randomly generated genotype was used as the starting value. From there, all possible genotypes were generated that differ from the initial genotype by one allele at one of the 500 loci. Thus, a homozygous locus was changed to the heterozygous state while a heterozygous locus was changed to both alternate homozygotes. Then, the fitness values of all one-allele neighbors were evaluated according to the defined NK model, and an improved genotype was chosen at random from all fitter one-allele neighbors. This process was repeated until no fitter one-mutant neighbor could be found, meaning that the search reached a local or global optimum. For each level of K, 100 NK models were generated independently and a minimum of 65 searches, each starting at a random initial genotype, were conducted for each. The statistics recorded were the average number of steps until a local optimum was reached, the average Hamming genotypic distance, i.e., the proportion of different genome positions (Pinheiro et al. 2005), among optima and the correlation between the fitness values of the optima and the Hamming distance to the highest optimum identified (Kauffman 1993).
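The hill-climbing search can be illustrated with a short Python sketch. It assumes genotypes coded as 0, 1, 2 (copies of one allele) and takes the fitness function as an argument; `one_allele_neighbors` and `hill_climb` are hypothetical names, and the toy fitness function below is strictly additive rather than an NK model.

```python
import numpy as np

def one_allele_neighbors(genotype):
    """Yield all genotypes that differ by one allele at exactly one locus."""
    for locus, g in enumerate(genotype):
        # A heterozygote (1) can move to either homozygote; a homozygote
        # (0 or 2) can only move to the heterozygous state.
        targets = (0, 2) if g == 1 else (1,)
        for t in targets:
            neighbor = genotype.copy()
            neighbor[locus] = t
            yield neighbor

def hill_climb(genotype, fitness_fn, rng):
    """Move to a randomly chosen fitter neighbor until none exists."""
    current = genotype.copy()
    current_fit = fitness_fn(current)
    steps = 0
    while True:
        better = [n for n in one_allele_neighbors(current)
                  if fitness_fn(n) > current_fit]
        if not better:
            return current, steps          # local (or global) optimum reached
        current = better[rng.integers(len(better))]
        current_fit = fitness_fn(current)
        steps += 1

# Toy additive landscape: every extra copy of the favorable allele helps.
rng = np.random.default_rng(0)
effects = np.linspace(0.1, 1.0, 30)
additive_fn = lambda g: float(effects @ g)
start = rng.integers(0, 3, size=30)
peak, steps = hill_climb(start, additive_fn, rng)
print(steps, int((2 - start).sum()))
```

On this additive toy landscape the walk always ends at the single peak, and the number of steps equals the number of favorable alleles missing from the starting genotype.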

The average Hamming distance between the local peaks increased from just below 0.5 at K = 2 to 2/3 at around K of 6 or 7 and remained constant at this value from there on (Figure 2A). Note that with three different genotypes at each locus, 2/3 is the expected value of the Hamming distance between randomly generated genotypes. Similarly, the correlation between the fitness values of the local peaks and their Hamming distance to the highest identified peak increased from −0.25 at K = 2 to zero at K = 9 (Figure 2B). Here, a negative correlation means that local peaks with higher fitness tend to be found near each other and clustered around the highest peak. A zero correlation, in contrast, indicates that the height of a peak is unrelated to its proximity to the highest peak, i.e., that local peaks are randomly distributed throughout the genetic landscape. Thus, somewhere between K = 6 and K = 9, the landscape shifts from one in which local peaks tend to cluster together to one where local peaks of arbitrary height can exist anywhere in genetic space. The average number of steps until a local peak was reached decreased with K from 500 at K = 1 to just 167 at K = 15 (Figure 2C). Note that 500 is the expectation at K = 1 when starting from randomly generated genotypes, because 1/3 of the 500 loci are already at their highest possible value, 1/3 are one step removed (the heterozygous genotypes) and 1/3 are two steps removed (the lower homozygotes). Thus, the complexity and ruggedness of the genetic landscapes increase further after they become uncorrelated around K of 6 to 9.
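The two reference values quoted above follow from short calculations, written out here under the assumption that the three genotypes at each locus are equally likely in a randomly generated genotype:

```latex
% Expected number of improving steps at K = 1 with N = 500 loci
\mathbb{E}[\text{steps}]
  = 500 \left( \tfrac{1}{3}\cdot 0 + \tfrac{1}{3}\cdot 1 + \tfrac{1}{3}\cdot 2 \right)
  = 500

% Expected Hamming distance (proportion of differing loci)
% between two independent random genotypes
\mathbb{E}[d_H]
  = 1 - \sum_{g=1}^{3} \left( \tfrac{1}{3} \right)^{2}
  = 1 - \tfrac{1}{3}
  = \tfrac{2}{3}
```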

Figure 2. Relationship between the NK model complexity parameter K and (A) the average genetic Hamming distance between local peaks, (B) the average correlation between the fitness values of local peaks and their genetic Hamming distance to the highest peak and (C) the average number of steps to a local peak. Panels (A) and (B) omit results for K = 1, for which only a single peak exists. The dotted lines in these panels indicate values of 2/3 and 0.0, respectively.

Genome definition

The simulated genome comprised 10 diploid chromosomes of 1 Morgan length each. Each of the chromosomes received a random subset of 50 of the 500 genes, which were distributed evenly across the chromosome. Recombination was simulated according to the Haldane mapping function with the R package “hybrid” (Technow 2013), in the version available from the supplement of Technow and Gerke (2017).
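As an illustration of what recombination under the Haldane mapping function implies, the sketch below forms one gamete for a single 1 Morgan chromosome. It is not the code of the “hybrid” package: under Haldane's assumption of no interference, crossovers are modeled as a Poisson process along the chromosome, and the even spacing of the 50 loci via `np.linspace` is an assumption. The function name `simulate_gamete` is hypothetical.

```python
import numpy as np

def simulate_gamete(hap_a, hap_b, positions, length_morgan, rng):
    """Form one gamete from two parental homologs without interference."""
    # Number of crossovers ~ Poisson(map length), positions uniform.
    n_xo = rng.poisson(length_morgan)
    crossovers = np.sort(rng.uniform(0.0, length_morgan, size=n_xo))
    # Count crossovers to the left of each locus; every crossover
    # switches the homolog that is being copied.
    switches = np.searchsorted(crossovers, positions)
    start = rng.integers(2)                 # homolog copied at the left end
    from_b = (switches + start) % 2 == 1
    return np.where(from_b, hap_b, hap_a)

rng = np.random.default_rng(42)
positions = np.linspace(0.0, 1.0, 50)       # 50 evenly spaced loci on 1 Morgan
hap_a = np.zeros(50, dtype=int)
hap_b = np.ones(50, dtype=int)
print(simulate_gamete(hap_a, hap_b, positions, 1.0, rng))
```

Under this model, the recombination fraction between two loci d Morgan apart is 0.5(1 − exp(−2d)), which is the Haldane mapping function.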

Simulation of the hybrid breeding process

The simulation process is visualized in Figure 3. The specific process depicted in this figure, as well as the following explanations, corresponds to the distributed program structure, of which more details are given below. This structure is the most general of the ones considered and allows us to cover all features of the simulated process. The other structures considered, which are also described in a later section, are special cases of it. The starting point of the simulation was a base population of 1000 inbred lines (Figure 3A). This population was simulated stochastically as described by Montana (2005) to result in an expected LD between two loci t Morgan apart of $r^2 = 0.5 \cdot 2^{-t/0.1}$, with minor allele frequencies distributed uniformly between 0.35 and 0.50. The lines from the base population were then separated at random into two heterotic groups (arbitrarily labeled “1” and “2”) and further into sub-populations within those. The size of those sub-populations depended on the scenario. One sub-population from one heterotic group was then paired with one sub-population from the other group to form sub-heterotic patterns. These population pairs will henceforth be referred to as “breeding programs.” Hybrids were produced strictly across heterotic groups, by crossing lines from one sub-population of a program with lines from the other. Breeding crosses, i.e., crosses to generate a new generation of recombinant lines, were done within and among sub-populations, depending on the scenario, but strictly within heterotic groups (Figure 3B). The simulation of the breeding process described above is an approximation of the structure and evolution of long-term hybrid breeding akin to what we have observed in practice, i.e., starting from an initial germplasm base, separation into distinct heterotic groups and further separation into sub-populations (Duvick et al. 2004; Mikel and Dudley 2006).

Figure 3. Schematic visualization of the simulated hybrid breeding process, exemplified for the distributed program structure. (A) Creation of initial population and random splits into heterotic groups and populations; (B) detailed view of the processes happening within each cycle; (C) relationship between the performance of parents and the proportion in which they were used in breeding crosses. The performance ranking of the programs reflects the average performance of the experimental hybrids they produced and is subject to change from cycle to cycle. The percentage values next to the arrows connecting programs indicate the percent of breeding crosses of a program that are conducted with lines from the respective higher ranked program. The total percent of such crosses for the lowest ranked program C is given by Pmax. The higher the rank of a program, the lower the percentage of lines from other programs used in its breeding crosses, with the highest ranked program A not using any.

Evaluation of genetic performance

The GCA of the lines was evaluated with an incomplete mating design (Melchinger et al. 1987; Seye et al. 2020) by performing 10 crosses per line with random partners from the opposite sub-population of the same program. The performance of the resulting hybrids, as determined according to the defined NK model, was then averaged. Finally, a normally distributed noise variable with zero mean and variance equal to one-half of the variance of the GCA values of that sub-population was added to those averages to represent residual experimental and environmental noise variation. The so obtained “observed” GCA values thus had a heritability, on a progeny mean basis, of 2/3, a value easily achievable in multi-environment, multi-testcross field trials (Schopp et al. 2015). Those GCA observations were then used to predict the performance of all possible inter-group hybrids of that program. The top hybrids, how many exactly depended on the scenario, were then selected and their true performance determined according to the NK model. The average of this select group of hybrids, which represents a set of advanced experimental hybrids, was used to quantify the overall performance of the program in the current cycle. The selected hybrids from all programs were then ranked according to their true performance and the value of the top-ranked hybrid was defined as the peak performance of the whole breeding operation in the current cycle and used as a metric of genetic gain. This metric reflects that commercial breeding programs release only a handful of hybrid products each cycle.
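The testcross evaluation can be sketched as follows. With noise variance equal to half the variance of the true values, the heritability is V/(V + V/2) = 2/3. The sketch assumes that the 10 testers are sampled without replacement and uses a toy additive `hybrid_value` function in place of the NK model; all names are hypothetical.

```python
import numpy as np

def evaluate_gca(lines, testers, hybrid_value, rng, n_crosses=10):
    """Return (true, observed) testcross means for each line."""
    true_gca = np.empty(len(lines))
    for i, line in enumerate(lines):
        # Assumption: random partners are drawn without replacement.
        partners = rng.choice(len(testers), size=n_crosses, replace=False)
        true_gca[i] = np.mean([hybrid_value(line, testers[p]) for p in partners])
    # Noise variance is one-half of the variance of the testcross means.
    noise_sd = np.sqrt(0.5 * true_gca.var(ddof=1))
    observed = true_gca + rng.normal(0.0, noise_sd, size=len(true_gca))
    return true_gca, observed

# Toy data: inbred lines as 0/1 vectors, hybrids scored additively.
rng = np.random.default_rng(5)
effects = rng.uniform(size=50)
group_1 = rng.integers(0, 2, size=(500, 50))
group_2 = rng.integers(0, 2, size=(500, 50))
hybrid_value = lambda a, b: float(effects @ (a + b))
true_gca, observed = evaluate_gca(group_1, group_2, hybrid_value, rng)
print(true_gca.var(ddof=1) / observed.var(ddof=1))
```

The printed ratio is an empirical estimate of the heritability and fluctuates around 2/3 from run to run.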

Selection of breeding parents

Breeding crosses among inbred lines for initiating the next recombination cycle were chosen by assigning each inbred line a usage probability, which was the product of an individual-level and a population-level relative contribution value. To determine the former, the lines within each sub-population were ranked according to their observed GCA values (Figure 3C). Only the top lines (how many depended on the scenario) were selected as potential parents; the remainder were given an individual contribution value of zero. The relative contributions of the selected lines from a given sub-population were drawn from a Dirichlet distribution. The concentration parameters of this distribution were used to modulate the relationship between selection rank and relative contribution. Further details about this will be given later when describing the setting for the “parental contribution theme.”

The population-level contribution values describe the overall contribution of lines from one sub-population to the breeding crosses of another. They are thus defined anew for each target population and hence the contribution value of population “A” to the crosses for population “B” might be different than that to the crosses of population “C.” The process will be explained using the example visualized in Figure 3B. Here, there are three programs (labeled “A,” “B,” and “C,” with subscript 1 or 2 indicating the heterotic group).

In each cycle, the programs are ranked from highest to lowest performing according to the average performance of the selected set of experimental hybrids, as described above. This means that the relative performance ranking of the programs was subject to change from cycle to cycle. Germplasm, in the form of lines used as crossing partners, is exchanged only from higher to lower performing programs. Specifically, the proportion of crosses with lines from other programs increased from zero for the best performing program (A) to Pmax for the lowest performing program (C), with intermediate programs staggered equidistantly between. In the example, Pmax = 50%. Thus, program A will perform no crosses with lines from other programs, program B will use lines from other programs in 25% of its new crosses and program C in 50% of its crosses. How much of that overall proportion was derived from each of the other programs was proportional to the relative performance differences. In the example, the difference between program C and program A is twice as large as that between C and B. Lines from program A were therefore used in twice as many crosses as lines from program B (33% from A and 17% from B, for a total of 50%).
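The allocation rule in this example can be written out in a few lines of Python. This is a sketch of the rule as described in the text, with a hypothetical function name and with ties between programs handled by assigning no external crosses (an assumption, since the text does not cover that case).

```python
def external_cross_shares(performance, p_max):
    """Map each program to the share of its crosses taken from each donor.

    performance: dict of program name -> mean performance of its hybrids.
    p_max: share of external crosses for the lowest ranked program.
    """
    ranked = sorted(performance, key=performance.get, reverse=True)
    n = len(ranked)
    shares = {}
    for rank, program in enumerate(ranked):
        # Total external share rises in equal steps from 0 to p_max.
        total = p_max * rank / (n - 1) if n > 1 else 0.0
        donors = ranked[:rank]              # only higher ranked programs donate
        gaps = [performance[d] - performance[program] for d in donors]
        gap_sum = sum(gaps)
        if gap_sum > 0:
            shares[program] = {d: total * g / gap_sum
                               for d, g in zip(donors, gaps)}
        else:
            shares[program] = {}            # best program, or tied performances
    return shares

# Example from the text: A leads C by twice as much as B does.
shares = external_cross_shares({"A": 3.0, "B": 2.0, "C": 1.0}, p_max=0.5)
for program, donors in shares.items():
    print(program, {d: round(s, 3) for d, s in donors.items()})
```

For the three programs of the example this yields no external crosses for A, 25% from A for B, and roughly 33% from A plus 17% from B for C.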

This process thus reflects that highly successful programs tend to exploit their own genetics while less successful programs have more of an incentive to explore superior genetics from other programs. The Pmax values considered in this study were 25, 50, and 75%.

The relative individual contributions were then multiplied with the relative population contributions to arrive at a final relative contribution value for each line to the crosses of a given population. The actual breeding crosses were then determined by sampling the lines with probabilities proportional to their contribution values. This was done with replacement, meaning that the same cross could have been made multiple times, but excluding crosses that would result in selfings. One recombinant line was derived from each crossing (e.g., two full-sib progenies would have been created if a crossing was made twice). This was done through seven generations of single-seed descent selfing, after which the lines were assumed to be fully homozygous and any residual heterozygosity was ignored. This new generation of recombinants fully replaced the previous generations, i.e., a line was considered as a crossing partner in only one generation. The so obtained new recombinants then form the next breeding cycle. The simulations were conducted for 30 cycles in total and repeated independently at least 500 times for each scenario studied. All computations were conducted in the R environment for statistical computing (R Core Team 2018).
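A minimal sketch of the crossing step is given below. It assumes that selfings are excluded by simply redrawing a pair whenever both parents coincide (the text does not state the mechanism), and `sample_crosses` is a hypothetical name. The last lines show why seven generations of selfing justify treating the lines as fully homozygous: expected heterozygosity halves with each generation.

```python
import numpy as np

def sample_crosses(contributions, n_crosses, rng):
    """Sample parent pairs with probabilities proportional to contributions.

    Requires at least two lines with nonzero contribution; otherwise no
    valid (non-selfing) cross exists and the loop would not terminate.
    """
    p = np.asarray(contributions, dtype=float)
    p = p / p.sum()
    crosses = []
    while len(crosses) < n_crosses:
        a, b = rng.choice(len(p), size=2, replace=True, p=p)
        if a != b:                          # reject would-be selfings
            crosses.append((int(a), int(b)))
    return crosses

rng = np.random.default_rng(11)
contributions = [0.4, 0.3, 0.3, 0.0]        # line 3 was not selected
crosses = sample_crosses(contributions, n_crosses=50, rng=rng)
print(crosses[:5])

# Expected residual heterozygosity after seven generations of selfing,
# starting from a fully heterozygous F1 locus.
residual_het = 0.5 ** 7
print(residual_het)
```

Because sampling is with replacement, the same pair can appear several times, each occurrence giving rise to one recombinant line.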

Recorded metrics

In addition to the already described true performance of the best-identified hybrid, which was used as a measure of peak performance in a given cycle, several other measures were recorded to describe and understand the dynamics of the system.

The proportion of GCA to total genetic variance (%GCA) describes the amount of exploitable additive genetic variation currently available. It was estimated using the hybrids generated for evaluating the GCA of the inbred lines. For this, the following mixed model was fitted: $h_{ij} = \mu + g_i + g_j + e_{ij}$, where $h_{ij}$ was the true performance of the hybrid between line i from heterotic group 1 and line j from group 2, $\mu$ the overall mean, $g_i$ and $g_j$ were the GCA effects of lines i and j from the two heterotic groups and $e_{ij}$ a residual term. Because the true genetic performances of the hybrids were used, $e_{ij}$ approximates the SCA component. The model was fitted using the R package “lme4” (Bates et al. 2015) and %GCA then calculated as $(V_{g_i} + V_{g_j}) / (V_{g_i} + V_{g_j} + V_{e_{ij}})$, where $V_{g_i}$ etc. were the estimated variance components. In scenarios with multiple programs, %GCA was estimated separately for each and then averaged to arrive at a single estimate for each cycle.
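To make the %GCA ratio concrete, the sketch below estimates the three variance components for a complete factorial of hybrids. Note the swapped-in technique: it uses classical ANOVA (method-of-moments) estimators, not the mixed model fit with “lme4” on an incomplete design that the study used, so it illustrates the quantity rather than the estimation procedure. The name `percent_gca` is hypothetical.

```python
import numpy as np

def percent_gca(h):
    """ANOVA estimate of %GCA from an n1 x n2 matrix of hybrid values."""
    n1, n2 = h.shape
    row_means = h.mean(axis=1, keepdims=True)   # group-1 line means
    col_means = h.mean(axis=0, keepdims=True)   # group-2 line means
    resid = h - row_means - col_means + h.mean()
    # Residual (SCA-like) variance from the interaction mean square.
    v_e = (resid ** 2).sum() / ((n1 - 1) * (n2 - 1))
    # Variance among line means, corrected for the residual contribution;
    # negative estimates are set to zero.
    v_g1 = max(row_means.var(ddof=1) - v_e / n2, 0.0)
    v_g2 = max(col_means.var(ddof=1) - v_e / n1, 0.0)
    return (v_g1 + v_g2) / (v_g1 + v_g2 + v_e)

rng = np.random.default_rng(3)
g1 = rng.normal(size=40)
g2 = rng.normal(size=40)
additive = g1[:, None] + g2[None, :]            # purely additive hybrids
interaction = rng.normal(size=(50, 50))         # no line effects at all
print(percent_gca(additive), percent_gca(interaction))
```

A purely additive table gives a ratio of essentially one, whereas a table consisting only of cell-specific deviations gives a ratio near zero.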

The modified Rogers’ distances (Reif et al. 2005) between the heterotic groups within each program were used as measures of heterotic group divergence. The distances were calculated for all programs and averaged to arrive at a representative value for that cycle.
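For biallelic loci, the modified Rogers' distance reduces to the root mean squared difference in allele frequencies between the two populations, because both alleles at a locus contribute the same squared difference. A minimal sketch, assuming inbred lines coded as 0/1 rows of a matrix (the function name is hypothetical):

```python
import numpy as np

def modified_rogers_distance(pop_1, pop_2):
    """Modified Rogers' distance between two sets of inbred lines.

    pop_1, pop_2: arrays of shape (lines, loci) with 0/1 allele codes.
    """
    p = pop_1.mean(axis=0)                  # allele frequencies, population 1
    q = pop_2.mean(axis=0)                  # allele frequencies, population 2
    return float(np.sqrt(np.mean((p - q) ** 2)))

group_a = np.zeros((4, 5))                  # fixed for allele 0 at every locus
group_b = np.ones((4, 5))                   # fixed for allele 1 at every locus
print(modified_rogers_distance(group_a, group_b))
```

The distance is zero for populations with identical allele frequencies and reaches its maximum of one when the populations are fixed for opposite alleles at every locus.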

To describe the distribution of allele frequencies within each sub-population, and hence the amount of available allelic diversity, we calculated the proportion of loci with a minor allele frequency of less than 5%. This proportion measures the thickness of the extreme tail of the allele frequency distribution and thus reflects the degree to which it follows a “U-shape” (Hill et al. 2008). This metric was evaluated for all sub-populations in each cycle and then averaged.
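This metric is straightforward to compute from a matrix of inbred lines. The sketch below assumes 0/1 allele codes and counts fixed loci (minor allele frequency of zero) as falling below the threshold, which the text does not state explicitly; `prop_low_maf` is a hypothetical name.

```python
import numpy as np

def prop_low_maf(pop, threshold=0.05):
    """Proportion of loci whose minor allele frequency is below threshold."""
    p = pop.mean(axis=0)                    # frequency of allele 1 per locus
    maf = np.minimum(p, 1.0 - p)
    return float(np.mean(maf < threshold))

# Four lines, three loci: locus 1 fixed for 0, locus 2 at 0.5, locus 3 fixed for 1.
pop = np.array([[0, 0, 1],
                [0, 1, 1],
                [0, 0, 1],
                [0, 1, 1]])
print(prop_low_maf(pop))
```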

As a more high-level diversity metric, we considered the effective population size (Ne) of each sub-population. Ne was calculated according to the method described by Corbin et al. (2012) for estimating constant effective population size. The so obtained values were averaged across sub-populations.

Hybrid breeding “themes”

The term “theme” will be used to refer to a particular component of a holistic hybrid breeding strategy. The themes considered were the program structure and the relative parental contribution of successful inbred lines to the next generation. All previously described parameters, such as parameters related to the NK model and genetic architecture, parameters related to testcross evaluation, and so on, were kept constant across the themes investigated.

In the program structure theme, we explored consequences of separating hybrid breeding programs into smaller, more or less isolated, units. In addition to the previously described distributed structure, we considered two other, more extreme program structures for “searching” (Podlich and Cooper 1999) genetic space, which can be viewed as special cases of the distributed structure (Figure 4): one comprised a single, large program (centralized structure), the other structure comprised multiple smaller, but fully isolated programs (isolated structure).

Figure 4.


Schematic visualization of the three basic structures explored in the program structure theme: (A) centralized: a single program with one large population in each heterotic group; (B) distributed: multiple programs with smaller populations in each heterotic group that exchange germplasm; (C) isolated: multiple programs with smaller populations in each heterotic group that do not exchange germplasm.

The centralized structure was characterized by a single program consisting of one sub-population per heterotic group. Each sub-population comprised 500 lines, for a total of 1000 lines generated in each cycle. The number of lines selected to contribute to the next generation was 125 per sub-population. The relative individual contributions of these lines decreased proportionally with their performance ranks. The number of selected experimental hybrids was 125. The isolated program structure comprised five programs, each with one sub-population per heterotic group. The sub-population size was 100, of which 25 were selected. Here too, the relative individual contributions of the lines were proportional to their performance ranks. The number of experimental hybrids selected per program was 25.

For the distributed structure, we considered three levels of Pmax: 25, 50, and 75%. The number of programs, as well as the numbers of lines and hybrids created and selected for each, followed those of the isolated structure. Note that the total numbers of lines and hybrids, as well as the selection intensity, were therefore the same across all structures.
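The equivalence of resources across structures can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated above; the class and field names are hypothetical, and two heterotic groups are assumed (implied by 500 lines per sub-population giving 1000 lines in total).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProgramStructure:
    programs: int                      # number of breeding programs
    lines_per_subpop: int              # lines generated per sub-population
    selected_per_subpop: int           # lines selected per sub-population
    hybrids_selected_per_program: int  # experimental hybrids selected
    heterotic_groups: int = 2          # assumption: two heterotic groups

    def total_lines(self):
        return self.programs * self.heterotic_groups * self.lines_per_subpop

    def selected_fraction(self):
        return self.selected_per_subpop / self.lines_per_subpop

    def total_hybrids_selected(self):
        return self.programs * self.hybrids_selected_per_program


centralized = ProgramStructure(1, 500, 125, 125)
# The isolated and distributed structures share these numbers and differ
# only in whether (and how much) germplasm is exchanged between programs.
decentralized = ProgramStructure(5, 100, 25, 25)

for s in (centralized, decentralized):
    print(s.total_lines(), s.selected_fraction(), s.total_hybrids_selected())
```

Both configurations give 1000 lines per cycle, a selected fraction of 0.25, and 125 selected hybrids.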

In the parental contribution theme, we explored the consequences of different degrees of imbalance in the relative contributions of the selected inbred lines to the next generation. Throughout, the program structure corresponded to the distributed structure with Pmax = 50%. Only the relative usage of inbred lines was varied. As described above, the observed relative contributions were drawn from a Dirichlet distribution with the concentration parameter chosen to result in a certain average relationship between relative contribution and performance rank. Three contribution scenarios were considered (Figure 5). In the balanced contribution scenario, all selected inbreds contributed equally on average; in the proportional scenario, the relative contribution declined proportionally with the performance rank of the lines. In the disproportional scenario, contributions halved with every 5 ranks, meaning that the highest performing line will contribute twice as much to the next generation as the 5th ranked line. The increasing imbalance in contributions can be quantified as $1/(\mathbf{b}^{\top}\mathbf{b})$, the reciprocal of the sum of squared relative contributions (with $\mathbf{b}$ being the vector of relative contributions), which is an estimate of the effective number of contributing lines (Boichard et al. 1997). For the balanced scenario, this was 25 and thus equal to the actual number of selected lines within each sub-population. It decreased to 19.1 and 13.6 for the proportional and disproportional scenarios, respectively.
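The effective number of contributing lines can be illustrated with the expected (average) contributions of the 25 selected lines. The sketch below is not the authors' code and ignores the Dirichlet sampling of realized contributions. The weight functions are assumptions: linearly declining weights 25, 24, ..., 1 for the proportional scenario and weights that halve every five rank steps, $2^{-(r-1)/5}$ for rank $r$, for the disproportional scenario. Under these assumptions the reported values are reproduced.

```python
def normalize(weights):
    """Scale weights so that they sum to one (relative contributions)."""
    total = sum(weights)
    return [w / total for w in weights]


def effective_number(b):
    """Effective number of contributing lines: 1 / sum of squared contributions."""
    return 1.0 / sum(x * x for x in b)


n_selected = 25  # selected lines per sub-population

# Assumed expected contributions by performance rank (rank 1 = best line).
balanced = normalize([1.0] * n_selected)
proportional = normalize([float(n_selected - r) for r in range(n_selected)])
disproportional = normalize([2.0 ** (-r / 5) for r in range(n_selected)])

results = {
    "balanced": round(effective_number(balanced), 1),                # 25.0
    "proportional": round(effective_number(proportional), 1),        # 19.1
    "disproportional": round(effective_number(disproportional), 1),  # 13.6
}
print(results)
```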

Figure 5.


Scenarios for distribution of relative individual contributions of selected inbred lines considered in the parental contribution theme. (A) Balanced: selected parents contribute evenly to the next generation; (B) proportional: contribution of selected parents to next generation is proportional to their performance rank; and (C) disproportional: higher ranked parents contribute disproportionately more than lower ranked parents.

Data availability

The authors affirm that all data necessary for confirming the conclusions of this article are represented fully within the article and its figures and supplemental material. The R and C code for the simulations is provided as supplementary material. The supplementary material is available at figshare: https://doi.org/10.25387/g3.14036627.

Results

Program structure theme

Which structure achieved the highest peak performance depended on the value of the complexity parameter K, with the centralized structure being superior at low K <5, the distributed structure at intermediate K and the isolated structure at high values of K above 8 (Figure 6A). The differences between the structures tended to increase with increasing K. The centralized and distributed structures came very close to reaching the theoretical maximum peak performance under purely additive gene action (K =1), but the isolated structure remained considerably below that. Within the distributed structure, the highest Pmax value of 75% was superior at K-values below eight and the lowest Pmax of 25% at high K (Figure 6B). The case of Pmax = 50% had peak performance in between the two extremes, but more similar to Pmax = 75%. All Pmax scenarios achieved virtually identical peak performance at K =1.

Figure 6.


Relationship between the NK model complexity parameter K (average number of interacting genes) and peak genetic performance in the last cycle for the strategies explored in the program structure theme: (A) comparing the isolated, distributed, and centralized structures; (B) the different values of Pmax within the distributed structure. The curve of the distributed structure in (A) is an average across the three Pmax scenarios within it. Pmax gives the maximum percent of breeding crosses conducted with lines from other programs.

For brevity, trajectories across cycles are shown only for K-values of 1, 6, and 15, representing the additive, multi-peaked but clustered, and fully uncorrelated landscapes, respectively (Figure 7, A–L). Results for all values of K are available as supplemental information (Supplementary Figure S1). At K =1, the centralized program structure had the highest peak performance in all cycles, closely followed by the three versions of the distributed structure (Figure 7, A–C).

Figure 7.


Evolution of metrics over cycles in the program structure theme for scenarios with K of 1 (A, D, G, and J), 6 (B, E, H, and K), and 15 (C, F, I, and L).

The peak performance of the isolated structure was considerably lower than that of the other strategies as it increased at a lower rate and seemed to reach a plateau at around cycle 20. At K =6, the isolated structure achieved the highest peak performances in the earlier cycles but was overtaken by the distributed structures later. Those had very similar peak performances until the last few cycles when the version with Pmax of 25% fell behind. The centralized structure had the lowest peak performances throughout, with the differences to the other structures being particularly large between the intermediate cycles 15–20. Finally, at K =15, only the isolated structure had a sizable increase in peak performance cycle over cycle. The distributed structures showed an increase only in the last few cycles and the centralized structure did not increase peak performance at all.

As expected, %GCA was equal to one for all scenarios at K =1 (Figure 7, D–F). At K =6, %GCA started at just below 10% and increased from there with each cycle. The rate of increase was greatest for the isolated structure, which reached a %GCA of almost 100 in the final cycles. The centralized structure had the slowest increase and was still below 50% in the final cycle. The distributed structures were intermediate between these two extremes. The increase was steepest for the Pmax = 25% case, which translated to it having a markedly higher %GCA than the Pmax = 50% and Pmax = 75% cases during intermediate cycles 15–20. However, all three converged to a similar value of around 80% in the final cycle.

At K =15, %GCA started at zero and only the isolated structure saw a marked increase in early cycles. The distributed program structure saw an increase in %GCA noticeably above zero only in the final cycles and the centralized structure remained at zero throughout.

The percent of loci with MAF < 0.05 increased over cycles for all program structures and complexity levels (Figure 7, G–I). In all cases, the increase over cycles was strongest for the isolated structure, where it reached close to 100% in the final cycles and weakest in the centralized structure. The curves for the three Pmax levels of the distributed structure were similar to each other and intermediate compared to the two other structures. The differences between the structures increased with K because the increase in the proportion of loci at extreme frequencies slowed for the distributed and centralized structures with increasing K. At the highest levels of complexity, only between 30 and 40% of loci showed a MAF < 0.05 in the different distributed structures and less than 20% in the centralized structure.

The modified Rogers Distance between heterotic groups increased over cycles for all structures (Figure 7, J–L). For all levels of complexity, this distance was highest for the isolated structure and lowest for the centralized structure, with the three versions of the distributed structure having similar values that were intermediate to the two extremes (Figure 8).

Figure 8.


Modified Rogers Distance between heterotic groups as a function of K, averaged across programs, in the program structure theme. The curve of the distributed structure represents the average across the three Pmax levels.

The Ne differences among the structures remained largely constant across cycles and levels of K. For the sake of brevity, results are reported only for cycle 15 and K =7. The estimated Ne for each sub-population was 20.0 for the isolated structure; 23.7 (Pmax = 25%), 31.3 (Pmax = 50%), and 35.4 (Pmax = 75%) for the three versions of the distributed structure; and 98.3 for the centralized structure.

Parental contribution theme

Which parental contribution scenario achieved the highest peak performance depended on the complexity level K (Figure 9). At the additive case of K =1, all contribution scenarios achieved very similar peak performances close to the theoretical maximum of 2/3. Until K =8, the highest peak performances were reached with proportional contribution of the selected inbred lines. For K >8, disproportional contribution of inbred lines resulted in the highest peak performances. Balanced contribution generally resulted in the lowest peak performance, except for K <4, where it was slightly ahead of the disproportional contribution option. The differences between the contribution scenarios tended to increase with K.

Figure 9.


Relationship between the NK model complexity parameter K and peak genetic performance in the last cycle for the scenarios explored in the parental contribution theme.

For brevity, we again show the trajectories across cycles only for K of 1, 6, and 15 (Figure 10, A–L). Results for all values of K are provided in Supplementary Figure S2. The cycle over cycle increase in peak performance was initially higher the more disproportional the contribution of the inbred lines (Figure 10, A–C). However, except for the highest level of complexity, this did not result in the highest maximum performance for this scenario, because the increase started to level off in the last 5 to 10 cycles. Scenarios with proportional and balanced contribution, therefore, had the highest peak performance at K =1, though the differences were small. At the intermediate level of K =6, the disproportional contribution scenario was overtaken by the proportional contribution scenario in the last cycles. The differences between these two were small, however. Finally, at K =15, only the disproportional contribution scenario achieved a sizable increase in peak performance.

Figure 10.


Evolution of metrics over cycles in the parental contribution theme for scenarios with K of 1 (A, D, G, and J), 6 (B, E, H, and K), and 15 (C, F, I, and L).

At K =1, %GCA stayed constant at one for all scenarios, as expected. At K =6, %GCA increased most strongly for disproportional contribution, followed by proportional and balanced contribution (Figure 10, D–F), reaching above 90% for the former and above 80% and 50% for the latter two, respectively. At K =15, %GCA remained near zero for the balanced and proportional contribution scenarios throughout. For the disproportional contribution scenario, it remained at zero as well until cycle ten and increased from there to almost 60%.

For all strategies and values of K, the percent of loci with a MAF < 0.05 increased from its initial value of zero (Figure 10, G–I). The increase over cycles was strongest for disproportional parental contribution, for which it reached close to 100% at K of 1 and 6. The proportional contribution scenario had the second strongest increase and the balanced scenario the weakest. As was the case in the program structure theme, the differences between the scenarios tended to increase with K. At the highest level of K =15, the proportional and balanced scenarios stayed below 40%, whereas the disproportional contribution scenario reached close to 80%.

The modified Rogers distance between heterotic groups increased over cycles in all scenarios (Figure 10, J–L). Throughout, it was highest for disproportional contribution, followed by proportional and balanced contribution. Overall, the distance was greatest for the intermediate complexity level of K =6.

Ne, again reported only for cycle 15 at K =7, was 23.2, 31.3, and 44.6, for the disproportional, proportional, and balanced parental contribution scenarios, respectively.

Discussion

The objective of this study was to explore properties of historically grown commercial hybrid breeding programs, particularly in maize, and aid the understanding of why they successfully generated significant amounts of genetic gain in the past and thereby impacted global food security. The infinitesimal framework (Barton et al. 2017), in which traits are described as the sum of a large number of genes all having additive, context-independent effects of similar magnitude, is our starting point. However, we seek extensions that (a) account for results observed in long-term breeding efforts that are not consistent with, or not easily explained within, the infinitesimal model framework (Rasmusson and Phillips 1997) and (b) reflect the reality of a highly complex trait biology (Hammer et al. 2006). Therefore, we are motivated to consider the influence of complexity of trait genetic architecture on breeding strategies from the perspective of a long-term commercial breeding program (Duvick et al. 2004; Cooper et al. 2014).

Emergence of additive genetic variance

As a representation of genetic complexity, we chose the NK model framework developed by Kauffman (1993), which allows exploration of the full continuum from complete additivity to deep and almost intractable genetic complexity. For reference, the NK models used in this study correspond to the Mount Fuji landscape at K =1 (Figure 1) and to the “Alps” landscape from K =2 to K of 7 or 8. After this, the genetic models transitioned from the multi-peaked but correlated “Alps” landscape to the uncorrelated landscape represented by the “Dunes” metaphor (Figures 1 and 2). In complex genetic landscapes, additive genetic variance, the sine qua non of genetic gain, is not a constant factor of trait biology (i.e., deducible from the molecular properties of genes) but rather emerges from the interplay of biology and natural or artificial properties of population structure (Wade 2002; Cooper et al. 2005). Statistical additivity is thus conditional on the sample of allele contrasts that can be observed in a breeding population and the properties of the underlying trait biology (as represented by the NK network here). In particular, additive genetic variance emerges in response to a constraining of the dimensionality of genetic space, or, in other words, by limiting genetic diversity. In practice, such constraints in dimensionality are achieved through fixation or near fixation of genes (Wade 2002; Hill et al. 2008). This process is illustrated in Figure 11 for a simple digenic epistatic network (Holland 2001) that corresponds to an NK model with N = K =2 and two distinct fitness peaks. Thus, as genetic complexity increases, the breeder needs practical ways to reduce this complexity to a manageable level that allows genetic progress. This study explored two practical approaches that have been adopted within commercial hybrid breeding, particularly in maize. 
With the availability of genomics and novel thinking about genetic complexity, we can now study the genetic implications of these practical approaches, many of which were devised and adopted prior to the availability of a theoretical and empirical framework to study their effects.

Figure 11.


Illustration of a two-locus epistatic network corresponding to an NK model with N = K =2 and two fitness peaks (panels C and D). Neither the A nor the B locus exhibits any additive variation when all alleles have an allele frequency of 0.5 (panels A and C). When the B locus becomes fixed for the B1 allele while the allele frequencies at the A locus remain at 0.5, however, the network is reduced to an additive system in which the substitution effect of the A locus explains 100% of the variation (B and D). Note that the substitution effect of the A locus would be reversed in sign if the B2 allele became fixed instead. Colors blue, yellow, and red represent high, intermediate, and low phenotypic values, respectively.
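The mechanism in Figure 11 can be reproduced numerically with a simplified sketch. The genotypic values below are hypothetical, not those of the figure: fully inbred lines are treated as haploid two-locus genotypes, with the two peaks A1B1 and A2B2 assigned a value of 1 and the other two genotypes a value of 0. The loci are assumed to be in linkage equilibrium. The function name and return layout are likewise illustrative.

```python
def locus_a_summary(p_a, p_b, values):
    """Substitution effect and additive variance of locus A.

    p_a, p_b: frequencies of alleles A1 and B1.
    values: dict mapping (a, b), with a, b in {1, 2}, to genotypic values.
    Returns (substitution effect of A, additive variance of A, total variance).
    """
    freq = {
        (a, b): (p_a if a == 1 else 1 - p_a) * (p_b if b == 1 else 1 - p_b)
        for a in (1, 2)
        for b in (1, 2)
    }
    mean = sum(freq[g] * values[g] for g in freq)
    total_var = sum(freq[g] * (values[g] - mean) ** 2 for g in freq)
    # Marginal means of the A alleles, averaged over the B background.
    mean_a1 = p_b * values[(1, 1)] + (1 - p_b) * values[(1, 2)]
    mean_a2 = p_b * values[(2, 1)] + (1 - p_b) * values[(2, 2)]
    alpha = mean_a1 - mean_a2
    var_a = p_a * (1 - p_a) * alpha ** 2
    return alpha, var_a, total_var


# Hypothetical two-peak landscape: A1B1 and A2B2 are the high genotypes.
two_peaks = {(1, 1): 1.0, (1, 2): 0.0, (2, 1): 0.0, (2, 2): 1.0}

unconstrained = locus_a_summary(0.5, 0.5, two_peaks)  # both loci segregating
b1_fixed = locus_a_summary(0.5, 1.0, two_peaks)       # B fixed for B1
b2_fixed = locus_a_summary(0.5, 0.0, two_peaks)       # B fixed for B2
```

With both loci at a frequency of 0.5, the substitution effect and additive variance of A are zero although the total variance is 0.25. Once B1 is fixed, the additive variance of A equals the total variance, and fixing B2 instead reverses the sign of the substitution effect.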

Two processes, in particular, accelerate such constrainment: the creation of population bottlenecks and the subdivision of larger populations into more or less independent “demes” (Katz and Young 1975; Goodnight 1995). Equivalent processes in the context of plant breeding programs are the degree of connectivity between breeding programs and the relative contribution of superior inbred line parents in breeding crosses for producing the next generation; both “themes” were explored in this study.

Program structure theme

Classical quantitative genetics infinitesimal theory was used to design and optimize commercial hybrid breeding programs, in combination with empirical experience of what worked and what did not (Hallauer et al. 2010). Yet, even though the infinitesimal model implies optimality of a single, homogeneous population, there were discussions about the relative merits of large centralized vs decentralized breeding programs early on (Baker and Curnow 1969). Later on, Podlich and Cooper (1999) explored this problem on the basis of Sewall Wright’s shifting balance theory (Wright 1931, 1977; Wade and Goodnight 1998). The shifting balance theory describes an evolutionary process in which genetic drift resulting from population subdivision enables random movements across genetic space (i.e., against selection gradients) and also converts epistatic to additive genetic variation through constraining genetic space as described above. This then enables local adaptation in complex genetic landscapes, which is followed by differential migration from higher to lower performing sub-populations and thus “spreading” of superior gene complexes across the whole meta population.

While this theory remains controversial as a model of natural evolution (Coyne et al. 1997), there are remarkable similarities between meta populations in the context of the shifting balance theory and the population structure of large commercial breeding operations. The latter also do not operate as a centralized unit but rather as a decentralized network of smaller programs with the most successful germplasm being shared across (Cooper et al. 2014). The same seems to be the case at the industry level, with the major commercial breeding operations being based on unique and distinct germplasm backgrounds, with only occasional exchange of elite material, e.g., through ex-PVP lines (Mikel 2006; White et al. 2020). As a result of this decentralization, plant breeding programs are also characterized by having low effective population sizes (Cowling 2013), which makes them more susceptible to genetic drift.

Here, we expanded on the work of Baker and Curnow (1969) and Podlich and Cooper (1999) by exploring breeding program structures with different levels of decentralization (Figure 4), ranging from a large centralized program with high Ne to an isolated set of smaller programs with low Ne, with a series of scenarios with decentralized but connected programs with Ne values in between these two extremes. We indeed found that strategies resulting in low within program Ne through increased decentralization and isolation became increasingly superior in terms of peak hybrid performance, as genetic complexity K increased, while a centralized program structure with high Ne was superior in less complex landscapes. These results thus confirm the findings of Podlich and Cooper (1999) that a distributed, decentralized program structure is superior in complex genetic landscapes. Increasing isolation and decentralization and the associated Ne reduction led to quicker increases over cycles and higher overall values of %GCA (Figure 7). At the highest levels of complexity, only complete isolation generated amounts of GCA variation sufficient for making genetic improvements cycle over cycle. The better ability to expose additivity in the form of GCA variation of the more isolated and decentralized structures was expected as per the discussion at the beginning of this section outlining the relationship between amounts of additive variation and constrainment of genetic space. This explains the clear advantages in terms of genetic peak performance of the isolated program structure at the highest levels of K.

The corollary of the constrainment of genetic space of course is a more rapid decline in genetic diversity and susceptibility to genetic drift, which ultimately limits the selection potential of the programs. Indeed, for values of K below eight, which marked the switch from an uncorrelated to multi-peaked but correlated genetic landscape (Figure 2), decentralized programs with increasing rates of germplasm exchange became superior. Accordingly, having a large centralized program became optimal at lower values of K. Here, the genetic landscape was simple enough to not require severe constrainment of genetic space to expose sufficient amounts of GCA variation. The genetic drift experienced by small, isolated programs then unnecessarily led to the fixation of unfavorable alleles. This was most apparent at K =1 where all variation is additive and a decentralized structure is not expected to have any advantage (Rathie and Nicholas 1980). Here, the isolated structure experienced a fixation of almost all loci from cycle 20 onward and a stalling of genetic gain significantly below the theoretically achievable maximum (Figure 7).

An alternative hypothesis for the advantage of decentralized commercial breeding program structures can be made assuming that the selected genetic combinations associated with locally additive landscape peaks discovered by individual breeding programs are transferable to a globally additive landscape peak. Testing this alternative hypothesis would require investigation of the global structure of NK adaptation landscapes and their features that persist following evolution, domestication, and breeding in an agricultural context. This is an open area for further investigation applying the NK framework in combination with the growing body of empirical evidence generated from extensive pangenome to panphenome studies for the traits contributing to crop adaptation and performance for historical genotype-environment trajectories. We expand on the importance of genotype by environment interactions below and consider extensions of the NK framework.

The establishment of genetically divergent heterotic groups has always been a central tenet of hybrid breeding (Melchinger and Gumber 1998). Originally, optimal exploitation of heterosis was the main driver of their establishment (East 1936). Later, however, maximization of GCA vs SCA variation was identified as an important secondary feature of heterotic groups (Melchinger and Gumber 1998). While this is well established for dominant gene action (Reif et al. 2007; Fischer et al. 2009), there are also indications for the conservation of favorable epistatic patterns that are disrupted when lines from different heterotic groups are recombined (Bernardo 2001). Often, heterotic groups are established from populations that evolved in isolation for a long time. One of the best examples for this is the Dent by Flint heterotic pattern in maize, which is prevalent in Central Europe and comprises populations that evolved in separation for centuries (Rebourg et al. 2003). Heterotic groups are thus a different and additional form of constrainment of genetic space through historically grown genetic isolation. In our simulations, the different heterotic groups originated from the same base population, yet we still observed a significant degree of genetic differentiation evolve over cycles (Figure 7), as expected in recurrent, reciprocal selection regimes (Labate et al. 1999; Longin et al. 2013). A portion of this differentiation can be attributed to genetic drift (Gerke et al. 2015), as evidenced by the nonzero genetic distance at K =1, where all effects are additive and increasing genetic differentiation between heterotic groups would have no effect on the proportion of GCA variance. However, the genetic differentiation was considerably higher for K >1 (Figure 8), indicating that there indeed was a selection advantage to increased heterotic group divergence in complex genetic landscapes.

This was particularly clear for the isolated program structure, where heterotic patterns could form uninterrupted within programs. For the centralized and decentralized structures, the differentiation was maximal at lower values of K, because %GCA, and hence the effectiveness of recurrent, reciprocal selection, declined afterward.

Parental contribution theme

The history of North American and European maize germplasm can be described as a succession of key inbreds that were heavily used in breeding crosses and had a distinct and lasting impact on germplasm (Mikel and Dudley 2006; Technow et al. 2014b; White et al. 2020). Those inbreds owe their success either to their outstanding general combining ability relative to their peers at the time, as was the case for the important North American line B73 (Mikel and Dudley 2006), or to their unique adaptation to specific climatic conditions, such as the European Flint lines F2 and F7 (Messmer et al. 1992; Böhm et al. 2014). The highly disproportionate importance of successful inbreds led to a significant reduction in genetic diversity (Rasmusson and Phillips 1997; White et al. 2020), particularly relative to the source populations from which they were derived (Böhm et al. 2017). However, this constrainment also might be responsible for the emergence of additive genetic variation from complex gene action through the so-called founder or bottleneck effect (Goodnight 1988; Cheverud and Routman 1996; Naciri-Graven and Goudet 2003; van Heerwaarden et al. 2008). We indeed observed that %GCA increased faster over cycles and reached higher values overall the more uneven the contribution of selected parents to breeding crosses (Figure 10), with the exception of K =1, where all variance is additive by definition. At the highest degrees of landscape complexity, only disproportionate contribution of parental inbred lines, resulting in very low Ne, succeeded in generating amounts of %GCA sufficient for genetic improvements. As in the program structure theme, the higher values of %GCA of the disproportional contribution scenario translated into superior peak performances only at values of K >8, i.e., after the landscape transitioned from multi-peaked but correlated to uncorrelated. 
Before that, balanced and particularly proportional contribution, both having higher Ne, achieved superior peak performances.

Maintenance of diversity

In our simulations, the constrainment of genetic space and reduction of Ne through decentralization and isolation or disproportionate contribution of inbred lines, while necessary for exposing additive genetic variation, led to a rapid fixation of alleles and a slowing of genetic gain in later cycles. This was partly a consequence of genetic drift caused by the low Ne (Cowling 2013). However, the reduction of Ne was also caused in part by the effects of selection, particularly once the majority of the genetic variation was additive. This has not generally happened in commercial breeding programs, where genetic gain continues apace (Rasmusson and Phillips 1997; Duvick et al. 2004; Fischer et al. 2008; Pfeiffer et al. 2019). Several factors that maintain diversity in practical programs were not included in the simulation model. For example, the simulation implicitly assumed that the environment and management conditions remained constant across all cycles, whereas both change more or less rapidly in reality. Changing selection environments imply changing selection targets and trajectories (Hammer et al. 2009; Messina et al. 2011), which reduce the pressure on particular alleles or allele complexes and thus slow or prevent fixation. Long-term selection experiments have shown that selection response can be maintained even in isolated and narrow populations (Dudley and Lambert 2010; Durand et al. 2010, 2015). Several hypotheses were proposed for these surprising results, including epistasis (Carlborg et al. 2006), de novo genetic mutations, particularly when magnified through effects on epistatic complexes (Rasmusson and Phillips 1997; Durand et al. 2010), creation of heritable epigenetic variation (Hauben et al. 2009), activity of transposable elements (Dubin et al. 2018), as well as the presence of “cryptic genetic variation” through phenomena such as canalization (Gibson and Dworkin 2004). Of these, only epistasis was present in our simulations. 
While highly speculative, these biological phenomena might explain the presence of abundant genetic variation and continued genetic gain in largely isolated and genetically narrow commercial plant breeding programs.

Genetic gain, however, is not just a function of additive genetic variance but also of the intensity and accuracy of selection (Falconer and Mackay 1996). Increases in the latter two can partially or fully compensate for decreases in additive genetic variance. In the present study, resources and thus selection intensity and accuracy were kept constant across cycles. This is in contrast to what happened in commercial plant breeding operations, where investments have increased considerably over the last few decades (Duvick and Cassman 1999; Kusmec et al. 2021). Thus, in addition to de novo creation of genetic variation and changing environments, the compensating effect of increased investments can partially explain why no significant decrease in the rate of genetic gain was observed in commercial plant breeding operations.

Applications of the NK model for plant breeding

The NK model of biological complexity, developed by the theoretical biologist Stuart Kauffman, continues to be in wide use for studying problems in theoretical biology and evolutionary genetics (e.g., Hayashi et al. 2006; Aita et al. 2007; Rowe et al. 2010; Franke et al. 2011; Østman et al. 2012; Schmiegelt and Krug 2014; Nahum et al. 2015; Weinreich et al. 2018). Recent empirical extensions to the infinitesimal model of quantitative genetics also seem to validate key tenets of the NK model. One example of this is the “omnigenic” model (Boyle et al. 2017a, 2017b), which, based on insights from genome-wide association studies and systems genetics, postulates that complex traits are influenced by virtually all genes in a genome through highly interactive regulatory networks. The NK model has also been proposed as an abstraction for explicit gene to phenotype models, such as crop growth models (Messina et al. 2011; Makumburage et al. 2013; Cooper et al. 2021). Outside of biology, the NK model has been recognized as a generic model for complex systems and found applications in disparate fields such as business administration (Csaszar 2018), organizational learning theory (Lazer and Friedman 2007), infrastructure design (Grove and Baumann 2012), and physics (Qu et al. 2002). Following the example of Podlich and Cooper (1999), we here used the NK model to represent genetic complexity navigated by commercial hybrid breeding operations to study the effect of the degree of isolation between programs as well as the degree of imbalance in parental contribution, both key aspects of breeding strategies. We propose that this model can serve as an ideal starting point to study other aspects of hybrid breeding strategies. 
For example, Cooper and Podlich (2002) proposed an extension to the NK model that adds an environmental dimension and thus allows modeling concepts related to genotype by environment interaction (Cooper and DeLacy 1994), yield stability (Piepho 1998; Tollenaar and Lee 2002), product placement (Messina et al. 2018), and the target population of environments (Comstock 1977). These so-called E(NK) models represent different environments or management practices through a series of more or less similar genetic landscapes. This of course adds considerable complexity to the already complex static landscapes studied here and poses interesting dilemmas. For example, rapidly exposing additive variation, e.g., through isolation, might be even more important than in static landscapes because local optima have to be exploited quickly before they disappear once the environment shifts, for example, through changes in management such as the historical increases in plant population for commercial maize production (Hammer et al. 2009). However, retaining allelic diversity, which hampers the exposing of additivity, is required to enable renewed adaptation to the changed environmental conditions. We here showed that a decentralized program structure can be superior over a centralized structure even if only a single environment type is targeted. We hypothize that the benefit of decentralization increases even further when mulitple distinct environment types have to be targeted simultaneously and require local adaptation (Smith et al. 2006). The E(NK) framework is well suited to investigate the implications of a simultaneous presence of genetic and environmental complexity on breeding program structure design.
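For readers unfamiliar with the NK framework, the landscapes discussed above can be sketched in a few lines. The following is a generic, haploid, biallelic form of Kauffman's model and not the implementation used in this study: each of N loci contributes a random value that depends on its own allele and on the alleles at K other loci, and fitness is the mean contribution. The function `shift_environment` is a purely hypothetical stand-in for the idea of "more or less similar" landscapes in E(NK) models; it is not the construction of Cooper and Podlich (2002). All names and parameter values are assumptions made for illustration.

```python
import random


def make_nk_landscape(n, k, seed=0):
    """Build a haploid, biallelic NK landscape (generic Kauffman form)."""
    rng = random.Random(seed)
    neighbors = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Locus i interacts with itself and k other, randomly chosen loci.
        neighbors.append([i] + rng.sample(others, k))
    # One uniform(0, 1) contribution per locus and per state of its k + 1 inputs.
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]
    return neighbors, tables


def fitness(genotype, landscape):
    """Mean of the per-locus contributions for a sequence of 0/1 alleles."""
    neighbors, tables = landscape
    total = 0.0
    for inputs, table in zip(neighbors, tables):
        index = 0
        for j in inputs:
            index = 2 * index + genotype[j]
        total += table[index]
    return total / len(neighbors)


def count_local_optima(landscape, n):
    """Count genotypes that no single-allele change can improve."""
    count = 0
    for code in range(2 ** n):
        genotype = [(code >> i) & 1 for i in range(n)]
        value = fitness(genotype, landscape)
        improved = False
        for i in range(n):
            mutant = genotype.copy()
            mutant[i] = 1 - mutant[i]
            if fitness(mutant, landscape) > value:
                improved = True
                break
        if not improved:
            count += 1
    return count


def shift_environment(landscape, fraction, seed=1):
    """Hypothetical E(NK)-like variant: redraw a fraction of contributions."""
    rng = random.Random(seed)
    neighbors, tables = landscape
    new_tables = []
    for table in tables:
        new_table = []
        for value in table:
            redraw = rng.random() < fraction
            new_table.append(rng.random() if redraw else value)
        new_tables.append(new_table)
    return neighbors, new_tables


smooth = make_nk_landscape(n=8, k=0)   # purely additive landscape
rugged = make_nk_landscape(n=8, k=7)   # every locus interacts with all others
print(count_local_optima(smooth, 8), count_local_optima(rugged, 8))
```

With K = 0 the landscape is additive and has a single optimum, whereas larger K typically produces many local optima. Small values of `fraction` yield environments whose landscapes stay close to the original, which is one simple way to explore how quickly local optima disappear when the environment shifts.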

We noted that genetically distinct heterotic groups are an additional means by which genetic space is constrained. A simulation framework based on the NK model could be used to compare the efficacy of different strategies of identifying and maintaining heterotic patterns, particularly for crops which are still in the early stages of heterotic group formation (Melchinger and Gumber 1998).

A high degree of genetic complexity also implies a high degree of context dependency of genetic effects. Observe, for example, that the additive substitution effect of the “A” locus in Figure 11 changes sign when the B1 allele instead of the B2 allele becomes rare. This has consequences for the persistence of accuracy of estimated QTL effects and genomic prediction models and can be addressed through iteratively updating training populations for genetic model parameterization (Podlich et al. 2004; Wolc et al. 2016; Forneris et al. 2017). The NK model framework can help address questions about the frequency with which this has to happen and whether data from previous generations can be used. Recently, approaches were proposed that attempt to capture those context dependencies through biological models representing the interdependencies underlying the traits of interest (Technow et al. 2015). Such models are only approximations of the true biological complexity. However, Cooper et al. (2005), using the NK framework, have shown that even incomplete knowledge of biological networks can improve predictability of genetic effects and genetic gain. The context dependency of genetic effects, i.e., the effects being neither universally positive nor negative (Wade 2002), also has implications for innovative proposals for using CRISPR-Cas9 gene editing (Jaganathan et al. 2018; Gao et al. 2020) to either target recombination to create superior hypothetical linkage groups (Brandariz and Bernardo 2019) or even the large scale “editing away” of deleterious mutations (Wallace et al. 2018). Finally, this framework might help devise strategies for the efficient introduction of exotic or ancient germplasm (Yu et al. 2016; Böhm et al. 2017), which evolved not just in a very different environment, but also a different genetic context from the current elite breeding germplasm.
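The sign change of a substitution effect can be reproduced with a small worked example. The sketch below assumes two biallelic loci in fully inbred lines at linkage equilibrium and uses hypothetical genotypic values with sign epistasis; these values are chosen for illustration and are not those underlying Figure 11. The average effect of substituting A1 for A2 is then the frequency-weighted mean of the A1 minus A2 contrast across the two backgrounds at locus B.

```python
# Hypothetical genotypic values of inbred lines showing sign epistasis:
# A1 is favorable on a B1 background and unfavorable on a B2 background.
VALUES = {
    ("A1", "B1"): 1.0,
    ("A2", "B1"): 0.0,
    ("A1", "B2"): 0.0,
    ("A2", "B2"): 1.0,
}


def substitution_effect_a(freq_b1, values=VALUES):
    """Average effect of replacing A2 by A1, given the frequency of B1.

    Assumes inbred lines and linkage equilibrium between the two loci.
    """
    freq_b2 = 1.0 - freq_b1
    effect_on_b1 = values[("A1", "B1")] - values[("A2", "B1")]
    effect_on_b2 = values[("A1", "B2")] - values[("A2", "B2")]
    return freq_b1 * effect_on_b1 + freq_b2 * effect_on_b2


for freq_b1 in (0.9, 0.5, 0.1):
    print(freq_b1, round(substitution_effect_a(freq_b1), 3))
```

Under these assumed values, the effect of A1 is positive while B2 is rare, vanishes at intermediate frequencies, and becomes negative once B1 is rare. A marker effect estimated in one generation can therefore point in the wrong direction after allele frequencies at interacting loci have shifted, which is the motivation for updating training populations.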

Back to the future

The structure of commercial plant breeding programs, particularly in major crops like maize, is characterized by a large degree of decentralization with exchange of successful germplasm within companies (Cooper et al. 2014), while isolation is the norm among companies (Mikel and Dudley 2006). Plant breeders further tend to rely on only a small set of elite inbred lines for producing the next generation (Rasmusson and Phillips 1997), leading to a series of significant bottleneck events (White et al. 2020). All of these features lead to a drastically reduced effective population size and are not predicted to be promising strategies under the additive, infinitesimal model of quantitative genetics. Yet commercial hybrid breeding has delivered incredible amounts of genetic gain over the last century, and has thus contributed to food security and resource conservation (Duvick 1999). Here, we postulated that the described structure of plant breeding programs, with its constraints on genetic space, is, in fact, necessary for enabling the exploration and exploitation of genetic variation in complex genetic landscapes and that the success of a breeding program is not only determined by its germplasm per se, but by the structures that allow it to evolve. The breeding program structure described here grew historically (Crabb 1947) and we do not claim that it was set up with this intention. However, by doing “what worked,” breeders in preceding generations might have nonetheless been able to take advantage of the process described and postulated in this study. Understanding why these historic structures “worked” will be critical for designing breeding programs that can tackle the challenges of the century ahead.

Funding

Frank Technow and Dean Podlich were employed by the company Corteva Agriscience. Mark Cooper was supported by the Australian Research Council Centre of Excellence for Plant Success in Nature and Agriculture (CE200100015).

Conflicts of interest

None declared.

Literature cited

  1. Aita T, Hayashi Y, Toyota H, Husimi Y, Urabe I, et al. 2007. Extracting characteristic properties of fitness landscape from in vitro molecular evolution: a case study on infectivity of FD phage to E. coli. J Theor Biol. 246:538–550.
  2. Altenberg L. 1994. Evolving better representations through selective genome growth. Proceedings of the First IEEE Conference on Evolutionary Computation IEEE World Congress on Computational Intelligence ICEC-94, New York: IEEE. p. 182–187.
  3. Araus JL, Cairns JE. 2014. Field high-throughput phenotyping: the new crop breeding frontier. Trends Plant Sci. 19:52–61.
  4. Baker LH, Curnow RN. 1969. Choice of population size and use of variation between replicate populations in plant breeding selection programs. Crop Sci. 9:555–560.
  5. Barton NH, Etheridge AM, Véber A. 2017. The infinitesimal model: definition, derivation, and implications. Theor Pop Biol. 118:50–73.
  6. Bates D, Mächler M, Bolker B, Walker S. 2015. Fitting linear mixed-effects models using lme4. J Stat Soft. 67:1–48.
  7. Bernardo R. 2001. Breeding potential of intra- and interheterotic group crosses in maize. Crop Sci. 41:68–71.
  8. Bernardo R. 2016. Bandwagons I, too, have known. Theor Appl Genet. 129:2323–2332.
  9. Böhm J, Schipprack W, Mirdita V, Utz HF, Melchinger AE. 2014. Breeding potential of European flint maize landraces evaluated by their testcross performance. Crop Sci. 54:1665–1672.
  10. Böhm J, Schipprack W, Utz HF, Melchinger AE. 2017. Tapping the genetic diversity of landraces in allogamous crops with doubled haploid lines: a case study from European flint maize. Theor Appl Genet. 130:861–873.
  11. Boichard D, Maignel L, Verrier É. 1997. The value of using probabilities of gene origin to measure genetic variability in a population. Genet Sel Evol. 29:5–23.
  12. Boyle EA, Li YI, Pritchard JK. 2017a. An expanded view of complex traits: from polygenic to omnigenic. Cell. 169:1177–1186.
  13. Boyle EA, Li YI, Pritchard JK. 2017b. The omnigenic model: response from the authors. J Psychiatry Brain Sci. 2:S8.
  14. Brandariz SP, Bernardo R. 2019. Predicted genetic gains from targeted recombination in elite biparental maize populations. Plant Genom. 12:180062.
  15. Carlborg O, Haley CS. 2004. Epistasis: too often neglected in complex trait studies? Nat Rev Genet. 5:618–625.
  16. Carlborg O, Jacobsson L, Åhgren P, Siegel P, Andersson L. 2006. Epistasis and the release of genetic variation during long-term selection. Nat Genet. 38:418–420.
  17. Cheverud JM, Routman EJ. 1996. Epistasis as a source of increased additive genetic variance at population bottlenecks. Evolution. 50:1042–1051.
  18. Comstock RE. 1977. Quantitative genetics and the design of breeding programs. In: Edward Pollack, Oscar Kempthorne, Theodore B. Bailey, (editors). Proceedings of the International Conference on Quantitative Genetics. Ames, IA: Iowa State University Press. p. 16–21.
  19. Cooper M, DeLacy IH. 1994. Relationships among analytical methods used to study genotypic variation and genotype-by-environment interaction in plant breeding multi-environment experiments. Theor Appl Genet. 88:561–572.
  20. Cooper M, Messina CD, Podlich D, Totir LR, Baumgarten A, et al. 2014. Predicting the future of plant breeding: complementing empirical evaluation with genetic prediction. Crop Pasture Sci. 65:311.
  21. Cooper M, Podlich DW. 2002. The E(NK) model: extending the NK model to incorporate gene-by-environment interactions and epistasis for diploid genomes. Complexity. 7:31–47.
  22. Cooper M, Podlich DW, Smith OS. 2005. Gene-to-phenotype models and complex trait genetics. Aust J Agric Res. 56:895–918.
  23. Cooper M, Powell O, Voss-Fels KP, Messina CD, Gho C, et al. 2021. Modelling selection response in plant breeding programs using crop models as mechanistic gene-to-phenotype (CGM-G2P) multi-trait link functions. In Silico Plant. 3:1–21.
  24. Corbin LJ, Liu AYH, Bishop SC, Woolliams JA. 2012. Estimation of historical effective population size using linkage disequilibria with marker data. J Anim Breed Genet. 129:257–270.
  25. Cowling WA. 2013. Sustainable plant breeding. Plant Breed. 132:1–9.
  26. Coyne JA, Barton NH, Turelli M. 1997. Perspective: a critique of Sewall Wright’s shifting balance theory of evolution. Evolution. 51:643–671.
  27. Crabb AR. 1947. The Hybrid-Corn Makers: Prophets of Plenty. New Brunswick: Rutgers University Press.
  28. Csaszar FA. 2018. A note on how NK landscapes work. J Org Design. 7:15.
  29. Dong Z, Danilevskaya O, Abadie T, Messina C, Coles N, et al. 2012. A gene regulatory network model for floral transition of the shoot apex in maize and its dynamic modeling. PLoS One. 7:e43450.
  30. Dubin MJ, Mittelsten Scheid O, Becker C. 2018. Transposons: a blessing curse. Curr Opin Plant Biol. 42:23–29.
  31. Dudley JW, Lambert RJ. 2010. 100 Generations of Selection for Oil and Protein in Corn. In: Jules Janick, (editor). Plant Breeding Reviews. Hoboken: John Wiley & Sons, Ltd. p. 79–110.
  32. Durand E, Tenaillon MI, Raffoux X, Thépot S, Falque M, et al. 2015. Dearth of polymorphism associated with a sustained response to selection for flowering time in maize. BMC Evol Biol. 15:103.
  33. Durand E, Tenaillon MI, Ridel C, Coubriche D, Jamin P, et al. 2010. Standing variation and new mutations both contribute to a fast response to selection for flowering time in maize inbreds. BMC Evol Biol. 10:2.
  34. Duvick D. 1999. Heterosis: feeding people and protecting natural resources. In: Coors J, Pandey S, editors. The Genetics and Exploitation of Heterosis in Crops. Madison, WI: CSSA. p. 19–29.
  35. Duvick D, Smith J, Cooper M. 2004. Long-term selection in a commercial hybrid maize breeding program. In: Janick J, editor. Plant Breeding Reviews. Hoboken, NJ: John Wiley & Sons, Inc. p. 109–152.
  36. Duvick DN, Cassman KG. 1999. Post-green revolution trends in yield potential of temperate maize in the north-central United States. Crop Sci. 39:1622–1630.
  37. East EM. 1936. Heterosis. Genetics. 21:375–397.
  38. Falconer DS, Mackay TFC. 1996. Introduction to Quantitative Genetics. London: Pearson.
  39. Fang C, Ma Y, Wu S, Liu Z, Wang Z, et al. 2017. Genome-wide association studies dissect the genetic networks underlying agronomical traits in soybean. Genome Biol. 18:161.
  40. Fiévet JB, Dillmann C, de Vienne D. 2010. Systemic properties of metabolic networks lead to an epistasis-based model for heterosis. Theor Appl Genet. 120:463–473.
  41. Fiévet JB, Nidelet T, Dillmann C, de Vienne D. 2018. Heterosis is a systemic property emerging from non-linear genotype-phenotype relationships: evidence from in vitro genetics and computer simulations. Front Genet. 9:00159.
  42. Fischer S, Möhring J, Maurer HP, Piepho H-P, Thiemt E-M, et al. 2009. Impact of genetic divergence on the ratio of variance due to specific vs. general combining ability in winter triticale. Crop Sci. 49:2119–2122.
  43. Fischer S, Möhring J, Schön CC, Piepho H-P, Klein D, et al. 2008. Trends in genetic variance components during 30 years of hybrid maize breeding at the University of Hohenheim. Plant Breed. 127:446–451.
  44. Forneris NS, Vitezica ZG, Legarra A, Pérez-Enciso M. 2017. Influence of epistasis on response to genomic selection using complete sequence data. Genet Sel Evol. 49. doi: 10.1186/s12711-017-0340-3.
  45. Franke J, Klözer A, de Visser JAGM, Krug J. 2011. Evolutionary accessibility of mutational pathways. PLoS Comput Biol. 7:e1002134.
  46. Gao H, Gadlage MJ, Lafitte HR, Lenderts B, Yang M, et al. 2020. Superior field performance of waxy corn engineered using CRISPR-Cas9. Nat Biotechnol. 38:579–581.
  47. Gaynor RC, Gorjanc G, Bentley AR, Ober ES, Howell P, et al. 2017. A two-part strategy for using genomic selection to develop inbred lines. Crop Sci. 57:2372–2386.
  48. Gerke JP, Edwards JW, Guill KE, Ross-Ibarra J, McMullen MD. 2015. The genomic impacts of drift and selection for hybrid performance in maize. Genetics. 201:1201–1211.
  49. Gibson G, Dworkin I. 2004. Uncovering cryptic genetic variation. Nat Rev Genet. 5:681–690.
  50. Goff SA. 2011. A unifying theory for general multigenic heterosis: energy efficiency, protein metabolism, and implications for molecular breeding. New Phytologist. 189:923–937.
  51. Goodnight CJ. 1988. Epistasis and the effect of founder events on the additive genetic variance. Evolution. 42:441–454.
  52. Goodnight CJ. 1995. Epistasis and the increase in additive genetic variance: implications for phase 1 of Wright’s shifting-balance process. Evolution. 49:502–511.
  53. Grove N, Baumann O. 2012. Complexity in the telecommunications industry: when integrating infrastructure and services backfires. Telecomm Policy. 36:40–50.
  54. Guo M, Rupe MA, Wei J, Winkler C, Goncalves-Butruille M, et al. 2014. Maize ARGOS1 (ZAR1) transgenic alleles increase hybrid maize yield. J Exp Bot. 65:249–260.
  55. Hallauer AR, Carena MJ, Filho JBM. 2010. Quantitative Genetics in Maize Breeding: Handbook of Plant Breeding, 3rd ed. New York, NY: Springer-Verlag.
  56. Hammer G, Cooper M, Tardieu F, Welch S, Walsh B, et al. 2006. Models for navigating biological complexity in breeding improved crop plants. Trends Plant Sci. 11:587–593.
  57. Hammer GL, Dong Z, McLean G, Doherty A, Messina C, et al. 2009. Can changes in canopy and/or root system architecture explain historical maize yield trends in the U.S. corn belt? Crop Sci. 49:299–312.
  58. Harrington SA, Backhaus AE, Singh A, Hassani-Pak K, Uauy C. 2020. The wheat GENIE3 network provides biologically-relevant information in polyploid wheat. G3 (Bethesda). 10:3675–3686.
  59. Hauben M, Haesendonckx B, Standaert E, Van Der Kelen K, Azmi A, et al. 2009. Energy use efficiency is characterized by an epigenetic component that can be directed through artificial selection to increase yield. Proc Natl Acad Sci USA. 106:20109–20114.
  60. Hayashi Y, Aita T, Toyota H, Husimi Y, Urabe I, et al. 2006. Experimental rugged fitness landscape in protein sequence space. PLoS One. 1:e96.
  61. Hickey LT, Hafeez AN, Robinson H, Jackson SA, Leal-Bertioli SCM, et al. 2019. Breeding crops to feed 10 billion. Nat Biotechnol. 37:744–754.
  62. Hill WG, Goddard ME, Visscher PM. 2008. Data and theory point to mainly additive genetic variance for complex traits. PLoS Genet. 4:e1000008.
  63. Holland JB. 2001. Epistasis and plant breeding. In: Jules Janick, (editor). Plant Breeding Reviews, Volume 21. Hoboken: Wiley. p. 27–92.
  64. Huang W, Mackay TFC. 2016. The genetic architecture of quantitative traits cannot be inferred from variance component analysis. PLoS Genet. 12:e1006421.
  65. Jaganathan D, Ramasamy K, Sellamuthu G, Jayabalan S, Venkataraman G. 2018. CRISPR for crop improvement: an update review. Front Plant Sci. 9:985. doi: 10.3389/fpls.2018.00985.
  66. Jiang Y, Schmidt RH, Zhao Y, Reif JC. 2017. A quantitative genetic framework highlights the role of epistatic effects for grain-yield heterosis in bread wheat. Nat Genet. 49:1741–1746.
  67. Katz A, Young S. 1975. Selection for high adult body weight in Drosophila populations with different structures. Genetics. 81:163–175.
  68. Kauffman SA. 1993. The Origins of Order: Self-Organization and Selection in Evolution. New York, NY: Oxford University Press.
  69. Kusmec A, Zheng Z, Archontoulis S, Ganapathysubramanian B, Hu G, et al. 2021. Interdisciplinary strategies to enable data-driven plant breeding in a changing climate. One Earth. 4:372–383.
  70. Labate J, Lamkey KR, Lee M, Woodman WL. 1999. Temporal changes in allele frequencies in two reciprocally selected maize populations. Theor Appl Genet. 99:1166–1178.
  71. Larièpe A, Moreau L, Laborde J, Bauland C, Mezmouk S, et al. 2017. General and specific combining abilities in a maize (Zea mays L.) test-cross hybrid panel: relative importance of population structure and genetic divergence between parents. Theor Appl Genet. 130:403–417.
  72. Lazer D, Friedman A. 2007. The network structure of exploration and exploitation. Adm Sci Q. 52:667–694.
  73. Li X, Guo T, Mu Q, Li X, Yu J. 2018. Genomic and environmental determinants and their interplay underlying phenotypic plasticity. Proc Natl Acad Sci USA. 115:6679–6684.
  74. Lippman ZB, Zamir D. 2007. Heterosis: revisiting the magic. Trends Genet. 23:60–66.
  75. Longin CFH, Gowda M, Mühleisen J, Ebmeyer E, Kazman E, et al. 2013. Hybrid wheat: quantitative genetic parameters and consequences for the design of breeding programs. Theor Appl Genet. 126:2791–2801.
  76. Lynch M, Walsh B. 1998. Genetics and Analysis of Quantitative Traits. Sunderland: Sinauer Associates.
  77. Mackay TFC. 2014. Epistasis and quantitative traits: using model organisms to study gene-gene interactions. Nat Rev Genet. 15:22–33.
  78. Makumburage GB, Richbourg HL, LaTorre KD, Capps A, Chen C, et al. 2013. Genotype to phenotype maps: multiple input abiotic signals combine to produce growth effects via attenuating signaling interactions in maize. G3 (Bethesda). 3:2195–2204.
  79. Melchinger AE. 1999. Genetic diversity and heterosis. In: Coors J, Pandey S, editors. The Genetics and Exploitation of Heterosis in Crops. Madison, WI: ASA, CSSA, and SSSA. p. 99–118.
  80. Melchinger AE, Geiger HH, Seitz G, Schmidt GA. 1987. Optimum prediction of three-way crosses from single crosses in forage maize (Zea mays L.). Theor Appl Genet. 74:339–345.
  81. Melchinger AE, Gumber RK. 1998. Overview of heterosis and heterotic groups in agronomic crops. In: Kendall R. Lamkey and Jack E. Staub, (editors). Concepts and Breeding of Heterosis in Crop Plants. Madison: CSSA. p. 29–44.
  82. Messina CD, Podlich D, Dong Z, Samples M, Cooper M. 2011. Yield-trait performance landscapes: from theory to application in breeding maize for drought tolerance. J Exp Bot. 62:855–868.
  83. Messina CD, Technow F, Tang T, Totir R, Gho C, et al. 2018. Leveraging biological insight and environmental variation to improve phenotypic prediction: integrating crop growth models (CGM) with whole genome prediction (WGP). Eur J Agron. 100:151–162.
  84. Messmer MM, Melchinger AE, Boppenmaier J, Brunklaus-Jung E, Herrmann RG. 1992. Relationships among early European maize inbreds: I. genetic diversity among flint and dent lines revealed by RFLPs. Crop Sci. 32:1301–1309.
  85. Meuwissen THE, Hayes BJ, Goddard ME. 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 157:1819–1829.
  86. Mikel MA. 2006. Availability and analysis of proprietary dent corn inbred lines with expired U.S. plant variety protection. Crop Sci. 46:2555–2560.
  87. Mikel MA, Dudley JW. 2006. Evolution of North American dent corn from public to proprietary germplasm. Crop Sci. 46:1193–1205.
  88. Montana G. 2005. Hapsim: a simulation tool for generating haplotype data with pre-specified allele frequencies and LD coefficients. Bioinformatics. 21:4309–4311.
  89. Naciri-Graven Y, Goudet JRM. 2003. The additive genetic variance after bottlenecks is affected by the number of loci involved in epistatic interactions. Evolution. 57:706–716.
  90. Nahum JR, Godfrey-Smith P, Harding BN, Marcus JH, Carlson-Stevermer J, et al. 2015. A tortoise–hare pattern seen in adapting structured and unstructured populations suggests a rugged fitness landscape in bacteria. Proc Natl Acad Sci USA. 112:7530–7535.
  91. Østman B, Hintze A, Adami C. 2012. Impact of epistasis and pleiotropy on evolutionary adaptation. Proc Biol Sci. 279:247–256.
  92. Pfeiffer BK, Pietsch D, Schnell RW, Rooney WL. 2019. Long-term selection in hybrid sorghum breeding programs. Crop Sci. 59:150–164.
  93. Phillips PC. 2008. Epistasis – the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet. 9:855–867.
  94. Piepho HP. 1998. Methods for comparing the yield stability of cropping systems. J Agron Crop Sci. 180:193–213.
  95. Pinheiro HP, de Souza Pinheiro A, Sen PK. 2005. Comparison of genomic sequences using the Hamming distance. J Stat Plan Inference. 130:325–339.
  96. Podlich DW, Cooper M. 1999. Modelling plant breeding programs as search strategies on a complex response surface. Lecture Notes Comp Sci. 1585:171–178.
  97. Podlich DW, Winkler CR, Cooper M. 2004. Mapping as you go. Crop Sci. 44:1560–1571.
  98. Poland JA, Rife TW. 2012. Genotyping-by-sequencing for plant breeding and genetics. Plant Genome. 5:92–102.
  99. Press WH, Teukolsky SA, Vetterling WT, Flannery BP. 1992. Numerical Recipes in C: The Art of Scientific Computing. Cambridge: Cambridge University Press.
  100. Qu X, Aldana M, Kadanoff LP. 2002. Numerical and theoretical studies of noise effects in the Kauffman model. J Stat Phys. 109:967–986.
  101. Rasmusson DC, Phillips RL. 1997. Plant breeding progress and genetic diversity from de novo variation and elevated epistasis. Crop Sci. 37:303–310.
  102. Rathie KA, Nicholas FW. 1980. Artificial selection with differing population structures. Genet Res. 36:117–131.
  103. R Core Team. 2018. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
  104. Rebourg C, Chastanet M, Gouesnard B, Welcker C, Dubreuil P, et al. 2003. Maize introduction into Europe: the history reviewed in the light of molecular data. Theor Appl Genet. 106:895–903.
  105. Reif JC, Gumpert F-M, Fischer S, Melchinger AE. 2007. Impact of interpopulation divergence on additive and dominance variance in hybrid populations. Genetics. 176:1931–1934.
  106. Reif JC, Melchinger AE, Frisch M. 2005. Genetical and mathematical properties of similarity and dissimilarity coefficients applied in plant breeding and seed bank management. Crop Sci. 45:1–7.
  107. Rowe W, Platt M, Wedge DC, Day PJ, Kell DB, et al. 2010. Analysis of a complete DNA-protein affinity landscape. J R Soc Interface. 7:397–408.
  108. Saha R, Suthers PF, Maranas CD. 2011. Zea mays iRS1563: a comprehensive genome-scale metabolic reconstruction of maize metabolism. PLoS One. 6:e21784.
  109. Schmiegelt B, Krug J. 2014. Evolutionary accessibility of modular fitness landscapes. J Stat Phys. 154:334–355.
  110. Schopp P, Riedelsheimer C, Utz HF, Schön C-C, Melchinger AE. 2015. Forecasting the accuracy of genomic prediction with different selection targets in the training and prediction set as well as truncation selection. Theor Appl Genet. 128:2189–2201.
  111. Seye AI, Bauland C, Charcosset A, Moreau L. 2020. Revisiting hybrid breeding designs using genomic predictions: simulations highlight the superiority of incomplete factorials between segregating families over topcross designs. Theor Appl Genet. 133:1995–2010.
  112. Shi J, Gao H, Wang H, Lafitte HR, Archibald RL, et al. 2017. ARGOS8 variants generated by CRISPR-Cas9 improve maize grain yield under field drought stress conditions. Plant Biotechnol J. 15:207–216.
  113. Shi J, Habben JE, Archibald RL, Drummond BJ, Chamberlin MA, et al. 2015. Overexpression of ARGOS genes modifies plant sensitivity to ethylene, leading to improved drought tolerance in both Arabidopsis and maize. Plant Physiol. 169:266–282.
  114. Shull GH. 1908. The composition of a field of maize. J Hered. os-4:296–301.
  115. Silva Dias JC. 2010. Impact of improved vegetable cultivars in overcoming food insecurity. Euphytica. 176:125–136.
  116. Simmons CR, Lafitte HR, Reimann KS, Brugiére N, Roesler K, et al. 2021. Successes and insights of an industry biotech program to enhance maize agronomic traits. Plant Sci. 307:110899.
  117. Smith S, Löffler C, Cooper M. 2006. Genetic diversity among maize hybrids widely grown in contrasting regional environments in the United States during the 1990s. Maydica. 51:233–242.
  118. Sprague GF, Tatum LA.. 1942. General vs. specific combining ability in single crosses of corn. Agronj. 34:923–932. [Google Scholar]
  119. Studer AJ, Wang H, Doebley JF.. 2017. Selection during maize domestication targeted a gene network controlling plant and inflorescence architecture. Genetics. 207:755–765. [DOI] [PMC free article] [PubMed] [Google Scholar]
  120. Svensson E, Calsbeek R.. 2013. The Adaptive Landscape in Evolutionary Biology. Oxford: Oxford University Press. [Google Scholar]
  121. Technow F. 2013. hypred: Simulation of Genomic Data in Applied Genetics. R package, version 0.4. Hohenheim: University of Hohenheim.
  122. Technow F. 2019. Use of F2 bulks in training sets for genomic prediction of combining ability and hybrid performance. G3 (Bethesda). 9:1557–1569. [DOI] [PMC free article] [PubMed] [Google Scholar]
  123. Technow F, Gerke JP.. 2017. Parent-progeny imputation from pooled samples for cost-efficient genotyping in plant breeding. PLoS One. 12:e0190271. [DOI] [PMC free article] [PubMed] [Google Scholar]
  124. Technow F, Messina CD, Totir LR, Cooper M.. 2015. Integrating crop growth models with whole genome prediction through approximate Bayesian computation. PLos ONE. 10:e0130855. [DOI] [PMC free article] [PubMed] [Google Scholar]
  125. Technow F, Schrag TA, Schipprack W, Bauer E, Simianer H, et al. 2014a. Genome properties and prospects of genomic prediction of hybrid performance in a breeding program of maize. Genetics. 197:1343–1355. [DOI] [PMC free article] [PubMed] [Google Scholar]
  126. Technow F, Schrag TA, Schipprack W, Melchinger AE.. 2014b. Identification of key ancestors of modern germplasm in a breeding program of maize. Theor Appl Genet. 127:2545–2553. [DOI] [PubMed] [Google Scholar]
  127. Tollenaar M, Lee EA.. 2002. Yield potential, yield stability and stress tolerance in maize. Field Crop Res. 75:161–169. [Google Scholar]
  128. Troyer AF. 2009. Development of hybrid corn and the seed corn industry. In Bennetzen JL, Hake S, editors Handbook of Maize. New York, NY: Springer, p. 87–114. [Google Scholar]
  129. Vacher M, Small I.. 2019. Simulation of heterosis in a genome-scale metabolic network provides mechanistic explanations for increased biomass production rates in hybrid plants. Syst Biol Appl. 5:24. [DOI] [PMC free article] [PubMed] [Google Scholar]
  130. van Heerwaarden B, Willi Y, Kristensen TN, Hoffmann AA.. 2008. Population bottlenecks increase additive genetic variance but do not break a selection limit in rain forest drosophila. Genetics. 179:2135–2146. [DOI] [PMC free article] [PubMed] [Google Scholar]
  131. Vasseur F, Fouqueau L, Vienne D. D, Nidelet T, Violle C, et al. 2019. Nonlinear phenotypic variation uncovers the emergence of heterosis in Arabidopsis thaliana. PLoS Biol. 17:e3000214. [DOI] [PMC free article] [PubMed] [Google Scholar]
  132. Voss-Fels KP, Cooper M, Hayes BJ. 2019. Accelerating crop genetic gains with genomic selection. Theor Appl Genet. 132:669–686.
  133. Wade MJ. 2002. A gene’s eye view of epistasis, selection and speciation. J Evol Biol. 15:337–346.
  134. Wade MJ, Goodnight CJ. 1998. The theories of Fisher and Wright in the context of metapopulations: when nature does many small experiments. Evolution. 52:1537–1553.
  135. Wallace JG, Rodgers-Melnick E, Buckler ES. 2018. On the road to breeding 4.0: unraveling the good, the bad, and the boring of crop quantitative genomics. Annu Rev Genet. 52:421–444.
  136. Weinreich DM, Lan Y, Jaffe J, Heckendorn RB. 2018. The influence of higher-order epistasis on biological fitness landscape topography. J Stat Phys. 172:208–225.
  137. White MR, Mikel MA, de Leon N, Kaeppler SM. 2020. Diversity and heterotic patterns in North American proprietary dent maize germplasm. Crop Sci. 60:100–114.
  138. Wilkins O, Hafemeister C, Plessis A, Holloway-Phillips M-M, Pham GM, et al. 2016. EGRINs (environmental gene regulatory influence networks) in rice that function in the response to water deficit, high temperature, and agricultural environments. Plant Cell. 28:2365–2384.
  139. Wolc A, Kranis A, Arango J, Settar P, Fulton JE, et al. 2016. Implementation of genomic selection in the poultry industry. Anim Front. 6:23–31.
  140. Wright S. 1931. Evolution in Mendelian populations. Genetics. 16:97–159.
  141. Wright S. 1932. The roles of mutation, inbreeding, crossbreeding, and selection in evolution. Proceedings of the Sixth International Congress of Genetics, Ithaca, NY: Brooklyn Botanic Garden. p. 356–366.
  142. Wright S. 1977. Evolution and the Genetics of Populations. Chicago, IL: University of Chicago Press.
  143. Wu J, Lawit SJ, Weers B, Sun J, Mongar N, et al. 2019. Overexpression of zmm28 increases maize grain yield in the field. Proc Natl Acad Sci USA. 116:23850–23858.
  144. Yu X, Li X, Guo T, Zhu C, Wu Y, et al. 2016. Genomic prediction contributing to a promising global strategy to turbocharge gene banks. Nat Plants. 2:16150. doi: 10.1038/NPLANTS.2016.150.

Data Availability Statement

The authors affirm that all data necessary for confirming the conclusions of this article are represented fully within the article and its figures and supplemental material. The R and C code for the simulations is provided as supplementary material. The supplementary material is available at figshare: https://doi.org/10.25387/g3.14036627.


Articles from G3: Genes|Genomes|Genetics are provided here courtesy of Oxford University Press
