G3: Genes | Genomes | Genetics. 2017 Jan 4;7(3):801–811. doi: 10.1534/g3.116.036582

Persistency of Prediction Accuracy and Genetic Gain in Synthetic Populations Under Recurrent Genomic Selection

Dominik Müller, Pascal Schopp, Albrecht E. Melchinger
PMCID: PMC5345710  PMID: 28064189

Abstract

Recurrent selection (RS) has been used in plant breeding to successively improve synthetic and other multiparental populations. Synthetics are generated from a limited number of parents (Np), but little is known about how Np affects genomic selection (GS) in RS, especially the persistency of prediction accuracy (rg,g^) and genetic gain. Synthetics were simulated by intermating Np= 2–32 parent lines from an ancestral population with short- or long-range linkage disequilibrium (LDA) and subjected to multiple cycles of GS. We determined rg,g^ and genetic gain across 30 cycles for different training set (TS) sizes, marker densities, and generations of recombination before model training. Contributions to rg,g^ and genetic gain from pedigree relationships, as well as from cosegregation and LDA between QTL and markers, were analyzed via four scenarios differing in (i) the relatedness between TS and selection candidates and (ii) whether selection was based on markers or pedigree records. Persistency of rg,g^ was high for small Np, where predominantly cosegregation contributed to rg,g^, but also for large Np, where LDA replaced cosegregation as the dominant information source. Together with increasing genetic variance, this compensation resulted in relatively constant long- and short-term genetic gain for increasing Np > 4, given long-range LDA in the ancestral population. Although our scenarios suggest that information from pedigree relationships contributed to rg,g^ for only very few generations in GS, we expect a longer contribution than in pedigree BLUP, because capturing Mendelian sampling by markers reduces selective pressure on pedigree relationships. Larger TS size (NTS) and higher marker density improved persistency of rg,g^ and hence genetic gain, but additional recombinations could not increase genetic gain.

Keywords: genomic prediction, recurrent selection, synthetic populations, prediction accuracy, genetic gain, GenPred, Shared Data Resources, Genomic Selection


RS is an integral tool in plant breeding that targets the systematic improvement of quantitative traits in broad-based populations by increasing the frequency of favorable alleles, while maintaining genetic variability (Hallauer and Carena 2012). Source materials in allogamous crops include open-pollinated and synthetic populations (synthetics, Hallauer 1992). Synthetics are created by intermating a limited number of parental components and cross-pollinating the progeny for one or several generations (Falconer and Mackay 1996). A prominent example is the Iowa Stiff Stalk Synthetic (BSSS), which was developed from 16 inbred lines in the 1930s and has since been subjected to two long-term RS programs (Hallauer 2008), which have contributed a large proportion of today’s commercial maize germplasm (Mikel and Dudley 2006).

GS is a novel statistical method (Meuwissen et al. 2001) with the capability to accelerate future genetic progress in plant breeding (Heffner et al. 2010). Several studies indicate a potential superiority of GS over phenotypic selection (Bernardo 2009; Wong and Bernardo 2009; Jannink 2010; Yabe et al. 2013), marker-assisted selection (Bernardo and Yu 2007; Wong and Bernardo 2009; Heffner et al. 2010; Yabe et al. 2013), as well as pedigree-based selection (Muir 2007; Wolc et al. 2011a, 2016; Bastiaansen et al. 2012; Van Grevenhof et al. 2012). Although the usefulness of GS across two selection cycles has been demonstrated empirically in biparental maize families (Massman et al. 2013; Beyene et al. 2015), experimental results on long-term GS are still missing.

GS has further been proposed as a particularly suitable tool for RS in synthetics (Windhausen et al. 2012; Gorjanc et al. 2016). In this context, an established prediction equation could be used repeatedly for multiple cycles of selection without retraining. Combined with the use of off-season nurseries, this promises to increase genetic gain per unit time and to reduce costs for phenotyping (Bernardo and Yu 2007). The success of this strategy largely depends on persistency of the rg,g^ of estimated breeding values (EBV) across selection cycles to ensure satisfactory genetic gain when selection candidates are separated by one or more cycles from the model training generation. Although formulas for forecasting rg,g^ in a single cycle were derived (Daetwyler et al. 2008; Hayes et al. 2009; Goddard 2009; Goddard et al. 2011), no closed analytical solutions are available for calculating rg,g^, the additive genetic variance (σA2) and the cumulative genetic gain (ΔG) across several selection cycles. This is because changes in the LD pattern, allele frequencies, and loss of polymorphisms are unpredictable (Jannink 2010).

While empirical results on persistency of rg,g^ in actual plant breeding programs are scarce to date, several simulation studies across multiple generations investigated rg,g^ of GS, assuming random mating of the whole population between generations (Meuwissen et al. 2001; Habier et al. 2007; Nielsen et al. 2009; Solberg et al. 2009). Others assumed selection and were therefore able to evaluate potential genetic gain using GS (Muir 2007; Sonesson and Meuwissen 2009; Jannink 2010; Bastiaansen et al. 2012; Yabe et al. 2013, 2016; Liu et al. 2015). However, these studies generally considered fairly large effective population sizes (Ne≥100), which are unrealistic for synthetics in plant breeding. In synthetics, the number of parents is usually relatively small and parents are often related, leading to small Ne of the population. It is as yet unclear how such a small Ne influences the persistency of rg,g^ in genomic RS.

Initially, LD between QTL and molecular markers (commonly SNPs) of high density maps was considered as the only source of information exploited in GS (Meuwissen et al. 2001). In synthetics, LD between QTL and SNPs is attributable to (i) LDA in the population from which the parents were taken, and (ii) sample LD, randomly generated by using a restricted number of parents Np (Schopp et al. 2017). Sample LD is conserved from parents to progeny between cosegregating loci, and has therefore been termed cosegregation. However, it was also demonstrated that SNPs contribute to rg,g^ by capturing pedigree relationships between individuals (Habier et al. 2007). Research in a companion paper (Schopp et al. 2017) showed that the choice of Np in synthetics crucially affects the relative importance of LDA and cosegregation as well as the contribution of pedigree relationships in a single cycle of GS in synthetics. However, no study systematically investigated the importance of these information sources for the persistency of rg,g^ and ΔG in recurrent GS.

Besides the choice of Np, an important question is how often the source material should be recombined before starting RS. Additional recombination might release genetic variability useful for long-term genetic gain (Schnable et al. 1996). For instance, Bernardo (2009) recommended the use of F2 instead of F1 plants in the production of maize doubled haploids. However, additional recombination might also adversely affect the three information sources in GS, and so far studies have not addressed whether this can outweigh the potential increase in long-term genetic gain.

In the present study, we applied fully stochastic forward-in-time simulations and generated two ancestral populations differing substantially in LDA. From these, we sampled different numbers of parents Np to create synthetics that were subjected to multiple cycles of recurrent GS, either directly or after additional generations of recombination. Our objectives were to (i) analyze rg,g^ and ΔG in recurrent GS, depending on the number of parents Np, LDA, and the number of recombination generations NR, and (ii) determine the importance of the three information sources, considering also NTS and SNP density. Finally, we discuss implications for practical decisions in breeding programs employing recurrent GS.

Methods

Genome properties and simulation of ancestral populations

Properties of the genome, construction of the genetic map, and simulation of ancestral populations are detailed in Schopp et al. (2017). In brief, we selected maize (Zea mays L.) as a model species using genetic map positions for 37,286 SNPs distributed over 10 chromosomes with 1913 cM in total. Using the software QMSim (Sargolzaei and Schenkel 2009), we simulated two ancestral populations with either short-range LDA (SR) or extensive long-range LDA (LR). First, we generated an initial population of 1500 diploid individuals by sampling alleles at each (biallelic) locus independently from a Bernoulli distribution with probability 0.5. Second, 5000 loci were randomly sampled from all SNPs and henceforth interpreted as QTL; all remaining loci were considered as SNP markers. Third, these individuals were randomly mated for 3000 generations with a constant population size of 1500 and a mutation rate of 2.5 × 10⁻⁵ until mutation-drift equilibrium was reached. Fourth, a strong population bottleneck was imposed by reducing the population size to 30 arbitrarily selected individuals, followed by 15 additional generations of random mating to generate extensive long-range LDA. Lastly, the population was expanded to 10,000 individuals and randomly mated three times more to establish ancestral population LR. Ancestral population SR was derived from LR by continuing random mating for 100 generations with constant population size of 10,000 to break down long-range LDA. Due to this large population size, genetic drift had only a negligible influence and hence allele frequencies were nearly identical in both ancestral populations. The heterozygous ancestral populations (LR and SR) were considered as unrelated and were used as reference bases for the pedigree of all subsequently derived individuals.
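
As a rough illustration of why continued random mating breaks down long-range LDA, recall that under random mating in a large population the expected LD coefficient between two loci shrinks by a factor of (1 − c) per generation, where c is their recombination frequency. The Python sketch below is not part of the original simulation (which used QMSim). It assumes the Haldane mapping function to convert map distance into c, ignores drift, mutation, and selection, and uses the hypothetical function names `haldane_recomb_fraction` and `expected_ld_fraction`.

```python
import math


def haldane_recomb_fraction(distance_cm):
    """Recombination frequency for a map distance in cM.

    Assumption: Haldane mapping function (no crossover interference).
    """
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def expected_ld_fraction(distance_cm, generations):
    """Expected fraction of the initial LD coefficient D that remains after
    `generations` of random mating, ignoring drift, mutation, and selection."""
    c = haldane_recomb_fraction(distance_cm)
    return (1.0 - c) ** generations


# Compare 3 generations (as in LR after expansion) with 100 further
# generations (as used to derive SR) for loci at different map distances.
for d in (0.1, 1.0, 10.0):
    print(d, expected_ld_fraction(d, 3), expected_ld_fraction(d, 100))
```

Under these assumptions, LD between loci 10 cM apart is essentially gone after 100 generations, whereas most of the LD between loci 0.1 cM apart remains, which matches the qualitative contrast between populations LR and SR.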

Simulation of synthetic populations

The RS breeding scheme applied is shown in Figure 1 and factors analyzed are listed in Table 1. The simulation of the synthetics varied, depending on whether the parents of the TS and the recurrent selection candidates (RSC) were identical (PTS=PRSC) or disjoint (PTS∩PRSC=∅). For PTS=PRSC, a single synthetic was simulated from which both the TS and the RSC were sampled, whereas for PTS∩PRSC=∅, TS and RSC were taken from two synthetics having no parents in common. In both cases, Np∈{2,3,4,6,8,12,16,32} parental gametes were randomly drawn from the same ancestral population and chromosomes were doubled in silico to generate fully homozygous parent lines. These were intermated to obtain all possible Np(Np−1)/2 single crosses, denoted as generation Syn0. Subsequently, single crosses were randomly mated NR times (allowing for selfings) to obtain generation SynNR, from which the TS (SynNRTS) and RSC (SynNRRSC) were later drawn. Here, NR∈{1,2,3,4,5} counts the number of recombination generations conducted prior to initiating RS. For the special case of Np=2, the Syn0 corresponded to an F1 cross and Syn1 to an F2 family.
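
The number of single crosses in Syn0 grows quadratically with Np. A minimal Python sketch (illustrative only; `single_crosses` is a hypothetical helper and parent lines are represented by integer indices) enumerates them for the values of Np used here:

```python
from itertools import combinations


def single_crosses(n_parents):
    """All unordered pairs of distinct parent lines, i.e., generation Syn0."""
    return list(combinations(range(n_parents), 2))


for n_p in (2, 3, 4, 6, 8, 12, 16, 32):
    crosses = single_crosses(n_p)
    # The count agrees with the closed form Np(Np - 1)/2.
    assert len(crosses) == n_p * (n_p - 1) // 2
    print(n_p, len(crosses))
```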

Figure 1.


Schematic representation of the breeding program applied in this study. Two synthetic populations SynNR(1) and SynNR(2) were separately created by using NR recombination generations from Np parental gametes drawn from one ancestral population [with short- (SR) or long-range linkage disequilibrium (LR)]. If the training set (TS) and the recurrent selection candidates (RSC) were related, TS and RSC were sampled from the same synthetic SynNR(1), and if they were unrelated, they were drawn from separate synthetics SynNR(1) and SynNR(2). In each cycle of recurrent selection, Ns=10 individuals were selected and recombined to establish the next generation.

Table 1. Overview of the factors analyzed in our simulation study.

Factors                                        Levels
Primary factors
  Ancestral population                         SR, LR
  Information scenario                         ReLDASNP, ReLDAPed, ReLEASNP, UnLDASNP
  Number of parents (Np)                       2, 3, 4, 6, 8, 12, 16, 32
Secondary factors
  Selection scenario                           EBV, TBV, RBV
  Number of recombination generations (NR)     1, 2, 3, 4, 5
  Marker density (SNPs per cM)                 0.125, 2.5
  Training set size (NTS)                      250, 1000

For secondary factors, bold face type factor levels indicate the default simulation setting. SR, short-range; LR, long-range; Re, related; LDA, ancestral linkage disequilibrium; SNP, single nucleotide polymorphism; Ped, pedigree; LEA, ancestral linkage equilibrium; Un, unrelated; EBV, estimated breeding values; TBV, true breeding values; RBV, random breeding values.

Genetic model

We assumed a quantitative trait based on 1000 biallelic QTL with purely additive gene action and absence of QTL × year interactions. For each simulation replicate, QTL were randomly sampled from the 37,286 SNPs present in the ancestral population. Following Meuwissen et al. (2001), absolute values of QTL effects were drawn from a gamma distribution with scale and shape parameter of 0.4 and 1.66, respectively. Signs of QTL effects were sampled from a Bernoulli distribution with probability 0.5. Although we assumed biallelic QTL, the alleles of neighboring QTL are strongly correlated due to LDA and linkage, effectively leading to haploblocks that could be considered as higher-level multi-allelic QTL. The true breeding value (TBV) $g_i$ for any individual $i$ (either from the synthetics or from the ancestral populations) was computed as $g_i = \sum_{j=1}^{m} W_{ij} a_j$, where $m$ is the number of QTL, $W_{ij}$ counts the number of minor alleles at the $j$-th QTL centered by the respective ancestral allele frequency in LR, and $a_j$ is the associated QTL effect. Phenotypes $y_i$ were simulated as $y_i = g_i + e_i$, where $e_i \sim N(0, \sigma_e^2)$ is an environmental noise variable. The error variance $\sigma_e^2$ was assumed to be constant throughout all simulations and was determined as follows: for all individuals in the ancestral population LR, TBVs were calculated according to the above procedure under replicated sampling of 1000 QTL together with their associated effects. The variance of the noise variable $\sigma_e^2$ was then set equal to the mean additive genetic variance $\sigma_{A(anc)}^2$. As the allele frequencies in both ancestral populations were virtually identical, $\sigma_{A(anc)}^2$ was also the mean additive genetic variance in ancestral population SR. This approach implies that the heritability in ancestral populations LR and SR was, on average, 0.5. Heritability was lower in the synthetics due to the finite sample of parents and, on average, $h^2 \to 0.5$ for $N_p \to 20{,}000$.
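
The genetic model can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' R code: the function names are hypothetical, "centered" is taken to mean subtracting twice the ancestral allele frequency from the allele count, the gamma parameters are used exactly as stated in the text, and the toy genotypes are drawn independently per locus (no LD). The error variance is set to the genetic variance of the toy sample itself rather than to an average over replicated QTL samplings.

```python
import random
import statistics


def draw_qtl_effects(n_qtl, rng, shape=1.66, scale=0.4):
    """Gamma-distributed absolute effects with random signs (Bernoulli 0.5).
    Shape and scale follow the values as stated in the text."""
    return [rng.gammavariate(shape, scale) * rng.choice((-1.0, 1.0))
            for _ in range(n_qtl)]


def true_breeding_values(allele_counts, ancestral_freqs, effects):
    """g_i = sum_j W_ij * a_j, with W_ij = count_ij - 2 * p_j (assumed centering)."""
    return [sum((x - 2.0 * p) * a
                for x, p, a in zip(row, ancestral_freqs, effects))
            for row in allele_counts]


def simulate_phenotypes(tbv, error_variance, rng):
    """y_i = g_i + e_i with e_i ~ N(0, error_variance)."""
    sd = error_variance ** 0.5
    return [g + rng.gauss(0.0, sd) for g in tbv]


# Toy data: 200 individuals, 50 unlinked QTL in Hardy-Weinberg proportions.
rng = random.Random(1)
freqs = [rng.uniform(0.05, 0.5) for _ in range(50)]
counts = [[sum(rng.random() < p for _ in range(2)) for p in freqs]
          for _ in range(200)]
effects = draw_qtl_effects(50, rng)
tbv = true_breeding_values(counts, freqs, effects)
sigma2_e = statistics.pvariance(tbv)  # error variance equal to genetic variance
pheno = simulate_phenotypes(tbv, sigma2_e, rng)
# Realized heritability of the toy sample; it fluctuates around 0.5.
print(statistics.pvariance(tbv) / statistics.pvariance(pheno))
```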

Information source scenarios

We employed four distinct scenarios to evaluate the contributions of the three information sources used in Genomic Best Linear Unbiased Prediction (GBLUP) for estimating actual relationships at causal loci by SNPs (cf. Habier et al. 2013). These scenarios can be distinguished by (i) the relatedness of the TS and RSC and (ii) the type of data employed for calculating the relationship matrix used as a kernel in GBLUP (Supplemental Material, Table S1).

Our standard scenario was ReLDASNP, where the TS and RSC were related (Re) as their parents were identical (PTS=PRSC). The kernel in GBLUP was calculated based on SNPs (excluding QTL) and thus contained genomic relationships. As a consequence, this scenario harnesses all three sources of information, namely: (i) pedigree relationships captured by SNPs, (ii) cosegregation between QTL and SNPs by virtue of the parents being identical, and (iii) LDA between QTL and SNPs due to the presence of LDA in the ancestral population, which was carried over to the synthetics. ReLDASNP is a realistic scenario and is perhaps the most frequent scenario encountered in applications of GS.

Scenario ReLEASNP was artificial and was derived from ReLDASNP. Here, for each of the 10 chromosomes, the multi-locus genotypes of QTL and SNPs were regarded as separate units and were reshuffled among the Np parents prior to intermating. This procedure broke up historical associations between QTL and SNPs due to LDA, while conserving the LD structure among QTL and among SNPs as well as their allele frequencies. Hence, information from LDA cannot contribute to rg,g^ and any LD between QTL and SNPs is exclusively due to sampling a limited number of parental gametes from the ancestral population, i.e., sample LD.
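
One way to picture the reshuffling is sketched below in Python. The data layout and the function name `reshuffle_snp_units` are assumptions made for illustration: each parent carries, per chromosome, one QTL haplotype and one SNP haplotype, and the SNP units are permuted among parents independently for each chromosome while the QTL units stay in place.

```python
import random


def reshuffle_snp_units(qtl_haps, snp_haps, rng):
    """Permute SNP haplotype units among parents, chromosome by chromosome.

    qtl_haps[p][c] and snp_haps[p][c] hold the haplotype of parent p on
    chromosome c. QTL units are returned unchanged, so LD among QTL, LD
    among SNPs, and all allele frequencies are conserved, while historical
    QTL-SNP associations are broken.
    """
    n_parents = len(snp_haps)
    n_chrom = len(snp_haps[0])
    new_snp = [[None] * n_chrom for _ in range(n_parents)]
    for c in range(n_chrom):
        donors = list(range(n_parents))
        rng.shuffle(donors)  # independent permutation for every chromosome
        for p, donor in enumerate(donors):
            new_snp[p][c] = snp_haps[donor][c]
    return qtl_haps, new_snp


# Toy example: 4 parents, 2 chromosomes, 3 SNPs and 2 QTL per chromosome.
rng = random.Random(42)
qtl = [[tuple(rng.randint(0, 1) for _ in range(2)) for _ in range(2)]
       for _ in range(4)]
snp = [[tuple(rng.randint(0, 1) for _ in range(3)) for _ in range(2)]
       for _ in range(4)]
_, shuffled = reshuffle_snp_units(qtl, snp, rng)
print(shuffled)
```

A random permutation can leave some parents with their original SNP unit; whether the original procedure excluded such cases is not stated in the text.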

Scenario ReLDAPed was identical to ReLDASNP except that the kernel of GBLUP was the numerator relationship matrix calculated from pedigree records of all individuals (pedigree BLUP). This scenario provided a reference for rg,g^ and its dynamics across cycles that can be obtained exclusively from known pedigree relationships between TS and RSC.
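
For readers unfamiliar with the numerator relationship matrix, the following Python sketch builds it with the standard tabular method. It is illustrative only: the study computed pedigree relationships with the R-package pedigree, and the function name, the pedigree encoding, and the toy pedigree (with non-inbred, unrelated founders) are assumptions.

```python
def numerator_relationship_matrix(pedigree):
    """Tabular method for the numerator relationship matrix A.

    pedigree[i] = (parent1, parent2) as indices of earlier entries; None
    marks an unknown base-population parent, assumed unrelated and
    non-inbred. Parents must be listed before their offspring.
    """
    n = len(pedigree)
    a = [[0.0] * n for _ in range(n)]
    for i, (s, d) in enumerate(pedigree):
        for j in range(i):
            # Relationship with an earlier individual j: mean of the
            # relationships between j and the two parents of i.
            a_js = a[j][s] if s is not None else 0.0
            a_jd = a[j][d] if d is not None else 0.0
            a[i][j] = a[j][i] = 0.5 * (a_js + a_jd)
        # Diagonal: 1 + inbreeding coefficient (half the parents' relationship).
        inbreeding = 0.5 * a[s][d] if s is not None and d is not None else 0.0
        a[i][i] = 1.0 + inbreeding
    return a


# Toy pedigree: founders 0 and 1, full sibs 2 and 3, and 4 from selfing 2.
ped = [(None, None), (None, None), (0, 1), (0, 1), (2, 2)]
for row in numerator_relationship_matrix(ped):
    print(row)
```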

In scenario UnLDASNP, the TS and RSC were unrelated (Un), because their parents were distinct (PTS∩PRSC=∅). Thus, the influence of pedigree relationships captured by SNPs and cosegregation between QTL and SNPs is eliminated, and the only remaining connection between the TS and RSC is the LD shared due to their common ancestral population, i.e., LDA.

Genomic prediction model

We used GBLUP to predict breeding values gi according to the model equation

$$y_i = \mu + g_i + \epsilon_i,$$

where $y_i$ and $g_i$ are the phenotypic and breeding values, respectively, of the $i$-th individual, $\mu$ is the overall population mean, and $\epsilon_i$ the associated model residual. Standard assumptions about the distribution of the random effects were $(g_i) \sim \mathrm{MVN}(0, \sigma_a^2 K)$, $(\epsilon_i) \sim \mathrm{MVN}(0, \sigma_\epsilon^2 I)$, and stochastic independence of $(g_i)$ and $(\epsilon_i)$. Variance component estimates for $\sigma_a^2$ and $\sigma_\epsilon^2$, as well as predicted breeding values, were calculated using the R-package rrBLUP (Endelman 2011). The matrix $\sigma_a^2 K = (\sigma_a^2 k_{ij})$ describes the variance–covariance structure of the breeding values of all individuals (TS and RSC) and was computed based on different types of data, depending on the information scenario. For ReLDASNP, ReLEASNP, and UnLDASNP, SNP-based genomic relationship coefficients $k_{ij}$ between individuals $i$ and $j$ were computed following VanRaden (2008) as

$$k_{ij} = \frac{\sum_k (x_{ik} - 2p_k)(x_{jk} - 2p_k)}{\sum_k 2p_k(1 - p_k)},$$

where $x_{ik}, x_{jk} \in \{0, 1, 2\}$ are the genotypic SNP scores and $p_k$ is the frequency at the $k$-th SNP marker in the ancestral populations. In scenario ReLDAPed, pedigree relationships were computed from the complete pedigree records of all individuals using the R-package pedigree (Coster 2013).
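
The genomic relationship coefficients can be computed directly from the formula above. The Python sketch below is a plain, unoptimized illustration; the function name `genomic_relationship_matrix` and the list-of-lists data layout are assumptions, and the study itself relied on R for these computations.

```python
def genomic_relationship_matrix(genotypes, allele_freqs):
    """SNP-based relationship matrix following the formula in the text.

    genotypes[i][k] is the SNP score (0, 1, or 2) of individual i at marker k;
    allele_freqs[k] is the ancestral frequency p_k of the counted allele.
    """
    denom = sum(2.0 * p * (1.0 - p) for p in allele_freqs)
    if denom == 0.0:
        raise ValueError("all markers are monomorphic in the reference population")
    # Center each score by twice the ancestral allele frequency.
    centered = [[x - 2.0 * p for x, p in zip(row, allele_freqs)]
                for row in genotypes]
    n = len(centered)
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            value = sum(u * v for u, v in zip(centered[i], centered[j])) / denom
            k[i][j] = k[j][i] = value
    return k


# Two homozygous lines carrying opposite alleles at two markers (p = 0.5).
print(genomic_relationship_matrix([[2, 0], [0, 2]], [0.5, 0.5]))
```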

Recurrent genomic selection scheme

The TS was sampled once from synthetic SynNR(1) (Figure 1) and thereupon was used to predict breeding values in all 30 selection cycles. The initial 100 RS candidates were sampled from the remaining individuals of SynNR(1), if PTS=PRSC, or from the second synthetic SynNR(2), if PTS∩PRSC=∅. In each cycle C, the top Ns=10 individuals were selected (before flowering) either based on (i) EBV calculated by GBLUP or pedigree BLUP (scenario ReLDAPed), (ii) TBV, corresponding to phenotypic selection with h2=1, or (iii) “random breeding values” (RBV), being chosen at random. While EBV shows the realistic decay of rg,g^ (taking into account that rg,g^ in earlier cycles influences rg,g^ in later cycles), TBV provides an identical and constant selection accuracy of one, independent of rg,g^ for all scenarios. RBV shows the decay of rg,g^ without directional selection, i.e., the decay that is caused by recombination and genetic drift alone. The selected fraction of 10% is realistic for practical applications and has been used in other simulation studies (e.g., Jannink 2010). The selected candidates were subsequently recombined by random mating to create 100 new progeny, serving as RSC in the next selection cycle. The effects of NTS∈{250,1000} and of SNP density ∈{0.125, 2.5 SNPs per cM} were examined in independent simulations, with default values of NTS=250 and 2.5 SNPs per cM. For each combination of factors, we conducted 500 independent simulation replicates. Here, one replicate encompasses: (i) sampling of Np parents from the ancestral population; (ii) sampling of 1000 QTL together with their QTL effects and an appropriate number of SNPs to reach the desired marker density; (iii) creation of the synthetics assuming different numbers of generations of random mating, and sampling of the TS and the initial RSC; (iv) simulation of phenotypes for TS individuals; and (v) conducting recurrent GS without retraining for 30 selection cycles.
All simulations were performed with the R statistical language (R Core Team 2015) and code is provided in File S2.
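
The selection step of one cycle can be outlined as follows. This Python sketch is not the R code of File S2; the function names are hypothetical, meiosis and genotypes are omitted entirely, and drawing both parents with replacement (so that selfing can occur) is an assumption about how random mating among the selected candidates was implemented.

```python
import random


def selection_scores(mode, ebv, tbv, rng):
    """Scores used for ranking candidates under the three selection regimes."""
    if mode == "EBV":
        return list(ebv)
    if mode == "TBV":
        return list(tbv)
    if mode == "RBV":
        return [rng.random() for _ in tbv]  # random ranking, no directional selection
    raise ValueError("mode must be 'EBV', 'TBV', or 'RBV'")


def select_top(scores, n_selected=10):
    """Indices of the n_selected candidates with the highest scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_selected]


def random_mating_pairs(selected, n_progeny, rng):
    """Parent pairs for the next cycle; parents drawn with replacement (assumed)."""
    return [(rng.choice(selected), rng.choice(selected))
            for _ in range(n_progeny)]


# Toy cycle: 100 candidates, EBV = TBV plus noise, top 10 selected.
rng = random.Random(7)
tbv = [rng.gauss(0.0, 1.0) for _ in range(100)]
ebv = [g + rng.gauss(0.0, 1.0) for g in tbv]
selected = select_top(selection_scores("EBV", ebv, tbv, rng), n_selected=10)
pairs = random_mating_pairs(selected, n_progeny=100, rng=rng)
print(selected)
```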

Cumulative genetic gain, additive genetic variance, and prediction accuracy

In each selection cycle, the cumulative genetic gain (ΔG) was computed as the average of all 100 TBVs gi of the RSC relative to the average in C=0. The σA2 of the RSC was computed as the variance of gi values. The ΔG was expressed in units of σA(anc) and σA2 in units of σA2(anc). rg,g^ was calculated as the Pearson correlation coefficient between TBVs gi and predicted breeding values g^i of the RSC.
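
These three summary statistics are straightforward to compute. The Python sketch below uses hypothetical function names; the text does not state whether the sample or the population variance was used, so the sample variance here is an assumption.

```python
import math
import statistics


def cumulative_gain(tbv_cycle, tbv_cycle0, sigma_a_anc):
    """Mean TBV relative to cycle 0, in units of the ancestral genetic SD."""
    return (statistics.fmean(tbv_cycle) - statistics.fmean(tbv_cycle0)) / sigma_a_anc


def additive_variance(tbv_cycle, sigma2_a_anc):
    """Variance of TBVs in units of the ancestral additive genetic variance.
    Assumption: sample variance (n - 1 in the denominator)."""
    return statistics.variance(tbv_cycle) / sigma2_a_anc


def prediction_accuracy(tbv, ebv):
    """Pearson correlation between true and predicted breeding values."""
    mean_t, mean_e = statistics.fmean(tbv), statistics.fmean(ebv)
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(tbv, ebv))
    ss_t = sum((t - mean_t) ** 2 for t in tbv)
    ss_e = sum((e - mean_e) ** 2 for e in ebv)
    if ss_t == 0.0 or ss_e == 0.0:
        return float("nan")  # undefined when either vector has no variation
    return cov / math.sqrt(ss_t * ss_e)


print(cumulative_gain([2.0, 4.0], [0.0, 2.0], 2.0))
print(additive_variance([1.0, 2.0, 3.0], 1.0))
print(prediction_accuracy([1.0, 2.0, 3.0], [2.0, 4.0, 6.5]))
```

Returning NaN when one vector has no variation mirrors the situation noted in the Results, where rg,g^ could not be calculated for scenario ReLDAPed with Np=2 and NR=1.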

Data availability

The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article.

Results

Dynamics of genetic gain, prediction accuracy, and additive genetic variance

An overview of the dynamics of cumulative genetic gain ΔG and prediction accuracy rg,g^ under recurrent GS for the standard scenario ReLDASNP is given in Figure 2. Across selection cycles, ΔG increased concavely, approaching a plateau. Regardless of the number of parents Np, ΔG was higher in LR compared to SR. For LR, ΔG increased together with Np, whereas for SR, ΔG was lowest for Np=2, highest for Np=4, and intermediate for Np=16. In the model training generation (C=0), rg,g^ ranged between 0.7 and 0.8 and was higher for smaller Np. After the first round of selection, there was a substantial decline in rg,g^ that was strongest for large Np. rg,g^ generally approached an asymptotic value of ∼0.1 in cycle C=30. The overall level of σA2 (Figure S1) in the RSC was higher for larger Np and strongly declined during selection, especially after the first cycle. In C=0, σA2 was nearly identical for LR and SR, and showed a slightly steeper decline in LR.

Figure 2.


(A) Average cumulative genetic gain ΔG and (B) average prediction accuracy rg,g^ in scenario ReLDASNP under recurrent genomic selection across C = 0, 1, …, 30 selection cycles for synthetics produced from Np = 2, 4, 16 parents taken from ancestral populations SR or LR. Values of ΔG are expressed in units of σA(anc). LDA, ancestral linkage disequilibrium; LR, long-range linkage disequilibrium; Re, related; SNP, single nucleotide polymorphism; SR, short-range linkage disequilibrium.

Cumulative genetic gain

To explore in greater detail ΔG in C=30 and the information sources primarily exploited, we varied Np between 2 and 32 (Figure 3). Here, the relationship between ΔG and Np in scenario ReLDASNP was strongly affected by the level of LDA. For LR, ΔG initially increased between Np=2 and Np=8 and then remained nearly constant for larger Np. For SR, ΔG also increased initially, but then strongly decreased for larger Np. In scenario UnLDASNP (PTS∩PRSC=∅), ΔG was much lower than in ReLDASNP and monotonically increased with growing Np. This increase and the overall level of ΔG were much higher in LR than SR. In scenario ReLDAPed, ΔG was zero for Np=2, and strongly increased with Np, plateauing at 8≤Np≤12. For scenario ReLDAPed, virtually no further genetic gain could be realized after C=2 (Figure S2).

Figure 3.


Average cumulative genetic gain ΔG under recurrent genomic selection in selection cycle C=30 for synthetics produced from different numbers of parents Np taken from ancestral populations SR or LR. All values are expressed in units of σA(anc). σA2(anc), mean additive genetic variance; LDA, ancestral linkage disequilibrium; LEA, ancestral linkage equilibrium; LR, long-range linkage disequilibrium; Ped, pedigree; Re, related; SNP, single nucleotide polymorphism; SR, short-range linkage disequilibrium.

Persistency of prediction accuracy

The persistency of rg,g^ for selection regimes EBV, TBV, and RBV under LR is shown in Figure 4. For scenarios ReLDASNP and ReLEASNP, the overall level of rg,g^ declined with growing Np, whereas it increased for scenario UnLDASNP (compare Figure S3). In scenario ReLDASNP, the decay of rg,g^ was strongest in the first selection cycle, especially for large values of Np. In scenario ReLDAPed, rg,g^ could not be calculated for Np=2 and NR=1, as discussed in File S1; for Np>2, rg,g^ started in C=0 at intermediate values of 0.5 for Np=4 and 0.6 for Np=16 but declined to zero within a few cycles if the selection was based on either EBV or TBV. With selection based on RBV, rg,g^ approached zero only for C>10. Scenarios ReLDASNP and ReLEASNP showed identical rg,g^ for Np=2. For Np>2, rg,g^ decreased faster in ReLEASNP than in ReLDASNP and more so with increasing Np. When ancestral long-range LDA was absent (SR), the differences between ReLEASNP and ReLDASNP were generally much smaller, but otherwise trends were similar (results not shown). Scenario UnLDASNP showed an overall low level of rg,g^, especially for SR, where it was close to zero. However, the decline of rg,g^ across cycles was attenuated compared to the other scenarios. When selection was exercised based on TBV, the decay of rg,g^ was similar to selection based on EBV, but much stronger compared with selection based on RBV.

Figure 4.


Average prediction accuracy rg,g^ under recurrent genomic selection across C = 0, 1, …, 10 selection cycles for synthetics produced from Np = 2, 4, 16 parents taken from ancestral population LR. Selection of candidates was based on either true breeding values (TBV), random breeding values (RBV), or estimated breeding values (EBV). LDA, ancestral linkage disequilibrium; LEA, ancestral linkage equilibrium; LR, long-range linkage disequilibrium; Ped, pedigree; Re, related; SNP, single nucleotide polymorphism.

TS size and SNP density

The influence of NTS and SNP density on rg,g^ under selection based on EBV is shown in Figure 5. For all scenarios, increasing NTS elevated the level of rg,g^ across cycles. Specifically, for scenarios assuming PTS=PRSC, increasing NTS reduced the drop in rg,g^ after the first selection cycle, which was not observed for scenario UnLDASNP (PTS∩PRSC=∅). Increasing marker density from 0.125 to 2.5 SNPs per cM notably increased the level of rg,g^ for all SNP-based scenarios and led to higher persistency of rg,g^ for SNP-based scenarios with identical parents (PTS=PRSC). Scenario UnLDASNP did not show an increased persistency with higher marker density.

Figure 5.


Average prediction accuracy rg,g^ under recurrent genomic selection across C = 0, 1, …, 10 selection cycles depending on (A) training set size NTS and (B) marker density for synthetics produced from Np = 2, 4, 16 parents taken from ancestral population LR. LDA, ancestral linkage disequilibrium; LEA, ancestral linkage equilibrium; LR, long-range linkage disequilibrium; Ped, pedigree; Re, related; SNP, single nucleotide polymorphism.

Number of recombinations

In general, increasing the number of recombinations NR resulted in a decrease of rg,g^ (C=0, Figure 6), except for scenario UnLDASNP, where rg,g^ stayed nearly constant. Increasing NR in scenario ReLDAPed resulted in the strongest decline in rg,g^ of all scenarios, except if Np=2, where it remained constant. For scenario ReLDASNP, increasing NR from 1 to 5 slightly increased long-term ΔG in C=30 for selection based on TBV, but not notably for selection based on EBV (Figure 7). The σA2 in C=0 was not affected by NR (Figure S4A).

Figure 6.


Average prediction accuracy rg,g^ in selection cycle C=0 for different numbers of recombination generations NR used for production of synthetics from Np = 2, 4, 16 parents taken from ancestral populations SR or LR. LDA, ancestral linkage disequilibrium; LEA, ancestral linkage equilibrium; LR, long-range linkage disequilibrium; Ped, pedigree; Re, related; SNP, single nucleotide polymorphism; SR, short-range linkage disequilibrium.

Figure 7.


Average cumulative genetic gain ΔG under recurrent genomic selection in selection cycle C=5 and C=30 for synthetics produced from different numbers of parents Np taken from ancestral populations SR or LR for NR=1 and NR=5 recombination generations. (A) Selection based on true breeding values (TBV), averages across all information scenarios (because values are expected to be identical). (B) Selection based on estimated breeding values (EBV) for scenario ReLDASNP. All values are expressed in units of σA(anc). σA2(anc), mean additive genetic variance; LDA, ancestral linkage disequilibrium; LR, long-range linkage disequilibrium; Re, related; SNP, single nucleotide polymorphism; SR, short-range linkage disequilibrium.

Discussion

In plant breeding, small effective population sizes that result from a small number of population parents crucially influence the information sources contributing to rg,g^ in a single cycle of GS. For a large number of parents, LDA and pedigree relationships are the driving forces of accuracy, whereas for few parents, cosegregation between QTL and SNPs dominates. While exploitation of information from cosegregation leads to high accuracy, it is unclear how this affects persistency of rg,g^ across selection cycles. Moreover, genetic gain depends on the available genetic variance, which is expected to be reduced for a small number of parents, as opposed to the trend expected for rg,g^. Although persistency and genetic gain in GS have been previously studied, the important situation of the very small effective population sizes in plant breeding, where cosegregation plays a central role, has not been addressed. Hence, the purpose of the present study was to investigate the contributions of the information sources to persistency of rg,g^ and genetic gain across multiple cycles of recurrent GS in synthetic populations, depending on the number of parents.

Persistency of prediction accuracy across cycles

The persistency of rg,g^ in GS is of crucial importance for practical breeding, because it determines the number of generations that can be employed until retraining of the prediction equation becomes necessary. Thus, it affects the optimum design of a breeding program using recurrent GS and its costs and efficiency compared to phenotypic RS. In agreement with previous studies, we observed a substantial drop in rg,g^ in scenario ReLDASNP, especially after the first cycle (Figure 4). It was hypothesized that this decline is due to a loss of information from pedigree relationships captured by SNPs (Habier et al. 2007; Wolc et al. 2011b, 2016). In support of this explanation, we observed rg,g^ to plummet after the first cycle in scenario ReLDAPed and this can be attributed to two reasons. First, even without directional selection, the variation in pedigree relationships between the TS and RSC erodes as the number of generations between both increases (Figure S5C, selection based on RBV). Second, selection based on pedigree relationships favors the choice of candidates closely related to one another (Quinton et al. 1992; Daetwyler et al. 2007), as verified by the substantial increase in inbreeding and the reduced variation in pedigree relationships (Figure S5, A and C), making the breeding population already genetically narrow after only one selection cycle. This causes EBVs to be more similar to each other and hence, also rg,g^ is severely reduced, although the top pedigree relationships between the TS and RSC individuals increase (Figure S5B). Conversely, selection on TBV (corresponding to phenotypic selection with h2=1) imposes less inbreeding (Figure S5A), because candidates can have equally high breeding values without necessarily being closely related, which results in the selection of clusters of closely related candidates (Figure S8).

The strong drop of rg,g^ in scenario ReLDAPed for selection based on EBV might suggest that pedigree relationships only contribute for one or at least very few generations to rg,g^ of scenario ReLDASNP. However, it has to be taken into account that cosegregation of SNPs and QTL allows capturing of Mendelian sampling (Daetwyler et al. 2007), which reduces the selection pressure on pedigree relationships and in turn increases persistency of rg,g^ in scenario ReLDASNP. The effect of reduced selection pressure on pedigree relationships can be inferred from scenario ReLDAPed under selection based on RBV, where essentially all selection pressure was removed and individuals were selected irrespective of their ancestry. Here, rg,g^ showed a much slower decay compared to selection based on EBV (Figure 4). This suggests that in scenario ReLDASNP with selection based on EBV, pedigree relationships probably contribute longer to rg,g^ than indicated by ReLDAPed (selection based on EBV).

It was previously shown that information from LDA is highly persistent across generations (Habier et al. 2007). In synthetics, the observed LD largely corresponds to LDA only if Np is large, which implies that LDA mainly contributes to rg,g^ for large Np (Schopp et al. 2017). Consistent with these findings, for large Np (e.g., 16) LDA was the dominant information source across selection cycles, as verified by the strong reduction in rg,g^ when LDA was artificially removed from scenario ReLDASNP as in ReLEASNP (Figure 4). Conversely, for small Np, the representation of LDA in the synthetics is hampered by randomly created sample LD when selecting the parents, which raises the question of how this influences persistency of rg,g^ for small Np. Our results show that for Np=4, the persistency of rg,g^ in scenario ReLDASNP was even higher than for Np=16, where rg,g^ decreased more strongly, even though the contribution of LDA was markedly reduced compared to Np=16 (the drop of rg,g^ in scenario ReLEASNP was larger for Np=4 than Np=16). This implies that sample LD, and therefore information from cosegregation, behaves similarly to LDA regarding the decay of information across selection cycles. The strong conservation of LDA can be directly assessed from scenario UnLDASNP, where TS and RSC are unrelated and LDA was the only information source (Figure 4). Here, the decay of rg,g^ was generally small, and if selection was based on RBV it was even diminutive, indicating that recombination between QTL and SNPs only marginally drives ancestral LD structures of the TS and the RSC apart. Even if cosegregation information dominates over LDA in the case of small Np (e.g., 4), LDA still substantially contributes to rg,g^, especially in later selection cycles (Figure 4, ReLDASNP vs. ReLEASNP).
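The persistence of LD between tightly linked loci can be illustrated with the textbook two-locus result that LD decays by a factor (1 − c) per generation of random mating, where c is the recombination fraction. The sketch below is a simplification and not the simulation used in this study: it assumes random mating without selection or drift, and it converts map distance to c with the Haldane mapping function. All function names are hypothetical.

```python
import math


def haldane_recombination_fraction(distance_cm):
    """Recombination fraction for a map distance in cM (Haldane mapping function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def ld_remaining(distance_cm, generations):
    """Fraction of the initial LD coefficient D left after `generations`
    of random mating: (1 - c) ** generations."""
    c = haldane_recombination_fraction(distance_cm)
    return (1.0 - c) ** generations


# Compare a tightly linked, a moderately linked, and a loosely linked QTL-SNP pair
# after 30 generations, the number of cycles considered in the text.
for distance_cm in (0.1, 1.0, 10.0):
    remaining = ld_remaining(distance_cm, generations=30)
    print(f"{distance_cm:>5.1f} cM: {remaining:.3f} of initial LD remains")
```

Under these assumptions, LD between a QTL and a SNP a fraction of a centimorgan apart is barely eroded over 30 generations, whereas LD over several centimorgans is mostly lost.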

The genomic prediction methodology used can also have a bearing on the exploitation of the sources of information, which was not considered in this study. Previous research indicated that (Bayesian) variable selection methods are better suited to capture information from LDA compared to GBLUP, especially if traits are oligogenic and individual QTL have strong effects (Habier et al. 2007, 2013; Zhong et al. 2009). Therefore, we expect that such methods are advantageous in situations where rg,g^ heavily relies on information from LDA, as is the case for large Np or if TS and RSC are unrelated.

Steady state cumulative genetic gain

In any population advanced by RS, the cumulative increase in overall performance is of central interest to breeders. Here, we continued RS until cycle C=30, where further increases in ΔG were only marginal because either σA2 was depleted (Figure S6) and/or rg,g^ was near zero (Figure 4). This approach allowed for direct comparisons between ΔG for different scenarios and conclusions were not contingent on the amount of σA2 left.

Increasing Np leads to an asymptotic increase in the initially available σA2, which was independent of the ancestral population in our simulation (Figure S7). According to the breeder’s equation, increasing σA2 results in higher genetic gain, which partially explains the increase in ΔG for larger Np. However, besides higher σA2, differential contributions of the three sources of information to rg,g^ play a major role. In scenario ReLDAPed, ΔG was relatively constant from medium Np (≥8) onward (Figure 3), which is presumably the result of the counterbalancing effects of a slight increase in σA2 and a moderate decrease in rg,g^ with increasing Np. As pointed out by Schopp et al. (2017), increasing Np from medium to large values decreases the frequency of close relatives between TS and RSC and hence, reduces rg,g^ (Figure S3). The contribution of pedigree relationships to long-term genetic gain in scenario ReLDASNP should therefore be relatively constant for medium to large Np. As the contribution of cosegregation to rg,g^ decreases with larger Np, ΔG of scenario ReLEASNP strongly declined. Conversely, ΔG of scenario UnLDASNP strongly increased with larger Np due to more information from LDA. Given that there is sufficient LDA present in the ancestral population (LR), both effects largely compensate for each other and hence, ΔG in scenario ReLDASNP appears to be insensitive to changes in Np beyond four parents for LR (Figure 3). When there is not sufficient LDA as applies to SR, increasing information due to LDA can no longer compensate for the loss in cosegregation information and therefore ΔG in ReLDASNP decreased for higher Np. Although we considered ΔG close to its steady state, it is important to note that the essential trends in ΔG are already apparent for as few as two selection cycles (Figure S2), which implies that our observations do not only apply to the situation of extreme long-term selection without retraining, but also to few selection cycles.
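The counterbalancing of σA2 and rg,g^ described above can be made concrete with the breeder’s equation in its common form ΔG = i · r · σA, where i is the selection intensity. The following sketch uses invented values (selected fraction 10%, accuracy 0.60, a 20% increase in σA2) that are assumptions for illustration, not parameters of this study. It shows that a gain in variance can be offset by a proportionate loss in accuracy.

```python
import math
from statistics import NormalDist


def selection_intensity(p_selected):
    """Standardized selection differential for truncation selection
    of the top fraction `p_selected` from a normal distribution."""
    std_normal = NormalDist()
    truncation_point = std_normal.inv_cdf(1.0 - p_selected)
    return std_normal.pdf(truncation_point) / p_selected


def expected_gain(p_selected, accuracy, var_a):
    """Expected gain per cycle: intensity * accuracy * additive genetic SD."""
    return selection_intensity(p_selected) * accuracy * math.sqrt(var_a)


# Hypothetical values chosen only to illustrate the trade-off.
base = expected_gain(0.10, accuracy=0.60, var_a=1.0)
more_variance = expected_gain(0.10, accuracy=0.60, var_a=1.2)
compensated = expected_gain(0.10, accuracy=0.60 / math.sqrt(1.2), var_a=1.2)

print(f"baseline gain:                 {base:.3f}")
print(f"20% more variance:             {more_variance:.3f}")
print(f"more variance, lower accuracy: {compensated:.3f}")
```

In this toy setting, a 20% increase in σA2 is fully cancelled when accuracy drops by a factor of 1/√1.2, which mirrors the kind of compensation invoked to explain a roughly constant ΔG across Np.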

Influence of TS size and SNP density

We found that increasing NTS leads to higher persistency of rg,g^ in early selection cycles for scenarios with pedigree relationship between TS and RSC (PTS=PRSC, Figure 5). This is because, for a given Np, increasing NTS enhances the probability of obtaining TS individuals that share an exceptionally large portion of their genome with the RSC individuals due to Mendelian sampling and because of similarities between individuals due to LDA. Hence, for small NTS there is a higher reliance on information from pedigree relationships (Jannink et al. 2010; Schopp et al. 2017) that quickly erodes under directional selection. For large NTS, there is a higher weight on information from cosegregation and LDA, which in turn increases the persistency of rg,g^. This shift in emphasis also entails reduced inbreeding, especially in early selection cycles (results not shown), in agreement with the findings of Jannink (2010). Therefore, if a prediction equation is to be used for multiple cycles, NTS should be chosen large enough to not only guarantee high initial rg,g^, but also high persistency of rg,g^ and reduced inbreeding in order to improve genetic gain from GS.

Increasing SNP density from 0.125 to 2.5 cM⁻¹, corresponding to ∼250 and 5000 SNPs in the case of maize, led to an increase in the persistency of rg,g^ (Figure 5), which is in concordance with previous studies (Solberg et al. 2009; Sonesson and Meuwissen 2009). Higher SNP density theoretically affects all three sources of information, but its influence should be strongest on LDA and cosegregation because they rely on physical proximity of SNPs and QTL. If the SNP density is extremely low (e.g., 0.125 cM⁻¹), it is unlikely that SNPs and QTL are tightly linked and hence, SNPs mainly capture pedigree relationships, whereas LDA and cosegregation play only subordinate roles. Therefore, high SNP density improves persistency of rg,g^ over generations, because information from both LDA (Figure 5, Np=16) and cosegregation (Figure 5, Np=2) is less prone to decay than information from pedigree relationships. The highest SNP density we investigated was 2.5 cM⁻¹, which is relatively low compared to what is nowadays available in many plant species. Nevertheless, because of the strong influence of cosegregation in synthetics that are produced from a low to intermediate number of parents, we would expect that little can be gained by further increasing SNP density, especially if long-range LDA is present, as can be assumed for elite germplasm in practical applications. The situation can be quite different, however, for large Np and if there is only short-range LDA in the ancestral population, which rapidly increases the need for higher SNP densities.
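The SNP densities quoted above can be translated into marker counts and typical QTL-to-SNP distances with a short calculation. This is a rough sketch under stated assumptions: a total map length of 2000 cM (implied by ∼250 SNPs at 0.125 cM⁻¹), evenly spaced SNPs, uniformly placed QTL, and the Haldane mapping function. None of these helper names come from the study.

```python
import math

# Assumption: total map length implied by ~250 SNPs at 0.125 SNPs per cM.
GENOME_LENGTH_CM = 2000.0


def n_snps(density_per_cm, genome_length_cm=GENOME_LENGTH_CM):
    """Number of SNPs for a given density (SNPs per cM)."""
    return density_per_cm * genome_length_cm


def mean_distance_to_nearest_snp(density_per_cm):
    """Mean distance (cM) from a uniformly placed QTL to its nearest SNP when
    SNPs are evenly spaced: a quarter of the marker spacing (chromosome ends ignored)."""
    return 1.0 / (4.0 * density_per_cm)


def haldane_recombination_fraction(distance_cm):
    """Recombination fraction for a map distance in cM (Haldane mapping function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


for density in (0.125, 2.5):
    distance = mean_distance_to_nearest_snp(density)
    c = haldane_recombination_fraction(distance)
    print(
        f"density {density} per cM: {n_snps(density):.0f} SNPs, "
        f"mean QTL-SNP distance {distance:.2f} cM, recombination fraction {c:.4f}"
    )
```

Under these assumptions, the low density places the nearest SNP about 2 cM from a QTL on average, compared with about 0.1 cM at the high density, which is consistent with the weaker role of LDA and cosegregation at low density.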

Influence of the number of recombination generations

We hypothesized that larger NR might lead to enhanced long-term ΔG by virtue of a stronger fragmentation of chromosomes in the synthetic. Indeed, the average length of chromosomal segments of unique parental origin decreased from ∼66 cM for NR=1 to 30 cM (Np=2) and 20 cM (Np=16) for NR=5 (Figure S4B). However, as information from pedigree relationships strongly declined with increasing NR (Figure 6, scenario ReLDAPed), rg,g^ in C=0 generally decreased in scenario ReLDASNP. Conversely, the decline of information contributed by LDA with increasing NR was negligible (scenario UnLDASNP). Decreasing selection accuracy reduces ΔG, which can conceal the positive effect of higher genome fragmentation. Analysis of the latter factor alone is possible with selection regime TBV, where selection accuracy was always constant and equal to one, regardless of NR. Here, we found higher ΔG for NR=5 compared to NR=1 (Figure 7) because finer fragmentation promotes occurrence of genotypes with favorable allele combinations for selection. This is accompanied by a reduced coselection of QTL, such that more QTL stay polymorphic and therefore σA2 remains higher in advanced selection cycles. The positive effect of NR on ΔG under selection on TBV increased with increasing Np, presumably because larger Np results in even finer genome fragmentation (Figure S4B). For selection regime EBV, ΔG in C=30 was not higher for NR=5 than for NR=1, suggesting that positive and negative effects of recombination cancelled each other out. For ancestral population SR, ΔG was even slightly lower for NR=5, because compared to LR, stochastic dependency between QTL is relatively low from the beginning and hence, higher fragmentation has only a minor effect. A special situation existed for Np=2, which is explained in File S1.

It is noteworthy that in our simulations the initial σA2 (C=0) was unaffected by NR, although strong sample LD between QTL was broken up. In reality, ancestral populations (corresponding to source germplasms in breeding) generally underwent some sort of directional selection, which can theoretically cause a reduction in σA2 due to the Bulmer effect (Bulmer 1971; Long et al. 2011). This hidden part of σA2 attributable to negative LD between causal loci can be recovered by recombination, which might lead to an increase in ΔG for NR>1.
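The Bulmer effect and its reversal by recombination can be sketched with the classical infinitesimal-model recursion σA2(t+1) = ½(1 − k·r²)·σA2(t) + ½·σA2(0), where k = i(i − x) depends on the truncation point x and selection intensity i. The code below is a simplified illustration, not part of the simulations reported here. It assumes unlinked loci, equal selection in both parents, constant accuracy r, and no inbreeding or drift, and the selected fraction of 10% is invented.

```python
from statistics import NormalDist


def variance_reduction_factor(p_selected):
    """k = i * (i - x) for truncation selection of the top fraction `p_selected`."""
    std_normal = NormalDist()
    x = std_normal.inv_cdf(1.0 - p_selected)
    i = std_normal.pdf(x) / p_selected
    return i * (i - x)


def next_variance(var_a, var_a0, k, accuracy):
    """One generation of the Bulmer recursion: half the variance among selected
    parents plus the Mendelian sampling variance (0.5 * var_a0)."""
    return 0.5 * (1.0 - k * accuracy ** 2) * var_a + 0.5 * var_a0


def trajectory(var_a0, k, accuracy, generations, start=None):
    """Additive variance over generations, optionally from a reduced start value."""
    var_a = var_a0 if start is None else start
    values = [var_a]
    for _ in range(generations):
        var_a = next_variance(var_a, var_a0, k, accuracy)
        values.append(var_a)
    return values


k = variance_reduction_factor(0.10)  # hypothetical selected fraction of 10%

# Directional selection builds up negative LD and lowers the variance.
under_selection = trajectory(1.0, k, accuracy=1.0, generations=20)

# Random mating without selection (k = 0): recombination releases the hidden variance.
recovery = trajectory(1.0, 0.0, accuracy=1.0, generations=5, start=under_selection[-1])

print("after selection:", round(under_selection[-1], 3))
print("recovery:", [round(v, 3) for v in recovery])
```

In this idealized setting, the gap to the initial variance halves with every generation of random mating. With linked loci, recovery would be slower, so additional recombination generations would release the hidden variance only gradually.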

Implications for practical applications

At the start of any breeding program employing GS with the goal of improving quantitative traits, breeders have to make a number of crucial decisions, including the source germplasm, parents, and mating scheme used to develop the breeding population. Further decisions specific to GS concern the NTS and marker density. All of these factors influence the importance of the three information sources in GS and thereby have ramifications on the success of the breeding program.

The choice of the source germplasm crucially determines the improvement potential for the target trait (Fountain and Hallauer 1996), because it determines the genetic diversity and linkage disequilibrium (i.e., LDA), which are both of central importance for the success of GS. Our study demonstrates that information from LDA generally offers high persistency across selection cycles in synthetics, irrespective of Np. Hence, LDA is particularly important for ensuring sustained genetic progress during the breeding program. However, the contribution of LDA to genetic gain is itself highly dependent on Np. Whereas for large Np, LD in synthetics adequately represents LDA, small Np generates sample LD and, in turn, cosegregation that dominates LD in synthetics. Cosegregation is about as persistent as LDA, but it can only contribute to genetic gain if TS and selection candidates are related by having parents in common. However, it must be taken into account that reducing Np also reduces the initially available genetic variance for breeding, thereby impairing ΔG. In essence, high persistency of rg,g^ and thereby prolonged genetic progress may be achieved irrespective of Np, but if Np is large, substantial LDA is required.

Pedigree relationships also contribute to predictive information for Np>2, and harnessing pedigree information has been recommended to achieve high rg,g^ in GS (e.g., Wolc et al. 2011a). Frequent retraining of the prediction equation, at best in every generation, would be required to optimally exploit pedigree relationships because information from them rapidly erodes over generations, especially under directional selection. In addition, selection using pedigree relationships increases the rate of inbreeding due to intraclass correlation of EBV for members of the same family and their coselection (Daetwyler et al. 2007), a result that is well known in animal breeding (Belonsky and Kennedy 1988) and was confirmed in our study for synthetics in plant breeding (Figure S5A). A high rate of inbreeding is undesirable in long-term selection, because genetic diversity is rapidly depleted and eventually ΔG is compromised. In GS, it was shown that molecular markers not only capture deviations of genomic relationships from pedigree relationships, but also the pedigree relationships themselves (Habier et al. 2007), i.e., the latent family structure in the case of synthetics. Therefore, the same concerns as for pedigree-based selection partially apply to GS, so that GS is also prone to selection of close relatives and inbreeding (Jannink 2010). If the breeding objective is long-term ΔG, as classically targeted by RS in genetically broad-based populations (Hallauer and Carena 2012), corresponding to large Np in our study, deliberate avoidance of using pedigree relationships might be desirable for maximizing long-term ΔG.

There are different possibilities to reduce the influence of pedigree relationships. Increasing both NTS and marker density leads to an improved capturing of Mendelian sampling and similarities between individuals due to LDA, which reduces the reliance on pedigree relationships and in turn reduces inbreeding. Another possibility could be modeling information from LDA, cosegregation (Calus et al. 2008; Legarra and Fernando 2009), and pedigree relationships in a joint linear mixed model in an attempt to isolate information from pedigree relationships. Alternatively, one could modify the mating scheme used for generating the synthetic. Additional generations of recombination successfully decreased strong variation in pedigree relationships between individuals, but only up to Np5 where a baseline level was reached (Figure S4C). Mating schemes as employed for establishing the Multi-parent Advanced Generation Intercrosses (MAGIC) largely avoid population substructure and pedigree relationships, while they complement the favorable properties of synthetics such as high genetic diversity and elevated minor allele frequencies with a fine-grained mosaic of the genome (compare Dell’Acqua et al. 2015; Holland 2015). Thus, they potentially represent ideal candidates for long-term recurrent GS, but this warrants further research.

Supplementary Material

Supplemental material is available online at www.g3journal.org/lookup/suppl/doi:10.1534/g3.116.036582/-/DC1.

Acknowledgments

This study was financially supported by the project “Climate Resilient Maize for ASIA (CRMA)” from the International Maize and Wheat Improvement Center, México and the Deutsche Gesellschaft für Internationale Zusammenarbeit, project no. 15.7860.8-001.00 (contract no. 81194991).

Footnotes

Communicating editor: J. B. Holland

Literature Cited

  1. Bastiaansen J. W. M., Coster A., Calus M. P. L., van Arendonk J. A. M., Bovenhuis H., 2012. Long-term response to genomic selection: effects of estimation method and reference population structure for different genetic architectures. Genet. Sel. Evol. 44: 3.
  2. Belonsky G. M., Kennedy B. W., 1988. Selection on individual phenotype and best linear unbiased predictor of breeding value in a closed swine herd. J. Anim. Sci. 66: 1124–1131.
  3. Bernardo R., 2009. Should maize doubled haploids be induced among F1 or F2 plants? Theor. Appl. Genet. 119: 255–262.
  4. Bernardo R., Yu J., 2007. Prospects for genomewide selection for quantitative traits in maize. Crop Sci. 47: 1082–1090.
  5. Beyene Y., Semagn K., Mugo S., Tarekegne A., Babu R., et al., 2015. Genetic gains in grain yield through genomic selection in eight bi-parental maize populations under drought stress. Crop Sci. 55: 154–163.
  6. Bulmer M. G., 1971. The effect of selection on genetic variability. Am. Nat. 105: 201–211.
  7. Calus M. P. L., Meuwissen T. H. E., de Roos A. P. W., Veerkamp R. F., 2008. Accuracy of genomic selection using different methods to define haplotypes. Genetics 178: 553–561.
  8. Coster A., 2013. Pedigree: Pedigree Functions. Available at: https://rdrr.io/cran/pedigree. Accessed: Month day, year.
  9. Daetwyler H. D., Villanueva B., Bijma P., Woolliams J. A., 2007. Inbreeding in genome-wide selection. J. Anim. Breed. Genet. 124: 369–376.
  10. Daetwyler H. D., Villanueva B., Woolliams J. A., 2008. Accuracy of predicting the genetic risk of disease using a genome-wide approach. PLoS One 3: e3395.
  11. Dell’Acqua M., Gatti D. M., Pea G., Cattonaro F., Coppens F., et al., 2015. Genetic properties of the MAGIC maize population: a new platform for high definition QTL mapping in Zea mays. Genome Biol. 16: 167.
  12. Endelman J. B., 2011. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome J. 4: 250.
  13. Falconer D. S., Mackay T. F. C., 1996. Introduction to Quantitative Genetics. Benjamin Cummings, San Francisco.
  14. Fountain M. O., Hallauer A. R., 1996. Genetic variation within maize breeding populations. Crop Sci. 36: 26–32.
  15. Goddard M., 2009. Genomic selection: prediction of accuracy and maximisation of long term response. Genetica 136: 245–257.
  16. Goddard M. E., Hayes B. J., Meuwissen T. H. E., 2011. Using the genomic relationship matrix to predict the accuracy of genomic selection. J. Anim. Breed. Genet. 128: 409–421.
  17. Gorjanc G., Jenko J., Hearne S. J., Hickey J. M., 2016. Initiating maize pre-breeding programs using genomic selection to harness polygenic variation from landrace populations. BMC Genomics 17: 30.
  18. Habier D., Fernando R. L., Dekkers J. C. M., 2007. The impact of genetic relationship information on genome-assisted breeding values. Genetics 177: 2389–2397.
  19. Habier D., Tetens J., Seefried F.-R., Lichtner P., Thaller G., 2010. The impact of genetic relationship information on genomic breeding values in German Holstein cattle. Genet. Sel. Evol. 42: 5.
  20. Habier D., Fernando R. L., Garrick D. J., 2013. Genomic BLUP decoded: a look into the black box of genomic prediction. Genetics 194: 597–607.
  21. Hallauer A. R., 1992. Recurrent selection in maize. Plant Breed. Rev. 9: 115–179.
  22. Hallauer A. R., Carena M. J., 2012. Recurrent selection methods to improve germplasm in maize. Maydica 57: 266–283.
  23. Hayes B. J., Visscher P. M., Goddard M. E., 2009. Increased accuracy of artificial selection by using the realized relationship matrix. Genet. Res. 91: 47–60.
  24. Heffner E. L., Lorenz A. J., Jannink J. L., Sorrells M. E., 2010. Plant breeding with genomic selection: gain per unit time and cost. Crop Sci. 50: 1681–1690.
  25. Holland J. B., 2015. MAGIC maize: a new resource for plant genetics. Genome Biol. 16: 163.
  26. Jannink J.-L., 2010. Dynamics of long-term genomic selection. Genet. Sel. Evol. 42: 35.
  27. Jannink J.-L., Lorenz A. J., Iwata H., 2010. Genomic selection in plant breeding: from theory to practice. Brief. Funct. Genomics 9: 166–177.
  28. Legarra A., Fernando R. L., 2009. Linear models for joint association and linkage QTL mapping. Genet. Sel. Evol. 41: 43.
  29. Liu H., Meuwissen T., Sørensen A. C., Berg P., 2015. Upweighting rare favourable alleles increases long-term genetic gain in genomic selection programs. Genet. Sel. Evol. 47: 19.
  30. Long N., Gianola D., Rosa G. J. M., Weigel K. A., 2011. Marker-assisted prediction of non-additive genetic values. Genetica 139: 843–854.
  31. Massman J. M., Jung H. J. G., Bernardo R., 2013. Genomewide selection vs. marker-assisted recurrent selection to improve grain yield and stover-quality traits for cellulosic ethanol in maize. Crop Sci. 53: 58–66.
  32. Meuwissen T. H. E., Hayes B. J., Goddard M. E., 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819–1829.
  33. Mikel M. A., 2006. Availability and analysis of proprietary dent corn inbred lines with expired U.S. plant variety protection. Crop Sci. 46: 2555–2560.
  34. Mikel M. A., Dudley J. W., 2006. Evolution of North American dent corn from public to proprietary germplasm. Crop Sci. 46: 1193–1205.
  35. Muir W. M., 2007. Comparison of genomic and traditional BLUP-estimated breeding value accuracy and selection response under alternative trait and genomic parameters. J. Anim. Breed. Genet. 124: 342–355.
  36. Nielsen H. M., Sonesson A. K., Yazdi H., Meuwissen T. H. E., 2009. Comparison of accuracy of genome-wide and BLUP breeding value estimates in sib based aquaculture breeding schemes. Aquaculture 289: 259–264.
  37. Quinton M., Smith C., Goddard M. E., 1992. Comparison of selection methods at the same level of inbreeding. J. Anim. Sci. 70: 1060–1067.
  38. R Core Team, 2015. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
  39. Sargolzaei M., Schenkel F. S., 2009. QMSim: a large-scale genome simulator for livestock. Bioinformatics 25: 680–681.
  40. Schnable P. S., Xu X., Civardi L., Xia Y., Hsia A.-P., et al., 1996. The role of meiotic recombination in generating novel genetic variability, pp. 103–110 in The Impact of Plant Molecular Genetics, edited by Sobral B. W. S. Birkhäuser, Boston, MA.
  41. Schopp P., Müller D., Technow F., Melchinger A. E., 2017. Accuracy of genomic prediction in synthetic populations depending on the number of parents, relatedness and ancestral linkage disequilibrium. Genetics 205: 441–454.
  42. Solberg T. R., Sonesson A. K., Woolliams J. A., Odegard J., Meuwissen T. H. E., 2009. Persistence of accuracy of genome-wide breeding values over generations when including a polygenic effect. Genet. Sel. Evol. 41: 53.
  43. Sonesson A. K., Meuwissen T. H. E., 2009. Testing strategies for genomic selection in aquaculture breeding programs. Genet. Sel. Evol. 41: 37.
  44. Van Grevenhof E. M., Van Arendonk J. A., Bijma P., 2012. Response to genomic selection: the Bulmer effect and the potential of genomic selection when the number of phenotypic records is limiting. Genet. Sel. Evol. 44: 26.
  45. VanRaden P. M., 2008. Efficient methods to compute genomic predictions. J. Dairy Sci. 91: 4414–4423.
  46. Windhausen V. S., Atlin G. N., Hickey J. M., Crossa J., Jannink J.-L., et al., 2012. Effectiveness of genomic prediction of maize hybrid performance in different breeding populations and environments. G3 2: 1427–1436.
  47. Wolc A., Arango J., Settar P., Fulton J. E., O’Sullivan N. P., et al., 2011a. Persistence of accuracy of genomic estimated breeding values over generations in layer chickens. Genet. Sel. Evol. 43: 23.
  48. Wolc A., Stricker C., Arango J., Settar P., Fulton J. E., et al., 2011b. Breeding value prediction for production traits in layer chickens using pedigree or genomic relationships in a reduced animal model. Genet. Sel. Evol. 43: 5.
  49. Wolc A., Arango J., Settar P., Fulton J. E., O’Sullivan N. P., et al., 2016. Mixture models detect large effect QTL better than GBLUP and result in more accurate and persistent predictions. J. Anim. Sci. Biotechnol. 7: 7.
  50. Yabe S., Ohsawa R., Iwata H., 2013. Potential of genomic selection for mass selection breeding in annual allogamous crops. Crop Sci. 53: 95–105.
  51. Yabe S., Yamasaki M., Ebana K., Hayashi T., Iwata H., 2016. Island-model genomic selection for long-term genetic improvement of autogamous crops. PLoS One 11(4): e0153945.
  52. Zhong S., Dekkers J. C. M., Fernando R. L., Jannink J.-L., 2009. Factors affecting accuracy from genomic selection in populations derived from multiple inbred lines: a Barley case study. Genetics 182: 355–364.

Data Availability Statement

The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article.


Articles from G3: Genes|Genomes|Genetics are provided here courtesy of Oxford University Press
