Genetics. 2021 Aug 4;219(3):iyab124. doi: 10.1093/genetics/iyab124

QTL mapping in outbred tetraploid (and diploid) diallel populations

Rodrigo R Amadeu¹, Patricio R Muñoz¹, Chaozhi Zheng², Jeffrey B Endelman³
Editor: D Nielsen
PMCID: PMC8570786  PMID: 34740237

Abstract

Over the last decade, multiparental populations have become a mainstay of genetics research in diploid species. Our goal was to extend this paradigm to autotetraploids by developing software for quantitative trait locus (QTL) mapping in connected F1 populations derived from a set of shared parents. For QTL discovery, phenotypes are regressed on the dosage of parental haplotypes to estimate additive effects. Statistical properties of the model were explored by simulating half-diallel diploid and tetraploid populations with different population sizes and numbers of parents. Across scenarios, the number of progeny per parental haplotype (pph) largely determined the statistical power for QTL detection and accuracy of the estimated haplotype effects. Multiallelic QTL with heritability 0.2 were detected with 90% probability at 25 pph and genome-wide significance level 0.05, and the additive haplotype effects were estimated with over 90% accuracy. Following QTL discovery, the software enables a comparison of models with multiple QTL and nonadditive effects. To illustrate, we analyzed potato tuber shape in a half-diallel population with three tetraploid parents. A well-known QTL on chromosome 10 was detected, for which the inclusion of digenic dominance lowered the Deviance Information Criterion (DIC) by 17 points compared to the additive model. The final model also contained a minor QTL on chromosome 1, but higher-order dominance and epistatic effects were excluded based on the DIC. In terms of practical impacts, the software is already being used to select offspring based on the effect and dosage of particular haplotypes in breeding programs.

Keywords: multiparental, polyploidy, dominance, haplotypes, MPP, Multiparental Populations

Introduction

For over three decades, the genetic mapping of quantitative trait loci (QTL) using DNA markers has been essential to basic and applied research. Early studies in plants focused on experimental diploid populations created from two parents (Lander and Botstein 1989; Knapp et al. 1990). For inbred parents, genetic marker alleles are easily coded to uniquely identify the two parental haplotypes at each locus (Young and Tanksley 1989). For two outbred parents of ploidy ϕ, there are 2ϕ parental haplotypes (or homologs), but in the case of biallelic SNPs, there are only two marker alleles. This difficulty was initially circumvented by using single-dose markers to create separate maternal and paternal maps in diploid species (Grattapaglia and Sederoff 1994) and separate maps for each homologous chromosome in polyploids (Wu et al. 1992). Advances in estimation theory (Ritter and Salamini 1996; Wu et al. 2002) and software (Van Ooijen and Voorrips 2001; Margarido et al. 2007) eventually made it possible to reconstruct F1 diploid offspring in terms of their parental haplotypes, and QTL mapping soon followed (Van Ooijen 2004). Inference of parental haplotypes in F1 polyploid offspring based on biallelic SNPs is even more challenging, but several software packages are now available for linkage and QTL analysis (Hackett et al. 2017; Bourke et al. 2018; Mollinari and Garcia 2019; da Silva Pereira et al. 2020).

Not long after QTL mapping in biparental populations became a reality, several groups explored the use of multiple, connected biparental populations via theory and simulation (Rebai and Goffinet 1993; Muranty 1996; Liu and Zeng 2000). We use the term diallel to represent a general class of mating designs in which groups of offspring are derived from no more than two founders (Figure 1). The transition from theory to practice for diallel mapping began in maize with 25 diverse inbred lines mated to a common parent (Yu et al. 2008; Buckler et al. 2009), and similar efforts have been published for other crops (Nice et al. 2017; Song et al. 2017). Examples of QTL mapping in connected F1 populations derived from outbred diploids have also been published, using the software FlexQTL (Rosyara et al. 2013; Bink et al. 2014). Compared with MAGIC (Multiparent Advanced Generation Inter-Cross) designs, in which offspring contain haplotypes from more than two founders (Zhang et al. 2014; Huang et al. 2015), diallel designs have the advantage of mimicking how plant breeding programs typically work. This is particularly important for breeding programs with limited resources to create specialized populations for discovery research.

Figure 1.

Examples of diallel populations with four parents (labeled a, b, c, d); each square represents a family of full-sibs. (A) full diallel; (B) half diallel with selfing; (C) half diallel without selfing; (D) circular; (E) linear; (F) factorial; (G) testcross or nested design; and (H) arbitrary partial diallel. Figure adapted from Verhoeven et al. (2006).

The objective of this research was to develop and apply software for QTL mapping in autotetraploid diallel populations (i.e., connected F1 populations), for which no other software is available. Our software, named diaQTL and available as a package for the R Computing Environment (R Core Team 2020), estimates additive effects by regression of phenotypes on the dosage of parental haplotypes. For a diallel with p tetraploid parents, there are 4p parental haplotypes (some of which may contain identical alleles at a given QTL). diaQTL computes parental haplotype dosage based on output from the companion software PolyOrigin, which estimates parental genotype probabilities using hidden Markov models to perform multipoint linkage analysis (Zheng et al. 2021). diaQTL can also analyze connected F1 populations from outbred diploid parents, using parental genotype probabilities from the software RABBIT (Zheng et al. 2015) or MAPpoly (Mollinari and Garcia 2019).

Our software is unique in its ability to model dominance effects at multiallelic QTL in tetraploid linkage mapping populations. The dominance deviation at a locus is the part of the genetic effect not explained by the additive model. In polyploids, a hierarchy of dominance effects can be defined by regressing the dominance deviation on combinations of parental haplotypes (Kempthorne 1957). Digenic dominance effects are the regression coefficients for pairs of haplotypes (diplotypes), trigenic dominance effects are the regression coefficients for triplets of haplotypes (triplotypes), and so on.

After exploring the statistical properties of QTL mapping in outbred diallel populations via simulation, we illustrate the use of diaQTL to analyze tuber shape in a tetraploid potato diallel with three parents.

Materials and methods

QTL model

A mixed model is used to relate phenotypes to the effects of parental haplotypes and potential covariates. For the tth measurement (e.g., plot) of clone i, the following equation specifies the additive model for m QTL in a diallel with p parents of ploidy ϕ (bold font designates a vector):

$$y_{it} = \mu + \mathbf{x}_{it}\mathbf{c} + \sum_{j=1}^{m}\sum_{k=1}^{\phi p} W_{ijk}\,\alpha_{jk} + \varepsilon_{it} \tag{1}$$

The response variable $y_{it}$ denotes the phenotype; $\mu$ is the intercept; $\mathbf{c}$ is an optional column-vector of fixed effects with corresponding row-vector of covariates $\mathbf{x}_{it}$; the additive QTL effects $\alpha_{jk}$ are random regression coefficients corresponding to the dosage of parental haplotype $k$ at locus $j$, denoted $W_{ijk}$; and $\varepsilon_{it}$ is the residual.

The dosage $W_{ijk}$ is calculated from an input file containing the parental genotype probabilities for every offspring. For haplotypes originating in founders that are not parents of clone $i$, $W_{ijk}$ is trivially zero. For the other haplotypes, $W_{ijk}$ is an expectation over the 100 unique parental genotype states in a biparental tetraploid F1 or the 35 unique states in a uniparental (selfed) tetraploid S1 (Zheng et al. 2021). If $w_k(abcd)$ denotes the dosage of parental haplotype $k$ in parental genotype $abcd$ (each letter represents a parental haplotype, not necessarily unique), and $P_{ij}(abcd)$ is the genotype probability for clone $i$ at locus $j$, then

$$W_{ijk} = \sum_{abcd} w_k(abcd)\,P_{ij}(abcd) \tag{2}$$

For diploids, the sum in Equation (2) is over the four parental genotype states for a biparental F1 or three states for a uniparental S1.
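
To make Equation (2) concrete, the Python sketch below enumerates parental genotype states and takes the expectation of a haplotype's dosage. It assumes that the 100 biparental states arise from 10 possible gametes per tetraploid parent (the 6 from bivalent pairing plus 4 double-reduction gametes) and that the 35 selfed states are all multisets of four haplotypes from one parent. These assumptions reproduce the state counts given above, including the diploid ones. The helper names are hypothetical and are not functions of diaQTL or PolyOrigin. The uniform probabilities are purely illustrative, because real genotype probabilities come from PolyOrigin output.

```python
from itertools import combinations_with_replacement, product


def gametes(parent, ploidy):
    """All gametes of a parent, as sorted tuples of ploidy/2 haplotype labels.

    Sampling with replacement admits double-reduction gametes such as (1, 1)
    in tetraploids; for diploids each gamete carries a single haplotype.
    """
    return list(combinations_with_replacement(parent, ploidy // 2))


def f1_states(mother, father, ploidy):
    """Parental genotype states of a biparental F1: one gamete from each parent."""
    return [g1 + g2 for g1, g2 in product(gametes(mother, ploidy),
                                          gametes(father, ploidy))]


def s1_states(parent, ploidy):
    """Parental genotype states of a selfed S1: multisets of the parent's haplotypes."""
    return list(combinations_with_replacement(parent, ploidy))


def expected_dosage(states, probs, k):
    """Equation (2): W_ijk = sum over states of w_k(state) * P_ij(state)."""
    return sum(state.count(k) * p for state, p in zip(states, probs))


if __name__ == "__main__":
    # Two tetraploid parents carrying haplotypes 1-4 and 5-8.
    states = f1_states((1, 2, 3, 4), (5, 6, 7, 8), ploidy=4)
    print(len(states), len(s1_states((1, 2, 3, 4), ploidy=4)))  # 100 35
    uniform = [1 / len(states)] * len(states)
    print(round(expected_dosage(states, uniform, k=1), 6))  # 0.5
```

Summed over all haplotypes, the expected dosages of a clone always equal the ploidy, because every state contains exactly $\phi$ haplotypes.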

The additive model can be extended to include dominance effects. Denoting the digenic dominance effect for parental haplotypes $k$ and $k'$ at locus $j$ as $\beta_{jkk'}$, which is a random regression coefficient corresponding to the diplotype dosage $W_{ijkk'}$, the additive + digenic model is

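$$y_{it} = \mu + \mathbf{x}_{it}\mathbf{c} + \sum_{j=1}^{m}\Big[\sum_{k=1}^{\phi p} W_{ijk}\,\alpha_{jk} + \sum_{k=1}^{\phi p}\sum_{k'=k}^{\phi p} W_{ijkk'}\,\beta_{jkk'}\Big] + \varepsilon_{it} \tag{3}$$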

Because the order of the parental haplotypes in diplotype $kk'$ does not matter, the sum over $k'$ in Equation (3) is restricted to values $k' \ge k$. For partial diallel designs, not all diplotypes are present, and the sum over $k'$ must be restricted accordingly. Furthermore, digenic dominance may be included for only a subset of the $m$ loci based on model selection procedures (see below). The dosage $W_{ijkk'}$ is computed analogously to Equation (2), as the expectation over parental genotypes. For tetraploids, the formula is

$$W_{ijkk'} = \sum_{abcd} w_{kk'}(abcd)\,P_{ij}(abcd) \tag{4}$$

The symbol $w_{kk'}(abcd)$ denotes the dosage of diplotype $kk'$ in genotype $abcd$, which equals the number of times $kk'$ occurs among the $\binom{4}{2} = 6$ possible combinations of length 2 (sampling from $abcd$ without replacement). Higher-order dominance effects are defined analogously to the digenic case and are therefore not shown explicitly.
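
The diplotype dosage $w_{kk'}(abcd)$ can be computed by enumerating the pairs of homolog positions, as in this minimal Python sketch. The function names are hypothetical. Diplotypes are stored as sorted tuples because the order of $k$ and $k'$ does not matter, and the expectation in Equation (4) reuses the same counts.

```python
from collections import Counter
from itertools import combinations


def diplotype_dosage(genotype):
    """w_kk'(genotype): how often each diplotype occurs among the C(ploidy, 2) pairs.

    Pairs are taken over homolog positions without replacement, so a genotype
    carrying haplotype 1 twice yields the diplotype (1, 1) exactly once.
    """
    return Counter(tuple(sorted(pair)) for pair in combinations(genotype, 2))


def expected_diplotype_dosage(states, probs, pair):
    """Equation (4): W_ijkk' as the expectation of w_kk' over genotype states."""
    key = tuple(sorted(pair))
    return sum(diplotype_dosage(s)[key] * p for s, p in zip(states, probs))


if __name__ == "__main__":
    w = diplotype_dosage((1, 1, 2, 3))
    print(dict(w))          # {(1, 1): 1, (1, 2): 2, (1, 3): 2, (2, 3): 1}
    print(sum(w.values()))  # 6
```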

Additive × additive epistatic effects can also be modeled. Denoting the epistatic effect for parental haplotypes $k$ and $k'$ at loci $j$ and $j'$, respectively, as $(\alpha\alpha)_{jkj'k'}$, the extension of Equation (1) is

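$$y_{it} = \mu + \mathbf{x}_{it}\mathbf{c} + \sum_{j=1}^{m}\Big[\sum_{k=1}^{\phi p} W_{ijk}\,\alpha_{jk} + \sum_{j'>j}\sum_{k=1}^{\phi p}\sum_{k'=1}^{\phi p} W_{ijkj'k'}\,(\alpha\alpha)_{jkj'k'}\Big] + \varepsilon_{it} \tag{5}$$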

Typically, epistasis is not modeled for all pairs of loci (model selection procedures are used), so the limits of the sum over $j'$ are not explicit. The dosage $W_{ijkj'k'}$ of the two-locus diplotype is based on the expectation over the parental genotypes at both loci (designated $abcd$ and $a'b'c'd'$, respectively):

$$W_{ijkj'k'} = \sum_{abcd}\sum_{a'b'c'd'} w_k(abcd)\,w_{k'}(a'b'c'd')\,P_{ij}(abcd)\,P_{ij'}(a'b'c'd') \tag{6}$$

The meaning of the symbols in Equation (6) is the same as in Equation (2).
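
Because Equation (6) multiplies the genotype probabilities at the two loci, the double sum separates into two single-locus sums. The following short derivation uses only the definitions in Equations (2) and (6) and shows that the two-locus dosage is the product of the single-locus dosages:

```latex
\begin{aligned}
W_{ijkj'k'} &= \sum_{abcd}\sum_{a'b'c'd'} \big[w_k(abcd)\,P_{ij}(abcd)\big]\,\big[w_{k'}(a'b'c'd')\,P_{ij'}(a'b'c'd')\big] \\
&= \Big[\sum_{abcd} w_k(abcd)\,P_{ij}(abcd)\Big]\Big[\sum_{a'b'c'd'} w_{k'}(a'b'c'd')\,P_{ij'}(a'b'c'd')\Big] \\
&= W_{ijk}\,W_{ij'k'}
\end{aligned}
```

In practice, this means two-locus dosages need not be enumerated over the $100 \times 100$ joint states of a biparental tetraploid F1.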

Polygenic effects

Polygenic effects can be added to the QTL model to improve understanding of trait genetic architecture and for genomic prediction (the latter is beyond the scope of this manuscript). Each hierarchy of QTL effects discussed above has an analogous polygenic effect with covariance based on identity-by-descent (IBD) relationships, which can be computed using the diaQTL function IBDmat.

The additive relationship $A_{ii'}$ equals ploidy times kinship, which is the probability that a randomly chosen homolog in clone $i$ is IBD at a particular locus to a randomly chosen homolog in clone $i'$ (Gallais 2003). For locus $j$, the probability that both randomly chosen homologs are parental haplotype $k$ is $(W_{ijk}/\phi)(W_{i'jk}/\phi)$, and thus the total probability of IBD (for any parental haplotype) is $\sum_{k=1}^{\phi p}(W_{ijk}/\phi)(W_{i'jk}/\phi)$. Averaged over $m$ loci, the additive relationship becomes

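$$A_{ii'} = \frac{\phi}{m}\sum_{j=1}^{m}\sum_{k=1}^{\phi p}\frac{W_{ijk}}{\phi}\,\frac{W_{i'jk}}{\phi} = \frac{1}{\phi m}\sum_{j=1}^{m}\sum_{k=1}^{\phi p}W_{ijk}\,W_{i'jk} \tag{7}$$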

Equation (7) is computed for each chromosome, and the value used for the polygenic effect is based on averaging all chromosomes that do not contain a QTL; in other words, by extending the leave-one-chromosome-out concept of Yang et al. (2014).
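
A minimal Python sketch of Equation (7) for one pair of clones follows. For readability it uses known integer dosages, whereas in practice the dosages are the expectations from Equation (2). It omits the per-chromosome averaging described above. The function names are hypothetical. The example also shows why selfed clones have diagonal values above 1.

```python
def dosage_vector(genotype, n_haplotypes):
    """Haplotype dosages of a known genotype, e.g. (1, 2, 5, 6) -> [1, 1, 0, 0, 1, 1, 0, 0]."""
    return [genotype.count(k) for k in range(1, n_haplotypes + 1)]


def additive_relationship(dosage_a, dosage_b, ploidy):
    """Equation (7): A_ii' = (1 / (ploidy * m)) * sum_j sum_k W_ijk * W_i'jk.

    dosage_a and dosage_b are lists over the m loci; each entry is the list of
    haplotype dosages (length ploidy * number_of_parents) at that locus.
    """
    m = len(dosage_a)
    total = sum(
        sum(x * y for x, y in zip(locus_a, locus_b))
        for locus_a, locus_b in zip(dosage_a, dosage_b)
    )
    return total / (ploidy * m)


if __name__ == "__main__":
    # One locus; tetraploid parents carry haplotypes 1-4 and 5-8.
    sib1 = [dosage_vector((1, 2, 5, 6), 8)]
    sib2 = [dosage_vector((1, 3, 5, 7), 8)]
    selfed = [dosage_vector((1, 1, 2, 3), 8)]
    print(additive_relationship(sib1, sib1, 4))      # 1.0 (non-inbred clone)
    print(additive_relationship(sib1, sib2, 4))      # 0.5 (haplotypes 1 and 5 shared)
    print(additive_relationship(selfed, selfed, 4))  # 1.5 (inbred S1 clone)
```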

IBD relationships for dominance and epistatic effects can be defined analogously to the additive relationship (Gallais 2003). The digenic dominance relationship $D_{ii'}$ equals the binomial coefficient $\binom{\phi}{2}$ times the probability that a randomly chosen pair of homologs in clone $i$ is IBD at a particular locus to a randomly chosen pair in clone $i'$. For locus $j$, the probability that both randomly chosen pairs are parental diplotype $kk'$ is $\big(W_{ijkk'}/\binom{\phi}{2}\big)\big(W_{i'jkk'}/\binom{\phi}{2}\big)$, which leads to the following expression:

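$$D_{ii'} = \binom{\phi}{2}\frac{1}{m}\sum_{j=1}^{m}\sum_{k=1}^{\phi p}\sum_{k'=k}^{\phi p}\frac{W_{ijkk'}}{\binom{\phi}{2}}\,\frac{W_{i'jkk'}}{\binom{\phi}{2}} = {\binom{\phi}{2}}^{-1}\frac{1}{m}\sum_{j=1}^{m}\sum_{k=1}^{\phi p}\sum_{k'=k}^{\phi p}W_{ijkk'}\,W_{i'jkk'} \tag{8}$$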

Trigenic and quadrigenic dominance relationships are defined analogously. The additive × additive epistasis relationship $E_{ii'}$ is also based on IBD probabilities for diplotypes, but in this case, the two haplotypes are at different loci ($j$ and $j'$), so the probability that both diplotypes are $jkj'k'$ is $(W_{ijkj'k'}/\phi^2)(W_{i'jkj'k'}/\phi^2)$ [using the same notation as Equation (5)]. The epistatic relationship is based on the average over all pairs of loci from different chromosomes:

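$$E_{ii'} = \phi^{-2}\Big(\sum_{j=1}^{m}\sum_{j'>j}1\Big)^{-1}\sum_{j=1}^{m}\sum_{j'>j}\sum_{k=1}^{\phi p}\sum_{k'=1}^{\phi p}W_{ijkj'k'}\,W_{i'jkj'k'} \tag{9}$$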
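
The digenic dominance relationship of Equation (8) can be sketched in Python for known genotypes (one genotype tuple per locus). The function names are hypothetical. Each unordered diplotype is counted once, which matches the restriction $k' \ge k$. As with the additive relationship, a non-inbred clone has a diagonal value of 1.

```python
from collections import Counter
from itertools import combinations
from math import comb


def diplotype_dosage(genotype):
    """w_kk'(genotype): counts of each unordered pair of homologs."""
    return Counter(tuple(sorted(p)) for p in combinations(genotype, 2))


def dominance_relationship(genos_a, genos_b, ploidy):
    """Equation (8): D_ii' = C(ploidy, 2)^-1 * (1/m) * sum_j sum_{k<=k'} W_ijkk' W_i'jkk'."""
    m = len(genos_a)
    total = 0
    for ga, gb in zip(genos_a, genos_b):
        wa, wb = diplotype_dosage(ga), diplotype_dosage(gb)
        # A Counter returns 0 for diplotypes absent from clone b.
        total += sum(count * wb[d] for d, count in wa.items())
    return total / (comb(ploidy, 2) * m)


if __name__ == "__main__":
    sib1, sib2, selfed = [(1, 2, 5, 6)], [(1, 3, 5, 7)], [(1, 1, 2, 3)]
    print(dominance_relationship(sib1, sib1, 4))      # 1.0
    print(dominance_relationship(sib1, sib2, 4))      # 1/6: only diplotype (1, 5) is shared
    print(dominance_relationship(selfed, selfed, 4))  # 10/6, inflated by inbreeding
```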

Bayesian implementation

R package BGLR (Pérez and de Los Campos 2014) was used to implement the regression model due to its flexibility for modeling random effects and its ability to handle binary phenotypes with a generalized linear model [in which case the left-hand side of Equation (1) is the linear predictor, not the phenotype]. BGLR uses a Bayesian framework and Markov Chain Monte Carlo (MCMC) to generate samples from the posterior density. The additive haplotype effects for each QTL (Equation 1) are independent and identically distributed (i.i.d.) with a “BayesC” prior (Habier et al. 2011), which is a two-component mixture of normal and Dirac (i.e., point mass) distributions. This prior was chosen because it accommodates a greater range of complexity for the QTL allelic series than a single normal distribution. The mixing probability and variance of the normal distribution are random (hyper)parameters with their own prior densities (beta and inverse chi-squared, respectively), chosen according to the default rules implemented in the BGLR software. The digenic dominance effects for each locus are i.i.d. with a BayesC prior (separate from the prior for the additive effects), and the same holds for higher-order dominance or epistatic effects. The prior density for the additive polygenic effect is multivariate normal with covariance matrix $\mathbf{A}V_{poly}$ (“RKHS” model in BGLR), and the variance component $V_{poly}$ has an inverse chi-squared prior. The residuals are i.i.d. normal with variance $V_\varepsilon$, which has an inverse chi-squared prior.
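
The BayesC prior can be pictured as a spike-and-slab draw. Each effect is exactly zero with probability 1 − π and otherwise comes from a normal distribution. This minimal Python illustration is not BGLR code, and the function name is hypothetical. The mixing probability `pi` and slab `variance` are fixed here, whereas BGLR treats them as random hyperparameters with beta and inverse chi-squared priors.

```python
import random


def draw_bayesc_effects(n, pi, variance, rng):
    """Draw n effects from a two-component spike-and-slab (BayesC-type) prior.

    With probability pi an effect is drawn from N(0, variance) (the slab);
    otherwise it is exactly zero (the Dirac point mass, or spike).
    """
    sd = variance ** 0.5
    return [rng.gauss(0.0, sd) if rng.random() < pi else 0.0 for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    # e.g. 12 parental haplotypes in a three-parent tetraploid diallel
    effects = draw_bayesc_effects(12, pi=0.5, variance=1.0, rng=rng)
    print(sum(e != 0.0 for e in effects), "of 12 haplotype effects are nonzero")
```

Because some haplotypes can share an exactly-zero effect, a draw like this can represent QTL with fewer distinct alleles than parental haplotypes.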

If the QTL effects were modeled as fixed, Equation (1) would be overparameterized, and constraints would be needed to ensure estimability (Hackett et al. 2014). Although constraints are not theoretically needed for random effects, in practice we observed large variation in the sum of the additive effects along the Markov chain, even as the differences between the additive effects remained fairly stable. Because it is ultimately these differences that are biologically meaningful, and to narrow the Bayesian credible interval (CI; described below) for the genetic effects, a constant was added to all additive effects (separately for each locus) at each iteration of the Markov chain to impose a zero-sum constraint, and the model intercept was adjusted accordingly. Each group of nonadditive effects was similarly constrained to sum to zero.
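
The text does not spell out the intercept adjustment. Assuming it must leave every fitted value unchanged, the adjustment follows from the fact that a clone's haplotype dosages at a locus always sum to the ploidy. This minimal Python sketch shows one iteration for one locus. The function names are hypothetical.

```python
def center_effects(mu, alpha, ploidy):
    """Shift the additive effects at one locus to sum to zero and adjust mu.

    At every locus a clone's haplotype dosages sum to the ploidy (this also
    holds for expected dosages), so subtracting the mean effect from every
    alpha_k lowers each genetic value by ploidy * mean; adding the same
    amount to the intercept leaves all fitted values unchanged.
    """
    mean = sum(alpha) / len(alpha)
    return mu + ploidy * mean, [a - mean for a in alpha]


def fitted_value(mu, alpha, dosage):
    """mu + sum_k W_ik * alpha_k for one clone at one locus."""
    return mu + sum(w * a for w, a in zip(dosage, alpha))


if __name__ == "__main__":
    # Diploid diallel with two parents: four parental haplotypes.
    mu, alpha = 10.0, [1.0, 2.0, 3.0, 6.0]
    new_mu, new_alpha = center_effects(mu, alpha, ploidy=2)
    print(new_mu, new_alpha)  # 16.0 [-2.0, -1.0, 0.0, 3.0]
    dosage = [0.5, 0.5, 0.25, 0.75]  # expected dosages still sum to 2
    print(fitted_value(mu, alpha, dosage), fitted_value(new_mu, new_alpha, dosage))
```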

QTL genetic variances were calculated for each iteration of the Markov chain from the corresponding genetic effects. If $u_{ij} = \sum_{k=1}^{\phi p} W_{ijk}\,\alpha_{jk}$ denotes the additive value for clone $i$ due to locus $j$, the additive variance for that locus is $V_{a,j} = \mathrm{var}_i(u_{ij}) = \overline{u_{\cdot j}^2} - \bar{u}_{\cdot j}^2$, and nonadditive variances were computed analogously. The total genetic variance for locus $j$, denoted $V_{Q,j}$, is the sum of the additive and nonadditive variances. The proportion of variance due to the additive effects for locus $j$ is

$$h_j^2 = \frac{V_{a,j}}{V_\varepsilon + \lambda V_{poly} + \sum_{j'} V_{Q,j'}} \tag{10}$$

The parameter λ in Equation (10) equals the mean diagonal of the additive relationship matrix, to account for inbreeding in selfed populations (Endelman and Jannink 2012). The proportion of variance due to nonadditive or polygenic effects is calculated analogously to Equation (10).
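
This minimal Python sketch computes the additive variance and Equation (10) for a single posterior draw. In diaQTL these quantities are computed at every MCMC iteration. The helper names are hypothetical. The variance is the population form (mean of squares minus squared mean) used in the definition of the additive variance, and `lam` is the mean diagonal of the additive relationship matrix.

```python
def additive_values(dosages, alpha):
    """u_ij = sum_k W_ijk * alpha_jk for every clone i at one locus j."""
    return [sum(w * a for w, a in zip(row, alpha)) for row in dosages]


def population_variance(values):
    """var_i(u_ij): mean of squares minus the square of the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(v * v for v in values) / n - mean * mean


def mean_diagonal(A):
    """lambda in Equation (10): mean diagonal of the additive relationship matrix."""
    return sum(A[i][i] for i in range(len(A))) / len(A)


def qtl_h2(u_j, v_resid, v_poly, lam, v_qtl_total):
    """Equation (10): additive variance of locus j divided by the total variance.

    v_qtl_total is the sum of V_Q,j' over all QTL in the model.
    """
    return population_variance(u_j) / (v_resid + lam * v_poly + v_qtl_total)


if __name__ == "__main__":
    # Four diploid clones from parents with haplotypes 1-2 and 3-4.
    W = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]
    u = additive_values(W, [1.0, 0.0, 0.0, 0.0])  # [1.0, 0.0, 1.0, 0.0]
    v_a = population_variance(u)                   # 0.25
    print(qtl_h2(u, v_resid=0.5, v_poly=0.0, lam=1.0, v_qtl_total=v_a))
```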

Two kinds of Bayesian CI are computed in diaQTL. The CI for model parameters is based on the quantiles of the Markov chain (after discarding the burn-in iterations). The CI for QTL location uses a profile likelihood based on the methodology in the R/qtl package (Broman and Sen 2009). Let $f_k = b\,e^{LL_k} d_k$ be the probability that the QTL is located in marker-bin $k$, where $LL_k$ is the posterior mean of the log-likelihood, $d_k$ is the bin width in centiMorgans (cM), and the normalization constant $b$ is chosen such that $\sum_k f_k = 1$. The cumulative distribution at bin $k$ is $F_k = \sum_{i=1}^{k} f_i$. The lower bin for the CI with probability $1-\alpha$ is one less than the largest $k$ satisfying $F_k \le \alpha/2$, and the upper bin is one more than the smallest $k$ satisfying $F_k \ge 1-\alpha/2$.
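
A minimal Python sketch of the location CI follows. It reads the bin criteria as $F_k \le \alpha/2$ for the lower bound and $F_k \ge 1-\alpha/2$ for the upper bound, and returns 0-based bin indices. It is not the diaQTL implementation, and the function name and the clamping at the chromosome ends are assumptions.

```python
import math


def location_ci(loglik, widths, prob=0.95):
    """Bayesian CI for QTL location from per-bin log-likelihoods.

    loglik[k]: posterior mean log-likelihood in marker-bin k
    widths[k]: bin width d_k in cM
    Returns (lower_bin, upper_bin) as 0-based indices.
    """
    alpha = 1.0 - prob
    top = max(loglik)  # subtracting the maximum avoids overflow; b absorbs it
    f = [math.exp(ll - top) * d for ll, d in zip(loglik, widths)]
    b = 1.0 / sum(f)  # normalization so that the f_k sum to 1
    cdf, running = [], 0.0
    for fk in f:
        running += b * fk
        cdf.append(running)
    below = [k for k, F in enumerate(cdf) if F <= alpha / 2]
    above = [k for k, F in enumerate(cdf) if F >= 1 - alpha / 2]
    lower = max(below) - 1 if below else 0
    upper = min(above) + 1 if above else len(f) - 1
    return max(lower, 0), min(upper, len(f) - 1)


if __name__ == "__main__":
    # A sharp peak in bin 4 of nine equal-width bins.
    print(location_ci([0, 0, 0, 10, 20, 10, 0, 0, 0], [1.0] * 9, prob=0.9))  # (2, 5)
```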

The burn-in and total number of iterations for MCMC were determined using the Raftery and Lewis (1992) diagnostic implemented in R package coda (Plummer et al. 2006). This diagnostic is based on estimating quantile q of a parameter within the interval (q − r, q + r) with probability s (= 0.95 in our analysis). The diaQTL function set_params returns the value of the diagnostic for each of the genetic and residual variances, and the number of iterations was chosen based on the largest value. For QTL discovery with a one-dimensional scan, which relies on the posterior mean (see below), we used q = 0.5, r = 0.1. For estimating model parameters and the 90% CI at discovered QTL, we used q = 0.05, r = 0.025.

Model selection

In accordance with the statistical principle of parsimony, we used the Deviance Information Criterion (DIC) to guide the selection of genetic models (Lenarcic et al. 2012; Pérez and de Los Campos 2014). DIC uses the posterior mean deviance (-2 × log-likelihood) to measure model fit and then adds a penalty for model complexity, which equals the difference between the posterior mean deviance and the deviance evaluated at the posterior mean (Spiegelhalter et al. 2002). The diaQTL package utilizes DIC for both QTL discovery (with function scan1) and postdiscovery exploration of multiple QTL and nonadditive models (with function fitQTL).
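
The DIC computation described above can be sketched in a few lines. The sketch assumes MCMC samples of the deviance (−2 × log-likelihood) and the deviance evaluated at the posterior mean are already available. BGLR reports DIC directly, so the function names here are hypothetical. The genome-scan statistic –ΔDIC is included for completeness.

```python
def dic(deviance_samples, deviance_at_posterior_mean):
    """DIC = mean deviance + pD, where pD = mean deviance - deviance at the posterior mean."""
    d_bar = sum(deviance_samples) / len(deviance_samples)
    p_d = d_bar - deviance_at_posterior_mean  # penalty for model complexity
    return d_bar + p_d


def scan_score(dic_null, dic_alt):
    """-Delta DIC: DIC of the null (GCA) model minus DIC of the QTL model."""
    return dic_null - dic_alt


if __name__ == "__main__":
    print(dic([10.0, 12.0, 14.0], 11.0))  # 13.0 (mean deviance 12.0, pD 1.0)
    print(scan_score(120.0, 100.0))       # 20.0
```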

For QTL discovery, we used the conventional Neyman–Pearson hypothesis testing framework, with the additive QTL model (Equation 1) as the alternative hypothesis. The null hypothesis follows Equation (1), but instead of additive QTL effects, there is one effect for each parent to model general combining ability (GCA). The GCA effects have i.i.d. normal priors, with an inverse chi-squared prior for the variance (“BRR” model in BGLR). Although GCA effects are not explicit in Equation (1), they are implicitly present because the average of the additive haplotype effects for each parent equals its GCA.

The test statistic (i.e., “score”) for each marker-bin during the genome scan is –ΔDIC, which equals the DIC of the null hypothesis minus the DIC of the alternative hypothesis. A typical convention for accepting a more complex model is that the DIC should decrease by at least 5, preferably 10 (Lunn et al. 2012), but this is not adequate to control the genome-wide Type I error rate during QTL discovery. To determine a proper threshold, one can use the diaQTL function scan1_permute to run a stratified permutation test (Churchill and Doerge 1994), randomly permuting the phenotypes within each F1 population. The largest –ΔDIC for each permutation is recorded, and the 1 − α quantile of this distribution is used for QTL discovery with significance level α.
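
A minimal Python sketch of the stratified permutation threshold follows. Phenotypes are shuffled only within each F1 family, and the 1 − α empirical quantile of the per-permutation maximum scores is returned. The genome scan itself is not shown (in diaQTL it would be the scan1 function). The helper names and the quantile convention (the ⌈(1 − α)n⌉-th smallest value) are assumptions.

```python
import math
import random


def stratified_permutation(phenotypes, families, rng):
    """Shuffle phenotypes only among clones that belong to the same F1 family."""
    permuted = list(phenotypes)
    members = {}
    for idx, fam in enumerate(families):
        members.setdefault(fam, []).append(idx)
    for idx in members.values():
        values = [phenotypes[i] for i in idx]
        rng.shuffle(values)
        for i, v in zip(idx, values):
            permuted[i] = v
    return permuted


def permutation_threshold(max_scores, alpha):
    """(1 - alpha) empirical quantile of the per-permutation maximum scores."""
    ordered = sorted(max_scores)
    # A small tolerance guards against floating-point error in (1 - alpha) * n.
    rank = math.ceil((1 - alpha) * len(ordered) - 1e-9)
    return ordered[max(rank, 1) - 1]


# Intended use (genome_scan is a hypothetical stand-in for a full scan):
#   max_scores = [max(genome_scan(stratified_permutation(y, family, rng)))
#                 for _ in range(n_perm)]
#   threshold = permutation_threshold(max_scores, alpha=0.05)
```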

Alternatively, because linkage mapping populations are generic for a given mating design, number of parents, ploidy, and genome size, simulation can be used to determine the appropriate threshold. Half-diallel mating designs (without selfing) were simulated using the software PedigreeSim V2.0 (Voorrips and Maliepaard 2012) and the auxiliary R package PedigreeSimR (https://www.github.com/rramadeu/PedigreeSimR). The simulated genomes contained 1–12 linkage groups, each with a length of 100 cM, loci evenly spaced at 0.1 cM, and recombination based on Haldane's map function. For tetraploids, only bivalents were allowed, with no preferential pairing of homologous chromosomes. The number of parents was varied from 2 to 10. A genome-wide scan with scan1 was run for 1000 simulations, using standard normal deviates as phenotypes, and the largest –ΔDIC for each simulation was recorded. For each ploidy (2x and 4x) and significance level (α = 0.01, 0.05, 0.1, 0.2), a bivariate monotone regression spline was fit for genome size and number of parents using R/scam (Pya and Wood 2015). Predicted values from the spline are returned by the diaQTL function DIC_thresh.
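
Haldane's map function, used in these simulations, converts map distance d (in Morgans) into recombination fraction via r = (1 − e^(−2d))/2. The following Python sketch of the function and its inverse is independent of PedigreeSim's own implementation.

```python
import math


def haldane_r(d_morgans):
    """Recombination fraction for map distance d (Morgans): r = (1 - exp(-2d)) / 2."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def haldane_d(r):
    """Inverse map function: d = -ln(1 - 2r) / 2, valid for 0 <= r < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * r)


if __name__ == "__main__":
    print(round(haldane_r(0.001), 6))  # adjacent loci 0.1 cM apart
    print(round(haldane_r(1.0), 4))    # ends of a 100 cM linkage group
```

For the simulated 100 cM linkage groups, loci 0.1 cM apart have r ≈ 0.001, and the two chromosome ends have r = (1 − e^(−2))/2 ≈ 0.43.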

Power simulation

A factorial numerical experiment with 3 × 3 × 3 × 2 = 54 treatment scenarios, and 1000 simulations per scenario, was used to estimate statistical power and accuracy as a function of the total population size (N = 200, 400, 800), number of parents (3, 4, and 6) with a half-diallel mating design, heritability (h2 = 0.1, 0.2, 0.4), and ploidy (2 and 4), assuming a genome size of 12 Morgans (simulated as described above). The –ΔDIC threshold was chosen to maintain significance level α = 0.05. Simulations were performed using a high-performance computing facility at the University of Florida (HiPerGator 2.0), allocating 2 GB of RAM for each thread.
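
The 54 scenarios, together with the progeny-per-parental-haplotype (pph) metric used to summarize the results, can be enumerated as follows. This sketch assumes pph = N/(ϕp), the total population size divided by the number of parental haplotypes. That definition matches the Discussion's statement that a biparental tetraploid population of 200 corresponds to 25 pph.

```python
from itertools import product


def progeny_per_haplotype(n, parents, ploidy):
    """pph = total population size / number of parental haplotypes (ploidy * parents)."""
    return n / (ploidy * parents)


# Factorial design: population size x number of parents x QTL h2 x ploidy.
scenarios = [
    {"N": n, "parents": p, "h2": h2, "ploidy": phi,
     "pph": progeny_per_haplotype(n, p, phi)}
    for n, p, h2, phi in product((200, 400, 800), (3, 4, 6), (0.1, 0.2, 0.4), (2, 4))
]

if __name__ == "__main__":
    print(len(scenarios))  # 54
    pph = sorted(s["pph"] for s in scenarios)
    print(round(pph[0], 1), round(pph[-1], 1))  # 8.3 133.3
```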

For each simulation, a single additive QTL was randomly assigned to one locus, and all other loci were used as markers for mapping. Unreplicated phenotypes were simulated by $y_i = \sum_{k=1}^{\phi p} W_{ik}\,\alpha_k + \varepsilon_i$ [same symbols as Equation (1)], using i.i.d. standard normal deviates for the additive haplotype effects and i.i.d. normal residuals with variance chosen to achieve the target heritability. If the realized heritability, computed from the simulated QTL effects and residuals (Equation 10), deviated from the prescribed value by more than 0.01 (due to finite sample size), the simulation was discarded before QTL mapping. Perfect knowledge of the haplotype dosage $W_{ik}$ was assumed, which is asymptotically achievable with PolyOrigin as marker density increases (Zheng et al. 2021). Genotyping error would reduce power and accuracy (Zhang et al. 2014; Bourke et al. 2019).
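
The phenotype simulation with a target heritability can be sketched as a rejection loop. This minimal Python version assumes known dosages and sets the residual variance to V_g(1 − h2)/h2. It redraws the haplotype effects and residuals until the realized heritability is within 0.01 of the target. The function names and the `max_tries` safeguard are hypothetical, and this is not the authors' simulation code.

```python
import random


def variance(values):
    """Population variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def simulate_phenotypes(dosages, h2, rng, tol=0.01, max_tries=1000):
    """y_i = sum_k W_ik * alpha_k + e_i with i.i.d. N(0, 1) haplotype effects.

    The residual variance is chosen so that V_g / (V_g + V_e) = h2, and draws
    are rejected until the realized heritability is within `tol` of h2.
    """
    n_hap = len(dosages[0])
    for _ in range(max_tries):
        alpha = [rng.gauss(0.0, 1.0) for _ in range(n_hap)]
        g = [sum(w * a for w, a in zip(row, alpha)) for row in dosages]
        v_g = variance(g)
        if v_g == 0:
            continue
        sd_e = (v_g * (1 - h2) / h2) ** 0.5
        e = [rng.gauss(0.0, sd_e) for _ in g]
        realized = v_g / (v_g + variance(e))
        if abs(realized - h2) <= tol:
            return [gi + ei for gi, ei in zip(g, e)], alpha, realized
    raise RuntimeError("target heritability not reached")


if __name__ == "__main__":
    # 400 diploid F1 clones: four genotype classes from haplotypes 1-2 x 3-4.
    rows = [[1, 0, 1, 0], [1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 0, 1]] * 100
    y, alpha, realized = simulate_phenotypes(rows, 0.2, random.Random(7))
    print(len(y), round(realized, 3))
```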

Statistical power was the proportion of simulations in which both flanking markers of the simulated QTL were declared significant. The accuracy of the inferred QTL position was the distance between the simulated position and most significant marker. Accuracy for the posterior mean estimates of the additive effects was evaluated based on the Pearson correlation with simulated values. Monotone splines for power and accuracy were fit using R/scam (Pya and Wood 2015).
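
The three evaluation metrics can be sketched per simulation as follows. The function names are hypothetical, and treating a marker as significant when its score is at least the threshold is an assumption. Power is the mean of the detection flags over the simulations in a scenario.

```python
import math


def pearson(x, y):
    """Pearson correlation between estimated and simulated haplotype effects."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def evaluate(positions, scores, threshold, qtl_pos):
    """Detection flag and positional error for one simulated genome scan.

    positions: sorted marker positions (cM); scores: -Delta DIC per marker.
    The QTL counts as detected when both flanking markers reach the threshold.
    """
    left = max(i for i, p in enumerate(positions) if p < qtl_pos)
    right = min(i for i, p in enumerate(positions) if p > qtl_pos)
    detected = scores[left] >= threshold and scores[right] >= threshold
    best = max(range(len(scores)), key=scores.__getitem__)  # most significant marker
    return detected, abs(positions[best] - qtl_pos)


def power(flags):
    """Proportion of simulations in which the QTL was detected."""
    return sum(flags) / len(flags)


if __name__ == "__main__":
    print(evaluate([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 5.0, 9.0, 8.0, 2.0], 4.0, 2.5))
    print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```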

To ensure the results were robust to polygenic effects, we simulated traits with an additive QTL and polygenic effect for a half-diallel design with four parents and total population size N = 200. The multivariate normal deviate was simulated using mvrnorm from R/MASS (Venables and Ripley 2002), with covariance equal to the additive relationship matrix (Equation 7) times a polygenic variance component. Increasing the proportion of variance due to the polygenic effect up to 0.3, while keeping the QTL h2 at 0.2, had negligible effect on statistical power (Supplementary Figure S1).

Potato data

The potato dataset is a half-diallel population (without selfing) for three parents from the University of Wisconsin (UW) breeding program: Villetta Rose, W6511-1R, and W9914-1R. The population was genotyped using version 3 of the potato SNP array, and allele dosage was called using R package fitPoly (Voorrips et al. 2011; Zych et al. 2019). There were 5334 markers distributed across 12 chromosome groups, and their physical positions were based on the potato DMv4.03 reference genome (Potato Genome Sequencing Consortium 2011; Sharma et al. 2013). Parental genotype probabilities and a genetic map were calculated using the software PolyOrigin (Zheng et al. 2021), which identified 19 of the 434 progeny as outliers, and phenotype data were unavailable for 2 more. The remaining 413 clones were distributed as follows: Villetta Rose × W6511-1R (154 individuals), Villetta Rose × W9914-1R (113 individuals), and W6511-1R × W9914-1R (146 individuals). The genetic map produced by PolyOrigin spanned 12.1 Morgans and contained 2781 marker-bins.

The half-diallel population was evaluated as part of a larger field trial in 2018 at the UW Hancock Agricultural Research Station. The trial used an augmented design with four incomplete blocks and seven repeated checks per block. Each plot was a single row sown with 15 seeds (tuber pieces) at 30 cm in-row spacing and 90 cm between rows. The trial was harvested mechanically, and tubers were passed through an optical sizer after washing to measure tuber length and width. Tuber shape was calculated as the average length/width (L/W) ratio of all tubers weighing 170–285 g. To improve the normality of the residuals, the trait was transformed to log[(L/W) − 1]. Initial statistical analysis using ASReml-R v4 (Butler et al. 2018) was based on the linear model $y_{ij} = \mu + \mathrm{block}_j + \mathrm{clone}_i + \varepsilon_{ij}$. Broad-sense heritability on a plot basis was estimated at 0.91 by treating $\mathrm{clone}_i$ and $\mathrm{block}_j$ as random effects (i.i.d. normal). BLUEs for each clone were estimated by treating $\mathrm{clone}_i$ as fixed and then used as the response variable for analysis with diaQTL.
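
A minimal sketch of the plot-level tuber-shape phenotype follows. It assumes per-tuber records of (weight in g, length, width), an inclusive 170–285 g window, and a mean of per-tuber ratios. The record layout and function name are hypothetical. The log transform is defined only for L/W > 1.

```python
import math


def tuber_shape(tubers, lo=170.0, hi=285.0):
    """Mean length/width ratio of tubers weighing lo-hi grams, and log(L/W - 1).

    tubers: iterable of (weight_g, length, width) tuples for one plot.
    """
    ratios = [length / width for weight, length, width in tubers if lo <= weight <= hi]
    lw = sum(ratios) / len(ratios)
    return lw, math.log(lw - 1.0)  # undefined for lw <= 1


if __name__ == "__main__":
    plot = [(200, 90, 60), (250, 80, 40), (100, 200, 20), (300, 100, 10)]
    lw, z = tuber_shape(plot)  # only the 200 g and 250 g tubers are used
    print(lw, round(z, 4), 1 + math.exp(z))  # back-transform recovers L/W
```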

To give an idea of the computational requirements, the diaQTL function read_data created a 430 MB object from the potato input CSV files. Based on the output from set_params, 500 iterations were used with scan1 for QTL discovery, which required 3 min using 2 cores of a 3.1 GHz Intel Core i5 processor. Several functions in the diaQTL package (read_data, scan1, IBDmat) can utilize multiple cores for parallel execution.

Results

Statistical power and accuracy

Simulated half-diallel populations were generated to study the influence of ploidy, number of parents, and genome size on the threshold needed to control the genome-wide Type I error rate at α = 0.05. The test statistic is –ΔDIC, which equals the DIC for the null hypothesis (GCA but no QTL effects) minus the DIC for the alternative hypothesis of an additive QTL. The threshold was higher for tetraploids than for diploids and increased approximately linearly with the number of parents (Figure 2). There was also an approximately linear relationship between the threshold and the logarithm of genome size (Supplementary Figure S2). As expected from previous research in diploids (Lander and Botstein 1989), the threshold was not influenced by population size. QTL discovery can also be conducted using markers as covariates or with digenic dominance, but higher –ΔDIC thresholds are needed to maintain the same significance level (Supplementary Figure S3).

Figure 2.

Threshold to control the genome-wide Type I error rate at α = 0.05, for a half-diallel design without selfing. The statistic –ΔDIC is the Deviance Information Criterion (DIC) for the null hypothesis (no QTL) minus the DIC for the alternative hypothesis of an additive QTL. Each point is based on 1000 simulations, and the best-fit linear regression is shown as a solid line.

The influence of population size, number of parents, ploidy, and QTL heritability on statistical power was investigated for a genome of 12 Morgans (the size of the potato genome). Power increased with population size and QTL heritability but decreased with the number of parents and ploidy (Supplementary Figure S4). For a given h2, the interplay between these factors could be largely summarized by the number of progeny per parental haplotype, abbreviated pph, but the diploid (tetraploid) results were consistently below (above) the regression spline (Figure 3 and Supplementary Figure S5). At h2 = 0.1 and α = 0.05, power reached 0.5 and 0.9 with 30 and 70 pph, respectively, while only 10 and 25 pph (respectively) were needed for h2 = 0.2. At h2 = 0.4, the power was 1 for all scenarios (Supplementary Figure S4). As power increased, so did the accuracy of the inferred QTL position, as measured by the distance between the most significant marker and the QTL (Supplementary Figure S6). This distance was 6 cM when the power was 0.5 and decreased to 3 cM when the power was 0.9.

Figure 3.

Statistical power to detect a multiallelic QTL at significance level α = 0.05, as a function of the number of progeny per parental haplotype, number of parents, ploidy, and QTL heritability (h2). Each point is the average of 1000 simulations with an additive model. The dashed line is a monotone increasing, concave spline for h2 = 0.1, and the solid line is the spline for h2 = 0.2.

The accuracy of the predicted haplotype effects, as measured by the correlation with simulated values, was also largely explained by the pph metric (Figure 4). For h2 = 0.1 and α = 0.05, 40 pph was sufficient to achieve an accuracy of 0.9, while for h2 = 0.2, only 20 pph was needed. Across all scenarios, the heritability of the QTL could be estimated with essentially no error (Supplementary Figure S4), even when the accuracy of the haplotype effects was only 0.7.

Figure 4.

Accuracy of the estimated haplotype effects, as measured by Pearson's correlation with the simulated values. Each point is the average of 1000 simulations with an additive model. The dashed line is a monotone increasing, concave spline for h2 = 0.1, and the solid line is the spline for h2 = 0.2.

Potato diallel

To demonstrate diaQTL with a real dataset, we analyzed potato tuber shape, measured by the L/W ratio, in a half-diallel population with three parents. The ideal range of values for tuber shape is determined by the end-product (e.g., long tubers for French fries, round tubers for potato chips) and to some extent cultural preferences. The three parents were developed for the red-skin fresh market in the United States, for which a round to slightly oblong shape is expected. The distribution of L/W values for the population ranged from 1.0 to 2.0 (Supplementary Figure S7), while the commercial checks used in the field experiment had values between 1.1 and 1.2.

A genome scan with the additive model identified a large peak on chromosome 10 at 63 cM (Figure 5A), designated 10@63. This region coincides with the location of the classical Ro (round) QTL in potato (Van Eck et al. 1994), and the causal gene has been identified as StOVP20, an OVATE Family Protein with orthologs that affect fruit shape in tomato, melon, and cucumber (Wu et al. 2018). StOVP20 is not present in the DM reference genome, but based on synteny analysis (Wu et al. 2018), it is flanked by the DM gene models (PGSC0003DMG400006678, PGSC0003DMG400006679). This corresponds to the DM genome interval (48,982,521–49,022,687), which lies within the 95% CI for the 10@63 QTL (48,203,284–50,782,097).

Figure 5.

R/diaQTL results for potato tuber shape using a half-diallel population with three parents: Villetta Rose, W6511-1R, and W9914-1R. (A) Genome scan with scan1. The dashed horizontal lines correspond to α = 0.05 (gold) and α = 0.1 (red). The most significant marker was solcap_snp_c2_25522 on potato chromosome 10. (B) Additive effect estimates for the 12 parental haplotypes using fitQTL. Error bars are the 90% CI.

The next highest peak in the genome scan was on chromosome 1 at 133 cM. The –ΔDIC score was at the threshold for α = 0.05 but clearly above the threshold for α = 0.1 (Figure 5A). The model with both QTL reduced the DIC by 23 points compared to the model with only 10@63, so it was accepted over the single-QTL model (Table 1).

Table 1.

Model comparison for potato tuber shape based on ΔDIC, which is the DIC relative to a null model with parental GCA effects

| Model | Effects | ΔDIC |
|---|---|---|
| 1 | 10@63(A) | −160 |
| 2 | 10@63(A) + 1@133(A) | −183 |
| 3 | 10@63(A) + 1@133(A) + 10@63 × 1@133(AA) | −189 |
| 4 | 10@63(A + D) + 1@133(A) | −200* |
| 5 | 10@63(A + D) + 1@133(A) + 10@63 × 1@133(AA) | −196 |
| 6 | 10@63(A + D + T) + 1@133(A) | −197 |
| 7 | 10@63(A + D) + 1@133(A + D) | −196 |

QTL are designated as chromosome@cM, with A = additive, D = digenic, T = trigenic, and AA = additive × additive epistasis. The asterisk marks the model with the lowest DIC.

More complex models involving dominance and/or additive × additive epistatic effects were evaluated next (Table 1). Epistasis lowered the DIC by 6 points compared to the two-QTL additive model, whereas digenic dominance at 10@63 lowered the DIC by 17 points. Including both of these nonadditive effects raised the DIC, as did higher-order dominance at 10@63 and dominance at 1@133 (Table 1). We therefore selected the model with additive and digenic effects at 10@63 and additive effects at 1@133, which accounted for 26%, 8%, and 5% of the total variance, respectively (Table 2). To quantify the importance of smaller undetected QTL, an additive polygenic effect was included, which lowered the DIC by 78 points and accounted for 29% of the total variance (Table 2).
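
Every ΔDIC in Table 1 is measured against the same null model, so differences between rows equal differences in DIC between models. The comparisons quoted above can be checked directly from the table values:

```python
# ΔDIC values from Table 1 (DIC relative to the null model with parental GCA effects)
delta_dic = {1: -160, 2: -183, 3: -189, 4: -200, 5: -196, 6: -197, 7: -196}

best = min(delta_dic, key=delta_dic.get)  # model with the lowest DIC
print(best)                          # 4
print(delta_dic[2] - delta_dic[1])   # -23: adding the additive QTL 1@133
print(delta_dic[3] - delta_dic[2])   # -6: adding 10@63 x 1@133 epistasis
print(delta_dic[4] - delta_dic[2])   # -17: adding digenic dominance at 10@63
print(delta_dic[5] - delta_dic[4])   # 4: adding epistasis to Model 4 raises DIC
```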

Table 2.

Proportion of variance for potato tuber shape based on Model 4 from Table 1, plus a polygenic effect

Effect | Proportion of variance
10@63 additive | 0.26
10@63 digenic dominance | 0.08
1@133 additive | 0.05
Polygenic | 0.29
Residual | 0.31

QTL are designated as chromosome@cM.

Of the 12 parental haplotypes at 10@63, the largest additive effect (in magnitude) was estimated for haplotype 2 of parent W6511-1R (Figure 5B), and the negative sign means it reduced the L/W ratio. Haplotypes 2 and 4 of parent Villetta Rose also reduced the L/W ratio. For parent W9914-1R, all four haplotypes had similar additive effects but showed some differentiation with respect to nonadditive effects (Supplementary Figures S8 and S9).

Discussion

Although this is the first publication on statistical power and accuracy for QTL mapping in tetraploid diallel populations, some comparisons with previous studies on diploid diallel or biparental tetraploid populations are possible. Among previous simulation studies, Muranty (1996) is unique in explicitly showing the loss of statistical power as the number of outbred diploid parents increased at a fixed total population size. We observed the same trend (Figure 3), but the exact values for power are not comparable because the studies used different significance levels (a less stringent significance level yields higher power). Muranty (1996) also reported that the specific form of the mating design (e.g., cyclic, factorial, half-diallel) had little impact on power, which was confirmed in our simulation (data not shown). Bourke et al. (2019) reported that statistical power in a biparental tetraploid population of size 200, which corresponds to 25 pph, was 0.8 for a biallelic, additive QTL with h2 = 0.1 and high genotypic information content. This is higher than the value of 0.4 for the regression spline in Figure 3, which may reflect the simpler genetic architecture in their study (biallelic vs multiallelic in our simulation).

Compared to existing software, R/diaQTL has unique features for modeling dominance and epistasis with multiallelic QTL in tetraploids. The inner product of the genetic dosages used for QTL mapping of nonadditive effects, when averaged across many loci, generates nonadditive relationships (see Materials and Methods). Due to Mendelian segregation, these realized relationships vary around the expected value based on pedigree (Supplementary Figure S10; Amadeu et al. 2020). Nonadditive relationship matrices probably have limited utility for QTL mapping, but we envision potential applications in genomic selection. Breeding values include 1/2 of the additive × additive epistasis and 1/3 of the digenic dominance in tetraploids (Gallais 2003), although this result is only exact when genotype frequencies equal the product of haplotype frequencies (Kempthorne 1957), which requires a balanced, complete diallel with selfs.
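To make the link between dosage inner products and realized relationships concrete, here is a toy Python sketch for a single tetraploid F1. It assumes unlinked loci, random chromosome segregation without double reduction, and eight distinct parental haplotypes. The scaling (dividing by the ploidy for additive dosages and by the six haplotype pairs for digenic dosages) is our choice, so that a genotype with four distinct haplotypes has a self-relationship of 1; it is not necessarily diaQTL's normalization.

```python
from collections import Counter
from itertools import combinations

import numpy as np

PLOIDY = 4  # autotetraploid


def additive_dosage(genotype, n_haplotypes):
    """Dosage (0-4) of each parental haplotype in one tetraploid genotype,
    where `genotype` is a 4-tuple of parental-haplotype indices."""
    x = np.zeros(n_haplotypes)
    for h in genotype:
        x[h] += 1
    return x


def digenic_dosage(genotype):
    """Count of each unordered haplotype pair among the C(4, 2) = 6 pairs."""
    return Counter(tuple(sorted(pair)) for pair in combinations(genotype, 2))


def locus_relationships(g1, g2, n_haplotypes):
    """Additive and digenic inner products at one locus, scaled so that a
    genotype with four distinct haplotypes has self-relationship 1."""
    a = additive_dosage(g1, n_haplotypes) @ additive_dosage(g2, n_haplotypes) / PLOIDY
    d1, d2 = digenic_dosage(g1), digenic_dosage(g2)
    n_pairs = PLOIDY * (PLOIDY - 1) // 2
    d = sum(count * d2[pair] for pair, count in d1.items()) / n_pairs
    return a, d


def realized_relationships(loci1, loci2, n_haplotypes):
    """Average the per-locus inner products over loci (one genotype per locus)."""
    vals = [locus_relationships(g1, g2, n_haplotypes) for g1, g2 in zip(loci1, loci2)]
    return np.mean(vals, axis=0)


# Toy F1 between parent P (haplotypes 0-3) and parent Q (haplotypes 4-7).
rng = np.random.default_rng(7)
P, Q = (0, 1, 2, 3), (4, 5, 6, 7)


def gamete(parent):
    """Two distinct haplotypes drawn at random (no double reduction)."""
    return tuple(int(h) for h in rng.choice(parent, size=2, replace=False))


n_loci = 2000
sib1 = [gamete(P) + gamete(Q) for _ in range(n_loci)]
sib2 = [gamete(P) + gamete(Q) for _ in range(n_loci)]
parent_offspring = realized_relationships([P] * n_loci, sib1, n_haplotypes=8)
full_sibs = realized_relationships(sib1, sib2, n_haplotypes=8)
print(parent_offspring)  # 1/2 and 1/6 at every locus
print(full_sibs)         # near the expectations 1/2 and 2/9
```

Under these assumptions the parent-offspring values are exactly 1/2 (additive) and 1/6 (digenic) at every locus, whereas full-sib values vary from locus to locus around 1/2 and 2/9, which is the Mendelian sampling described above. With linked loci on real chromosomes, the spread of realized relationships would be larger.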

Our statistical model allows each parental haplotype to have a different effect, but we expect some haplotypes will carry the same QTL allele. Although the use of random effects for the regression coefficients helps offset the statistical complexity of the haplotype model, from a genetic point of view we are interested in inferring which haplotypes are identical by state at the QTL. At present, a user of our software is limited to hypothesizing that haplotypes with similar effects carry the same allele. One direction for future research would be to incorporate methods to formally test such hypotheses (Jannink and Wu 2003; Crouse et al. 2020).
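The haplotype model treats the additive effects of the parental haplotypes as random regression coefficients on dosage. diaQTL fits its model by MCMC; the Python sketch below is instead a ridge/BLUP analogue that solves Henderson's mixed-model equations with the variance ratio λ = σ²e/σ²b fixed in advance, an assumption made purely for illustration. The simulated half-diallel (three parents, 200 progeny per cross, no double reduction or linkage) and the effect sizes are hypothetical.

```python
import numpy as np


def simulate_half_diallel(n_parents=3, n_per_cross=200, ploidy=4, rng=None):
    """Offspring x parental-haplotype dosage matrix for a half-diallel.

    Each offspring receives ploidy/2 distinct haplotypes from each parent
    (random chromosome segregation; no double reduction, no linkage).
    """
    rng = np.random.default_rng(rng)
    n_hap = n_parents * ploidy
    rows = []
    for p in range(n_parents):
        for q in range(p + 1, n_parents):
            for _ in range(n_per_cross):
                x = np.zeros(n_hap)
                for parent in (p, q):
                    picked = rng.choice(ploidy, size=ploidy // 2, replace=False)
                    x[parent * ploidy + picked] += 1
                rows.append(x)
    return np.array(rows)


def haplotype_blup(X, y, lam):
    """Solve Henderson's mixed-model equations for y = mu + X b + e with
    b ~ N(0, s2_b I) and e ~ N(0, s2_e I); lam = s2_e / s2_b is assumed known."""
    n, k = X.shape
    ones = np.ones(n)
    lhs = np.empty((k + 1, k + 1))
    lhs[0, 0] = n
    lhs[0, 1:] = ones @ X
    lhs[1:, 0] = X.T @ ones
    lhs[1:, 1:] = X.T @ X + lam * np.eye(k)
    rhs = np.concatenate(([ones @ y], X.T @ y))
    sol = np.linalg.solve(lhs, rhs)
    return sol[0], sol[1:]  # intercept, shrunken haplotype effects


rng = np.random.default_rng(1)
X = simulate_half_diallel(rng=rng)              # 3 crosses, 12 parental haplotypes
b_true = rng.normal(0.0, 1.0, size=X.shape[1])  # hypothetical additive haplotype effects
y = X @ b_true + rng.normal(0.0, 1.5, size=X.shape[0])
mu_hat, b_hat = haplotype_blup(X, y, lam=1.5**2 / 1.0**2)
print(np.corrcoef(b_true, b_hat)[0, 1])
```

Because every row of the dosage matrix sums to the ploidy, the intercept and a common shift of all haplotype effects are confounded; the shrinkage term makes the system solvable, and the correlation between true and estimated effects is unaffected by such a shift.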

Large plant breeding programs often maintain distinct research enterprises devoted to genetic discovery vs variety selection. Genetic discovery projects typically characterize novel populations, such as a panel of distantly related germplasm or larger-than-normal, unselected biparental populations, but this additional effort is not always feasible for small programs. There is also a potential disconnect between the development of a genetic marker in the discovery population and its successful deployment in the breeding populations; such markers often fail because they are not haplotype-specific. When the QTL analysis is performed directly on the breeding population with a diallel design, this additional step is unnecessary. The diaQTL function haplo_plot allows users to visualize which parental haplotypes are present at a particular locus (Supplementary Figure S11), and it is straightforward to select progeny based on the dosage of particular haplotypes using the diaQTL function haplo_get (see package tutorial).
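Selecting progeny on the dosage of a particular haplotype amounts to filtering the rows of an offspring × haplotype dosage matrix. The Python sketch below does not use the diaQTL API (haplo_get is an R function whose interface is not reproduced here); the haplotype labels such as "W6511-1R.2", the dosages, and the effect values are made up for illustration. It keeps offspring carrying at least one copy of a target haplotype and then computes their additive value at the QTL, i.e., the dosage-weighted sum of haplotype effects.

```python
import numpy as np


def carriers(dosage, haplotypes, target, min_dosage=1.0):
    """Indices of offspring whose dosage of `target` is >= min_dosage."""
    j = haplotypes.index(target)
    return np.flatnonzero(dosage[:, j] >= min_dosage)


def qtl_values(dosage, effects):
    """Additive genetic value at one QTL: dosage-weighted sum of haplotype effects."""
    return dosage @ effects


# Hypothetical "parent.index" labels for the eight haplotypes of one F1 family.
haplotypes = [f"{p}.{i}" for p in ("W6511-1R", "W9914-1R") for i in range(1, 5)]
dosage = np.array([
    [1, 1, 0, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 1, 1],
], dtype=float)
effects = np.array([0.0, -0.3, 0.1, 0.05, 0.0, 0.02, -0.01, 0.03])  # made-up values

keep = carriers(dosage, haplotypes, "W6511-1R.2")
print(keep)  # [0 1]
print(qtl_values(dosage[keep], effects))
```

If the dosages are expected values computed from haplotype probabilities rather than integer counts, min_dosage becomes a tunable threshold rather than an exact copy number.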

The UW potato breeding program provides an example to discuss the feasibility of using breeding populations for discovery research. A typical mating scheme (for each market type) involves 10 females and 4 males (because male fertility is scarce), from which 40 F1 populations with 500 progeny each are created and evaluated as single plants in the field. Using visual selection for plant maturity and tuber type, 800 individuals (4% of the 20,000 seedlings) advance to the first clonal stage for genotyping and further phenotyping. Under uniform selection (20 individuals per F1 population), this population size corresponds to 20 pph for the female parents (more for the male parents), which translates into 80% power at h2 = 0.2 based on our simulation results.
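The design arithmetic in this paragraph can be reproduced in a few lines of Python. Here pph for a parent is computed as the number of selected progeny of that parent divided by the ploidy; this reading is our assumption, chosen because it reproduces the values quoted above for the female and male parents.

```python
females, males, ploidy = 10, 4, 4
progeny_per_family, n_selected = 500, 800

n_families = females * males                  # one F1 population per female x male cross
n_seedlings = n_families * progeny_per_family
selection_fraction = n_selected / n_seedlings
per_family = n_selected / n_families          # uniform selection across families

# Assumed reading of pph: selected progeny of a parent divided by its ploidy.
pph_female = per_family * males / ploidy      # each female appears in `males` families
pph_male = per_family * females / ploidy      # each male appears in `females` families
print(n_families, n_seedlings, selection_fraction, pph_female, pph_male)
# 40 20000 0.04 20.0 50.0
```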

Conclusion

The R/diaQTL software is a user-friendly tool for QTL mapping and haplotype analysis in connected F1 populations derived from outbred diploid or tetraploid parents. The number of progeny (or gametes, if selfed populations are used) per parental haplotype is a critical design parameter to consider when developing populations. By enabling selection on the effect and dosage of haplotypes without having to develop haplotype-specific markers, diaQTL can improve the efficiency of breeding.

Data availability

Version 1.00 of diaQTL and version 0.2 of PedigreeSimR, which were current at the time of manuscript publication, are available as Supplementary files through figshare under the GPL-3 license. Newer versions of the packages may be available at https://github.com/jendelman/diaQTL or https://github.com/rramadeu/PedigreeSimR. The potato dataset is distributed with the diaQTL package and used in the tutorial vignette. Supplemental material is available at figshare: https://doi.org/10.25386/genetics.14736564.

Acknowledgments

The authors thank Maria Caraza-Harter for testing the diaQTL software. R.R.A. and J.B.E. developed the software, analyzed the data, and drafted the manuscript. All authors contributed to the methodology and edited the manuscript.

Funding

Financial support was provided by the USDA National Institute of Food and Agriculture (NIFA) Award No. 2019-67013-29166.

Conflicts of interest

The authors declare that there is no conflict of interest.

Literature cited

  1. Amadeu RR, Lara LAC, Munoz P, Garcia AAF. 2020. Estimation of molecular pairwise relatedness in autopolyploid crops. G3 (Bethesda). 10:4579–4589.
  2. Bink MCAM, Jansen J, Madduri M, Voorrips RE, Durel CE, et al. 2014. Bayesian QTL analyses using pedigreed families of an outcrossing species, with application to fruit firmness in apple. Theor Appl Genet. 127:1073–1090.
  3. Bourke PM, Hackett CA, Voorrips RE, Visser RGF, Maliepaard C. 2019. Quantifying the power and precision of QTL analysis in autopolyploids under bivalent and multivalent genetic models. G3 (Bethesda). 9:2107–2122.
  4. Bourke PM, van Geest G, Voorrips RE, Jansen J, Kranenburg T, et al. 2018. polymapR: linkage analysis and genetic map construction from F1 populations of outcrossing polyploids. Bioinformatics. 34:3496–3502.
  5. Broman KW, Sen S. 2009. A Guide to QTL Mapping with R/qtl. Dordrecht, Holland: Springer.
  6. Buckler ES, Holland JB, Bradbury PJ, Acharya CB, Brown PJ, et al. 2009. The genetic architecture of maize flowering time. Science. 325:714–718.
  7. Butler DG, Cullis BR, Gilmour AR, Gogel BJ, Thompson R. 2018. ASReml-R Reference Manual Version 4. Hemel Hempstead: VSN International Ltd.
  8. Churchill GA, Doerge RW. 1994. Empirical threshold values for quantitative trait mapping. Genetics. 138:963–971.
  9. Crouse WL, Kelada SNP, Valdar W. 2020. Inferring the allelic series at QTL in multiparental populations. Genetics. 216:957–983.
  10. da Silva Pereira G, Gemenet DC, Mollinari M, Olukolu BA, Wood JC, et al. 2020. Multiple QTL mapping in autopolyploids: a random-effect model approach with application in a hexaploid sweetpotato fullsib population. Genetics. 215:579–595.
  11. Endelman JB, Jannink J-L. 2012. Shrinkage estimation of the realized relationship matrix. G3 (Bethesda). 2:1405–1413.
  12. Gallais A. 2003. Quantitative Genetics and Breeding Methods in Autopolyploid Plants. Paris: INRA.
  13. Grattapaglia D, Sederoff R. 1994. Genetic linkage maps of Eucalyptus grandis and Eucalyptus urophylla using a pseudo-testcross: mapping strategy and RAPD markers. Genetics. 137:1121–1137.
  14. Habier D, Fernando RL, Kizilkaya K, Garrick DJ. 2011. Extension of the Bayesian alphabet for genomic selection. BMC Bioinformatics. 12:186.
  15. Hackett CA, Boskamp B, Vogogias A, Preedy KF, Milne I. 2017. TetraploidSNPMap: software for linkage analysis and QTL mapping in autotetraploid populations using SNP dosage data. J Hered. 108:438–442.
  16. Hackett CA, Bradshaw JE, Bryan GJ. 2014. QTL mapping in autotetraploids using SNP dosage information. Theor Appl Genet. 127:1885–1904.
  17. Huang BE, Verbyla KL, Verbyla AP, Raghavan C, Singh VK, et al. 2015. MAGIC populations in crops: current status and future prospects. Theor Appl Genet. 128:999–1017.
  18. Jannink J-L, Wu X-L. 2003. Estimating allelic number and identity in state of QTLs in interconnected families. Genet Res. 81:133–144.
  19. Kempthorne O. 1957. An Introduction to Genetic Statistics. New York, NY: John Wiley & Sons.
  20. Knapp S, Bridges W, Birkes D. 1990. Mapping quantitative trait loci using molecular marker linkage maps. Theor Appl Genet. 79:583–592.
  21. Lander ES, Botstein D. 1989. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics. 121:185–199.
  22. Lenarcic AB, Svenson KL, Churchill GA, Valdar W. 2012. A general Bayesian approach to analyzing diallel crosses of inbred strains. Genetics. 190:413–435.
  23. Liu Y, Zeng Z-B. 2000. A general mixture model approach for mapping quantitative trait loci from diverse cross designs involving multiple inbred lines. Genet Res. 75:345–355.
  24. Lunn D, Jackson C, Best N, Thomas A, Spiegelhalter D. 2012. The BUGS Book: A Practical Introduction to Bayesian Analysis. Boca Raton, FL: CRC Press.
  25. Margarido GR, Souza AP, Garcia AA. 2007. OneMap: software for genetic mapping in outcrossing species. Hereditas. 144:78–79.
  26. Mollinari M, Garcia AAF. 2019. Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models. G3 (Bethesda). 9:3297–3314.
  27. Muranty H. 1996. Power of tests for quantitative trait loci detection using fullsib families in different schemes. Heredity. 76:156–165.
  28. Nice LM, Steffenson BJ, Blake TK, Horsley RD, Smith KP, et al. 2017. Mapping agronomic traits in a wild barley advanced backcross-nested association mapping population. Crop Sci. 57:1199–1210.
  29. Pérez P, de Los Campos G. 2014. Genome-wide regression and prediction with the BGLR statistical package. Genetics. 198:483–495.
  30. Potato Genome Sequencing Consortium. 2011. Genome sequence and analysis of the tuber crop potato. Nature. 475:189–195.
  31. Plummer M, Best N, Cowles K, Vines K. 2006. CODA: convergence diagnosis and output analysis for MCMC. R News. 6:7–11.
  32. Pya N, Wood SN. 2015. Shape constrained additive models. Stat Comput. 25:543–559.
  33. R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
  34. Raftery AE, Lewis SM. 1992. One long run with diagnostics: implementation strategies for Markov chain Monte Carlo. Stat Sci. 7:493–497.
  35. Rebai A, Goffinet B. 1993. Power of tests for QTL detection using replicated progenies derived from a diallel cross. Theor Appl Genet. 86:1014–1022.
  36. Ritter E, Salamini F. 1996. The calculation of recombination frequencies in crosses of allogamous plant species with applications to linkage mapping. Genet Res. 67:55–65.
  37. Rosyara UR, Bink MC, van de Weg E, Zhang G, Wang D, et al. 2013. Fruit size QTL identification and the prediction of parental QTL genotypes and breeding values in multiple pedigreed populations of sweet cherry. Mol Breeding. 32:875–887.
  38. Sharma SK, Bolser D, de Boer J, Sønderkær M, Amoros W, et al. 2013. Construction of reference chromosome-scale pseudomolecules for potato: integrating the potato genome with genetic and physical maps. G3 (Bethesda). 3:2031–2047.
  39. Song Q, Yan L, Quigley C, Jordan BD, Fickus E, et al. 2017. Genetic characterization of the soybean nested association mapping population. Plant Genome. 10:1–14.
  40. Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. 2002. Bayesian measures of model complexity and fit. J R Stat Soc B. 64:583–639.
  41. Van Eck HJ, Jacobs JM, Stam P, Ton J, Stiekema WJ, et al. 1994. Multiple alleles for tuber shape in diploid potato detected by qualitative and quantitative genetic analysis using RFLPs. Genetics. 137:303–309.
  42. Van Ooijen J. 2004. MapQTL® 5, Software for the Mapping of Quantitative Trait Loci in Experimental Populations of Diploid Species. Wageningen, Netherlands: Kyazma BV.
  43. Van Ooijen J, Voorrips R. 2001. Joinmap® 3, Software for the Calculation of Genetic Linkage Maps. Wageningen, Netherlands: Kyazma BV.
  44. Venables WN, Ripley BD. 2002. Modern Applied Statistics with S. 4th ed. New York, NY: Springer.
  45. Verhoeven KJ, Jannink JL, McIntyre LM. 2006. Using mating designs to uncover QTL and the genetic architecture of complex traits. Heredity (Edinb). 96:139–149.
  46. Voorrips RE, Gort G, Vosman B. 2011. Genotype calling in tetraploid species from bi-allelic marker data using mixture models. BMC Bioinformatics. 12:172.
  47. Voorrips RE, Maliepaard CA. 2012. The simulation of meiosis in diploid and tetraploid organisms using various genetic models. BMC Bioinformatics. 13:248.
  48. Wu K, Burnquist W, Sorrells M, Tew T, Moore P, et al. 1992. The detection and estimation of linkage in polyploids using single-dose restriction fragments. Theor Appl Genet. 83:294–300.
  49. Wu R, Ma C-X, Painter I, Zeng Z-B. 2002. Simultaneous maximum likelihood estimation of linkage and linkage phases in outcrossing species. Theor Popul Biol. 61:349–363.
  50. Wu S, Zhang B, Keyhaninejad N, Rodríguez G, Kim H, et al. 2018. A common genetic mechanism underlies morphological diversity in fruits and other plant organs. Nat Commun. 9:4734.
  51. Yang J, Zaitlen NA, Goddard ME, Visscher PM, Price AL. 2014. Advantages and pitfalls in the application of mixed-model association methods. Nat Genet. 46:100–106.
  52. Young N, Tanksley S. 1989. Restriction fragment length polymorphism maps and the concept of graphical genotypes. Theor Appl Genet. 77:95–101.
  53. Yu J, Holland JB, McMullen MD, Buckler ES. 2008. Genetic design and statistical power of nested association mapping in maize. Genetics. 178:539–551.
  54. Zhang Z, Wang W, Valdar W. 2014. Bayesian modeling of haplotype effects in multiparent populations. Genetics. 198:139–156.
  55. Zheng C, Amadeu R, Munoz P, Endelman J. 2021. Haplotype reconstruction in connected tetraploid F1 populations. Genetics. doi:10.1093/genetics/iyab106.
  56. Zheng C, Boer MP, van Eeuwijk FA. 2015. Reconstruction of genome ancestry blocks in multiparental populations. Genetics. 200:1073–1087.
  57. Zych K, Gort G, Maliepaard CA, Jansen RC, Voorrips RE. 2019. FitTetra 2.0: improved genotype calling for tetraploids with multiple population and parental data support. BMC Bioinformatics. 20:148.


Articles from Genetics are provided here courtesy of Oxford University Press
